Bug #179

closed

corrupted LogEntry in mon data

Added by ar Fred almost 14 years ago. Updated over 13 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Monitor
Target version:
-
% Done:

0%


Description

this is after a restart due to the update of all ceph daemons to c4e6482d302aa288031ced6cd845d60ba655e5c8

#0 0x00007ffb1880da75 in __GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffb188115c0 in *__GI_abort () at abort.c:92
#2 0x00007ffb190c28e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0x00007ffb190c0d16 in ?? () from /usr/lib/libstdc++.so.6
#4 0x00007ffb190c0d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x00007ffb190c0e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x00000000004c6b52 in LogEntry::decode(ceph::buffer::list::iterator&) ()
#7 0x00000000004c3dea in LogMonitor::update_from_paxos (this=0x11e84c0) at mon/LogMonitor.cc:127
#8 0x000000000047f2a8 in PaxosService::_commit (this=0x11e84c0) at mon/PaxosService.cc:105
#9 0x000000000047c06c in finish_contexts(std::list<Context*, std::allocator<Context*> >&, int) ()
#10 0x000000000047862d in Paxos::handle_accept (this=0x11f2150, accept=<value optimized out>) at mon/Paxos.cc:454
#11 0x000000000047b0fb in Paxos::dispatch (this=0x11f2150, m=0x7ffb10000a10) at mon/Paxos.cc:839
#12 0x0000000000468fe4 in Monitor::_ms_dispatch (this=<value optimized out>, m=0x7ffb10000a10) at mon/Monitor.cc:716
#13 0x000000000047483d in Monitor::ms_dispatch(Message*) ()
#14 0x00000000004522b9 in Messenger::ms_deliver_dispatch (this=<value optimized out>) at msg/Messenger.h:97
#15 SimpleMessenger::dispatch_entry (this=<value optimized out>) at msg/SimpleMessenger.cc:332
#16 0x000000000044592c in SimpleMessenger::DispatchThread::entry (this=0x11e6b60) at msg/SimpleMessenger.h:494
#17 0x0000000000457a1a in Thread::_entry_func (arg=0x73f6) at ./common/Thread.h:39
#18 0x00007ffb196a09ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#19 0x00007ffb188c06cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#20 0x0000000000000000 in ?? ()
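
Frames #1 to #6 have the usual shape of an uncaught C++ exception: LogEntry::decode throws, nothing up the stack catches it, and std::terminate() aborts the process. As a rough illustration of how a decoder ends up throwing on corrupt input, here is a minimal Python sketch. The field layout (a 4-byte little-endian length followed by that many bytes) and the names `DecodeError` and `decode_string` are assumptions made up for the example; this is not Ceph's actual LogEntry encoding.

```python
import struct


class DecodeError(Exception):
    """Raised when a field would run past the end of the buffer."""


def decode_string(buf, off):
    """Decode one length-prefixed string starting at offset `off`.

    Hypothetical layout: 4-byte little-endian length, then the payload.
    Returns (payload, offset_after_payload).
    """
    if off + 4 > len(buf):
        raise DecodeError("truncated length field")
    (n,) = struct.unpack_from("<I", buf, off)
    off += 4
    # A corrupted length field typically decodes to a huge value,
    # so the bounds check is where the error surfaces.
    if off + n > len(buf):
        raise DecodeError(f"length {n} runs past end of buffer")
    return buf[off:off + n], off + n
```

If the caller does not catch the error, the program dies at the decode call, which is roughly what the trace above shows on the C++ side.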


Files

mon0_data.tgz (279 KB) ar Fred, 06/04/2010 11:23 AM
gdb_out.log (50.3 KB) ar Fred, 06/08/2010 12:49 AM
badlog (5.22 KB) corrupted encoded log entries, Sage Weil, 06/08/2010 08:31 PM
Actions #1

Updated by ar Fred almost 14 years ago

Actions #2

Updated by Sage Weil almost 14 years ago

Doh.. so it looks like the piece of info I need was in the logm directory. If you still have it, great. If not, I can probably find it in the core file (if you attach the cmon binary too). Thanks!

Actions #3

Updated by Sage Weil almost 14 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil

Okay, I can't make heads or tails of your core file on my system for some reason. Can you try this on your machine? In gdb,

f 7
set variable $p=p.p._M_node
p bl._len   <--- make note of this length value... let's say it's 1234.  It's a file size, so it shouldn't be too big.
x /1234xb *(void**)(*(void**)((*(void **)$p + 0x10)) + 0x08)
   ^^^^ use that value here

If that doesn't work, I'm inclined to give up here and keep an eye out for this crash happening again. And next time, we won't delete logm/* :).

Thanks!

Actions #4

Updated by ar Fred almost 14 years ago

I also had some problems using gdb...
gdb won't work unless cmon is in /usr/bin and the debug symbols for cmon (which I forgot to send you) are in /usr/lib/debug/usr/bin!

I just uploaded core + exec + debug symbols to cephdrop (179_debug_syms.tgz) in case you need them. You can try moving cmon and cmon_dbg to the correct locations (renaming cmon_dbg to cmon) as described above before starting gdb; it worked for me.

(gdb) frame 7
#7  0x00000000004c3dea in LogMonitor::update_from_paxos (this=0x11e84c0) at mon/LogMonitor.cc:127
127     mon/LogMonitor.cc: No such file or directory.
        in mon/LogMonitor.cc
(gdb) set variable $p=p.p._M_node
(gdb) p bl._len
$1 = 5347
(gdb) x /5347xb *(void**)(*(void**)((*(void **)$p + 0x10)) + 0x08)
[see attached file]
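
For anyone repeating this: the `x /Nxb` output is text, so it has to be turned back into raw bytes before it can be fed to a decoder. One way to do that is sketched below. It assumes the usual gdb layout (an address, an optional `<symbol+offset>`, a colon, then `0x..` byte tokens), and the function name `gdb_hexdump_to_bytes` is made up for the example.

```python
import re

# A byte token as printed by gdb's x/xb format, e.g. 0x4f
BYTE_TOKEN = re.compile(r"0x[0-9a-fA-F]{2}")


def gdb_hexdump_to_bytes(text):
    """Collect the byte values from gdb `x /Nxb` output lines."""
    out = bytearray()
    for line in text.splitlines():
        line = line.strip()
        # Data lines start with an address and have a colon before the bytes.
        if not line.startswith("0x") or ":" not in line:
            continue
        # Split on the last colon so symbols like <Foo::bar+8> are skipped.
        tokens = line.rpartition(":")[2].split()
        # Ignore anything that is not purely a run of byte tokens.
        if tokens and all(BYTE_TOKEN.fullmatch(t) for t in tokens):
            out.extend(int(t, 16) for t in tokens)
    return bytes(out)
```

Running this over the saved gdb log and writing the result out in binary mode should give a blob whose length matches bl._len (5347 here), provided the whole dump was captured.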
Actions #5

Updated by Sage Weil almost 14 years ago

  • File badlog badlog added
  • Subject changed from monitor crash at startup to corrupted LogEntry in mon data
  • Status changed from In Progress to Rejected

Hmm, yeah, I give up on this one. I can see that it's corrupt, but not in any particularly suggestive way. No idea what might have caused that. It's not even block aligned.

For posterity, the binary blob is attached.

Let's close this out, and please let me know if you see anything like it again. Next time, the most helpful things would be the mon* logs and the logm/ dir contents.

Actions #6

Updated by Sage Weil almost 14 years ago

  • Status changed from Rejected to In Progress
Actions #7

Updated by Sage Weil almost 14 years ago

  • Status changed from In Progress to Closed