Bug #179
Closed: corrupted LogEntry in mon data
Description
This happened after a restart following the update of all ceph daemons to c4e6482d302aa288031ced6cd845d60ba655e5c8.
#0 0x00007ffb1880da75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffb188115c0 in *__GI_abort () at abort.c:92
#2 0x00007ffb190c28e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0x00007ffb190c0d16 in ?? () from /usr/lib/libstdc++.so.6
#4 0x00007ffb190c0d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5 0x00007ffb190c0e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6 0x00000000004c6b52 in LogEntry::decode(ceph::buffer::list::iterator&) ()
#7 0x00000000004c3dea in LogMonitor::update_from_paxos (this=0x11e84c0) at mon/LogMonitor.cc:127
#8 0x000000000047f2a8 in PaxosService::_commit (this=0x11e84c0) at mon/PaxosService.cc:105
#9 0x000000000047c06c in finish_contexts(std::list<Context*, std::allocator<Context*> >&, int) ()
#10 0x000000000047862d in Paxos::handle_accept (this=0x11f2150, accept=<value optimized out>) at mon/Paxos.cc:454
#11 0x000000000047b0fb in Paxos::dispatch (this=0x11f2150, m=0x7ffb10000a10) at mon/Paxos.cc:839
#12 0x0000000000468fe4 in Monitor::_ms_dispatch (this=<value optimized out>, m=0x7ffb10000a10) at mon/Monitor.cc:716
#13 0x000000000047483d in Monitor::ms_dispatch(Message*) ()
#14 0x00000000004522b9 in Messenger::ms_deliver_dispatch (this=<value optimized out>) at msg/Messenger.h:97
#15 SimpleMessenger::dispatch_entry (this=<value optimized out>) at msg/SimpleMessenger.cc:332
#16 0x000000000044592c in SimpleMessenger::DispatchThread::entry (this=0x11e6b60) at msg/SimpleMessenger.h:494
#17 0x0000000000457a1a in Thread::_entry_func (arg=0x73f6) at ./common/Thread.h:39
#18 0x00007ffb196a09ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#19 0x00007ffb188c06cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#20 0x0000000000000000 in ?? ()
Updated by Sage Weil almost 14 years ago
Doh.. so it looks like the piece of info I need was in the logm directory. If you still have it, great. If not, I can probably find it in the core file (if you attach the cmon binary too). Thanks!
Updated by Sage Weil almost 14 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
Okay, I can't make heads or tails of your core file on my system for some reason. Can you try this on your machine? In gdb,
f 7
set variable $p=p.p._M_node
p bl._len     <--- make note of this length value... let's say its 1234. it's a file size, shouldn't be too big.
x /1234xb *(void**)(*(void**)((*(void **)$p + 0x10)) + 0x08)
   ^^^^ use that value here
If that doesn't work, I'm inclined to give up here and keep an eye out for this crash happening again. And next time, we won't delete logm/* :).
Thanks!
Updated by ar Fred almost 14 years ago
- File gdb_out.log added
I also had some problems using gdb...
gdb won't work unless cmon is in /usr/bin and the debug symbols for cmon (which I forgot to send you) are in /usr/lib/debug/usr/bin!
I just uploaded core + exec + debug symbols to cephdrop (179_debug_syms.tgz) in case you need them. You can try moving cmon and cmon_dbg to the correct locations (renaming cmon_dbg to cmon) as described above before starting gdb; that worked for me.
(gdb) frame 7
#7 0x00000000004c3dea in LogMonitor::update_from_paxos (this=0x11e84c0) at mon/LogMonitor.cc:127
127 mon/LogMonitor.cc: No such file or directory.
in mon/LogMonitor.cc
(gdb) set variable $p=p.p._M_node
(gdb) p bl._len
$1 = 5347
(gdb) x /5347xb *(void**)(*(void**)((*(void **)$p + 0x10)) + 0x08)
[see attached file]
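The `x /5347xb` dump is plain text, so it has to be converted back to raw bytes before the buffer can be inspected as a file. A minimal sketch in Python of one way to do that, assuming gdb's usual `address: 0x.. 0x.. ...` line layout for `x/xb` output. The function name `gdb_hexdump_to_bytes` is hypothetical and is not part of gdb or Ceph.

```python
import re

# Matches exactly one byte as printed by gdb's x/xb format, e.g. "0x0a".
# Longer hex values such as addresses do not match because of the \b anchors.
BYTE_RE = re.compile(r"\b0x([0-9a-fA-F]{2})\b")


def gdb_hexdump_to_bytes(text: str) -> bytes:
    """Convert the text of a gdb `x /Nxb` dump into the raw bytes it shows.

    Lines without an "address:" prefix (prompts, `$1 = ...` results, notes)
    are skipped.
    """
    out = bytearray()
    for line in text.splitlines():
        # Everything before the first ':' is the address column.
        _addr, sep, payload = line.partition(":")
        if not sep:
            continue
        out.extend(int(tok, 16) for tok in BYTE_RE.findall(payload))
    return bytes(out)
```

Reading the attached gdb log as text, passing it to this function, and writing the result to a file in binary mode should yield a blob whose length can be checked against the `bl._len` value printed by gdb.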
Updated by Sage Weil almost 14 years ago
- File badlog added
- Subject changed from monitor crash at startup to corrupted LogEntry in mon data
- Status changed from In Progress to Rejected
Hmm, yeah, I give up on this one. I see that it's corrupt, but not in any particularly suggestive way. No idea what might have caused that. Not even block aligned.
For posterity, the binary blob is attached.
Let's close this out, and please let me know if you see anything like it again. Next time, the thing that would be helpful would be mon* logs, and logm/ dir contents.
Updated by Sage Weil almost 14 years ago
- Status changed from Rejected to In Progress
Updated by Sage Weil almost 14 years ago
- Status changed from In Progress to Closed