Bug #3497
Closed
mon: leader segfaults after restarting osds
Description
-135> 2012-11-15 08:01:09.179382 a31a700 -1 *** Caught signal (Segmentation fault) **
 in thread a31a700

 ceph version 0.54-589-gd9bfbc1 (d9bfbc11160bd7b1d659b62238dbd0e4fd0204be)
 1: ./ceph-mon() [0x53d10a]
 2: (()+0xfcb0) [0x4e41cb0]
 3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x1d3) [0x5db373]
 4: (Monitor::send_reply(PaxosServiceMessage*, Message*)+0x475) [0x477625]
 5: (OSDMonitor::send_incremental(PaxosServiceMessage*, unsigned int)+0xc6) [0x4b1ad6]
 6: (OSDMonitor::send_latest(PaxosServiceMessage*, unsigned int)+0x79) [0x4bc729]
 7: (OSDMonitor::_booted(MOSDBoot*, bool)+0xd6) [0x4be076]
 8: (Context::complete(int)+0xa) [0x48ee8a]
 9: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x11d) [0x490cbd]
 10: (Paxos::handle_accept(MMonPaxos*)+0x83a) [0x4a4c2a]
 11: (Paxos::dispatch(PaxosServiceMessage*)+0x24b) [0x4a7d0b]
 12: (Monitor::_ms_dispatch(Message*)+0xfb0) [0x48df90]
 13: (Monitor::ms_dispatch(Message*)+0x32) [0x49dac2]
 14: (DispatchQueue::entry()+0x349) [0x642019]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x5dc7ed]
 16: (()+0x7e9a) [0x4e39e9a]
 17: (clone()+0x6d) [0x64494bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Unfortunately, most of the log (everything that didn't fit the terminal's buffer) is unavailable.
Updated by Joao Eduardo Luis over 11 years ago
Might have jumped the gun on this description; I assumed too much from what I was doing when I wrote it. The segfault appears to be related to restarting the OSDs. I just happened to kill the slurping monitor around the same time, but judging from the backtrace, that had nothing to do with it.
Updated by Joao Eduardo Luis over 11 years ago
- Subject changed from mon: leader segfaults when slurping peon is interrupted to mon: leader segfaults after restarting osds
Updated by Joao Eduardo Luis over 11 years ago
A different Paxos machine (MDSMonitor this time) crashes in the same place, right after the contexts are finished. AFAICT this only happens on wip-mon-leaks-fix; testing with next did not reproduce it.
 ceph version 0.54-605-g6fce68a (6fce68ae1e5794f0a35813088e8a41729188a9d6)
 1: ./ceph-mon() [0x53d10a]
 2: (()+0xfcb0) [0x4e41cb0]
 3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x1d3) [0x5db373]
 4: (Monitor::send_reply(PaxosServiceMessage*, Message*)+0x475) [0x4773c5]
 5: (MDSMonitor::preprocess_beacon(MMDSBeacon*)+0x9ff) [0x4dc2ff]
 6: (MDSMonitor::preprocess_query(PaxosServiceMessage*)+0x271) [0x4debf1]
 7: (PaxosService::dispatch(PaxosServiceMessage*)+0x155) [0x4a9f85]
 8: (Context::complete(int)+0xa) [0x48ecda]
 9: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x11d) [0x490b0d]
 10: (Paxos::handle_accept(MMonPaxos*)+0x864) [0x4a49e4]
 11: (Paxos::dispatch(PaxosServiceMessage*)+0x24b) [0x4a7a9b]
 12: (Monitor::_ms_dispatch(Message*)+0x1030) [0x48dda0]
 13: (Monitor::ms_dispatch(Message*)+0x32) [0x49d852]
 14: (DispatchQueue::entry()+0x349) [0x642029]
 15: (DispatchQueue::DispatchThread::entry()+0xd) [0x5dc7ed]
 16: (()+0x7e9a) [0x4e39e9a]
 17: (clone()+0x6d) [0x64494bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
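Both backtraces take the same route: Paxos::handle_accept() finishes its list of waiting contexts, and one of those callbacks ends up in Monitor::send_reply() and then SimpleMessenger::_send_message(). The C++ below is a hypothetical, heavily simplified model of that deferred-completion pattern, not Ceph's real code: Context, complete() and finish_contexts() only mirror the names in the trace, while FakeConnection and C_SendReply are invented for illustration. It shows how a reply callback can outlive the connection it refers to: the pointer is captured when the request arrives but only dereferenced after Paxos accepts.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical, simplified model of the path in both backtraces:
// Paxos::handle_accept() -> finish_contexts() -> Context::complete()
// -> monitor callback -> Monitor::send_reply(). Not Ceph's definitions.

struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  // One-shot semantics: run the callback, then free the context.
  void complete(int r) {
    finish(r);
    delete this;
  }
};

// Drain a waiter list and complete every context with result r.
void finish_contexts(std::list<Context*>& waiters, int r) {
  std::list<Context*> ls;
  ls.swap(waiters);  // detach first so callbacks can queue new waiters
  for (Context* c : ls) c->complete(r);
}

// Hypothetical stand-in for the requester's connection; records sends.
struct FakeConnection {
  std::list<std::string> outbox;
};

// A deferred reply: it stores a raw connection pointer when the request
// arrives but only dereferences it once the proposal is accepted. If the
// connection were freed in between, finish() would touch freed memory,
// one plausible way to fault inside _send_message().
struct C_SendReply : Context {
  FakeConnection* con;
  std::string reply;
  C_SendReply(FakeConnection* c, std::string m) : con(c), reply(std::move(m)) {}
  void finish(int r) override {
    if (r == 0) con->outbox.push_back(reply);
  }
};
```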
Updated by Joao Eduardo Luis over 11 years ago
- Status changed from New to In Progress
After some testing, git bisect reports 19831b979a13f699b0e87125dfcfad3ea607d713 as the first bad commit.
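git bisect finds the first bad commit by halving the range of candidate commits on every test run. A minimal C++ sketch of that search, assuming a linear history with at least two entries, whose first entry is known good and last entry known bad, and where every commit after the first bad one is also bad. The is_bad callback is a hypothetical stand-in for "build ceph-mon, restart the OSDs, see whether the leader crashes"; real git bisect also handles merge commits and skipped revisions, which this ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Binary search for the earliest bad commit in a linear history.
// Preconditions (assumed, not checked): history.size() >= 2,
// history.front() is good, history.back() is bad, and badness is
// monotonic (once a commit is bad, all later ones are bad).
std::size_t first_bad(const std::vector<std::string>& history,
                      const std::function<bool(const std::string&)>& is_bad) {
  std::size_t good = 0;                  // highest index known good
  std::size_t bad = history.size() - 1;  // lowest index known bad
  while (bad - good > 1) {
    std::size_t mid = good + (bad - good) / 2;
    if (is_bad(history[mid])) {
      bad = mid;   // culprit is at mid or earlier
    } else {
      good = mid;  // culprit is after mid
    }
  }
  return bad;  // adjacent to a good commit, so it is the first bad one
}
```

Each test run halves the range, so a span of N commits needs about log2(N) builds instead of N.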
Attempting a fix.
Updated by Joao Eduardo Luis over 11 years ago
- Status changed from In Progress to Resolved
Reverting that commit fixes the crash.
However, the patch was put()ting the Connection (releasing a reference to it) as part of the session cleanup, so without it the Connection, and potentially the session as well, may linger in memory. This affects the completion of #3476.
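This is the usual trade-off with intrusive reference counting. The following is a hypothetical C++ sketch (not Ceph's RefCountedObject, MonSession or messenger code) using get()/put() in the same spirit: if session cleanup put()s the Connection while a deferred reply still holds only a raw pointer, the reply dereferences freed memory; if cleanup never put()s it, the Connection leaks. One common way to satisfy both, shown here as an assumption rather than the fix adopted for this bug, is for every holder, including the pending reply, to take its own reference.

```cpp
#include <atomic>
#include <cassert>

// Demo-only counter of Connection objects not yet freed.
int live_connections = 0;

// Hypothetical intrusive refcount, loosely modeled on get()/put().
struct Connection {
  std::atomic<int> nref{1};  // the creator holds the first reference
  Connection() { ++live_connections; }
  ~Connection() { --live_connections; }
  Connection* get() { ++nref; return this; }
  void put() { if (--nref == 0) delete this; }  // last put() frees it
};

// A session takes its own reference and releases it on cleanup.
// Skipping the put() leaks; doing it is only safe if nobody else is
// relying on the session's reference to keep the Connection alive.
struct Session {
  Connection* con;
  explicit Session(Connection* c) : con(c->get()) {}
  void cleanup() {
    if (con) { con->put(); con = nullptr; }
  }
};

// A deferred reply that holds its own reference, so it stays valid even
// if session cleanup runs before Paxos finishes the contexts.
struct PendingReply {
  Connection* con;
  explicit PendingReply(Connection* c) : con(c->get()) {}
  bool send() {
    if (!con) return false;
    // ... the reply would be handed to the messenger via con here ...
    con->put();  // release this holder's reference once done
    con = nullptr;
    return true;
  }
};
```

With this pattern the order of session cleanup and reply completion no longer matters: whichever holder calls put() last frees the Connection.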
Marking this as Resolved.