Bug #3497
closed
mon: leader segfaults after restarting osds
Added by Joao Eduardo Luis over 11 years ago.
Updated over 11 years ago.
Description
-135> 2012-11-15 08:01:09.179382 a31a700 -1 *** Caught signal (Segmentation fault) **
in thread a31a700
ceph version 0.54-589-gd9bfbc1 (d9bfbc11160bd7b1d659b62238dbd0e4fd0204be)
1: ./ceph-mon() [0x53d10a]
2: (()+0xfcb0) [0x4e41cb0]
3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x1d3) [0x5db373]
4: (Monitor::send_reply(PaxosServiceMessage*, Message*)+0x475) [0x477625]
5: (OSDMonitor::send_incremental(PaxosServiceMessage*, unsigned int)+0xc6) [0x4b1ad6]
6: (OSDMonitor::send_latest(PaxosServiceMessage*, unsigned int)+0x79) [0x4bc729]
7: (OSDMonitor::_booted(MOSDBoot*, bool)+0xd6) [0x4be076]
8: (Context::complete(int)+0xa) [0x48ee8a]
9: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x11d) [0x490cbd]
10: (Paxos::handle_accept(MMonPaxos*)+0x83a) [0x4a4c2a]
11: (Paxos::dispatch(PaxosServiceMessage*)+0x24b) [0x4a7d0b]
12: (Monitor::_ms_dispatch(Message*)+0xfb0) [0x48df90]
13: (Monitor::ms_dispatch(Message*)+0x32) [0x49dac2]
14: (DispatchQueue::entry()+0x349) [0x642019]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x5dc7ed]
16: (()+0x7e9a) [0x4e39e9a]
17: (clone()+0x6d) [0x64494bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Unfortunately, most of the log (everything that didn't fit the terminal's buffer) is unavailable.
- Description updated
I may have jumped the gun on the original description: I assumed too much from what I was doing when I wrote it. The segfault appears to be related to restarting the osds. I happened to kill the slurping monitor around the same time, but judging from the error message that had nothing to do with the crash.
- Subject changed from mon: leader segfaults when slurping peon is interrupted to mon: leader segfaults after restarting osds
A different paxos machine this time (MDSMonitor instead of OSDMonitor), but it crashes in the same place, after finishing the contexts. As far as I can tell from also testing with next, this only happens on wip-mon-leaks-fix.
ceph version 0.54-605-g6fce68a (6fce68ae1e5794f0a35813088e8a41729188a9d6)
1: ./ceph-mon() [0x53d10a]
2: (()+0xfcb0) [0x4e41cb0]
3: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x1d3) [0x5db373]
4: (Monitor::send_reply(PaxosServiceMessage*, Message*)+0x475) [0x4773c5]
5: (MDSMonitor::preprocess_beacon(MMDSBeacon*)+0x9ff) [0x4dc2ff]
6: (MDSMonitor::preprocess_query(PaxosServiceMessage*)+0x271) [0x4debf1]
7: (PaxosService::dispatch(PaxosServiceMessage*)+0x155) [0x4a9f85]
8: (Context::complete(int)+0xa) [0x48ecda]
9: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x11d) [0x490b0d]
10: (Paxos::handle_accept(MMonPaxos*)+0x864) [0x4a49e4]
11: (Paxos::dispatch(PaxosServiceMessage*)+0x24b) [0x4a7a9b]
12: (Monitor::_ms_dispatch(Message*)+0x1030) [0x48dda0]
13: (Monitor::ms_dispatch(Message*)+0x32) [0x49d852]
14: (DispatchQueue::entry()+0x349) [0x642029]
15: (DispatchQueue::DispatchThread::entry()+0xd) [0x5dc7ed]
16: (()+0x7e9a) [0x4e39e9a]
17: (clone()+0x6d) [0x64494bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
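Both backtraces share frames 8 through 17: Paxos::handle_accept() calls finish_contexts(), which completes queued callbacks, and one of those callbacks ends up in Monitor::send_reply(). The sketch below is a simplified stand-in for that pattern, not the Ceph source; RecordContext, run_demo() and the request names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a completion callback (not the Ceph class).
struct Context {
  virtual ~Context() = default;
  // Run the callback once, then free it.
  void complete(int r) {
    finish(r);
    delete this;
  }
 protected:
  virtual void finish(int r) = 0;
};

// Hypothetical context that records which deferred request was resumed.
struct RecordContext : Context {
  RecordContext(std::vector<std::string>& log, std::string name)
      : log_(log), name_(std::move(name)) {}
 protected:
  void finish(int r) override {
    log_.push_back(name_ + ":" + std::to_string(r));
  }
 private:
  std::vector<std::string>& log_;
  std::string name_;
};

// Complete every queued context with the same result code.
void finish_contexts(std::list<Context*>& finished, int result) {
  std::list<Context*> ls;
  ls.swap(finished);  // detach first so callbacks can queue new work safely
  for (Context* c : ls) c->complete(result);
}

// Hypothetical driver: two requests wait for a proposal to be accepted.
std::vector<std::string> run_demo() {
  std::vector<std::string> log;
  std::list<Context*> waiting;
  waiting.push_back(new RecordContext(log, "osd_boot_reply"));
  waiting.push_back(new RecordContext(log, "mds_beacon_retry"));
  finish_contexts(waiting, 0);  // e.g. called once the accept arrives
  assert(waiting.empty());
  return log;
}
```

The point of the sketch is that a callback runs later than the request that queued it, so any pointer it uses (such as a connection to reply on) has to still be valid at completion time.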
- Status changed from New to In Progress
After some testing, git bisect reports 19831b979a13f699b0e87125dfcfad3ea607d713 as the first bad commit.
Attempting a fix.
- Status changed from In Progress to Resolved
Removing said commit fixes the crash.
The patch released (put) the Connection as part of the session cleanup. With the patch removed, a connection may linger in memory, and potentially the session as well, which affects completion of #3476.
Marking this as Resolved.
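The ticket does not spell out why putting the Connection during session cleanup led to a segfault in _send_message(). One plausible reading, stated here as an assumption, is an unbalanced put: the cleanup drops a reference the session never took, so the object is freed while another holder still points at it. The sketch below illustrates that failure mode with a hypothetical refcounted class; the Connection, Session and cleanup_session() names are stand-ins, not the Ceph implementations.

```cpp
#include <cassert>

// Hypothetical refcounted object standing in for a messenger connection.
class Connection {
 public:
  explicit Connection(bool* destroyed) : destroyed_(destroyed) {}
  Connection* get() { ++refs_; return this; }
  void put() {
    assert(refs_ > 0);
    if (--refs_ == 0) {
      *destroyed_ = true;  // record the free so callers can observe it
      delete this;
    }
  }
 private:
  ~Connection() = default;  // only put() may destroy the object
  int refs_ = 1;            // the creator holds the first reference
  bool* destroyed_;
};

// Hypothetical session that keeps a pointer to its connection.
struct Session {
  Connection* con;
};

// Cleanup that always puts the connection. This is only correct if the
// session took its own reference with get() when it stored the pointer.
void cleanup_session(Session& s) {
  if (s.con) {
    s.con->put();
    s.con = nullptr;
  }
}

// Returns true if the connection was freed while the "messenger" still
// held a pointer to it, i.e. the messenger's pointer now dangles.
bool freed_while_in_use(bool session_takes_ref) {
  bool destroyed = false;
  Connection* messenger_con = new Connection(&destroyed);
  Session s{session_takes_ref ? messenger_con->get() : messenger_con};
  cleanup_session(s);
  bool dangling = destroyed;
  if (!destroyed) messenger_con->put();  // messenger drops its reference
  return dangling;
}
```

Under this assumption, freed_while_in_use(false) models the crash and freed_while_in_use(true) models balanced get/put. Dropping the put entirely, as the resolution here does, avoids the early free at the cost of a possible leak, which is the lingering connection mentioned above.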