Bug #517 (closed)
monitors crashing on startup after injecting corrupt crush map
% Done: 0%
Description
I followed the instructions at http://ceph.newdream.net/wiki/OSD_cluster_expansion/contraction to add a third OSD node to my existing two-node cluster, but forgot to re-encode the crush map before injecting it, so I accidentally injected the decoded map.
As I injected it (from the new OSD node), the monitors on the two other nodes crashed. Now neither of them will start up; both output the same stack trace.
root@srv-ohkpf:/root# ceph osd setcrushmap -i /tmp/crush.txt
read 850 bytes from /tmp/crush.txt
2010-10-24 13:09:17.167566 mon <- [osd,setcrushmap]
2010-10-24 13:09:18.189259 7f0aea70b710 monclient: hunting for new mon
2010-10-24 13:09:18.190237 7f0ae9608710 -- 10.61.136.222:0/8447 >> 10.135.211.78:6789/0 pipe(0xa698a0 sd=-1 pgs=0 cs=0 l=0).fault first fault
2010-10-24 13:09:20.166031 7f0ae9507710 -- 10.61.136.222:0/8447 >> 10.106.124.118:6789/0 pipe(0xa67420 sd=-1 pgs=0 cs=0 l=0).fault first fault
2010-10-24 13:09:23.166973 7f0ae9406710 -- 10.61.136.222:0/8447 >> 10.135.211.78:6789/0 pipe(0xa67b60 sd=-1 pgs=0 cs=0 l=0).fault first fault
and from the mon log file:
2010-10-24 13:09:17.167146 7f9482418710 mon.0@0(leader) e1 handle_command mon_command(osd setcrushmap v 0) v1
./crush/CrushWrapper.h: In function 'void CrushWrapper::decode(ceph::buffer::list::iterator&)':
./crush/CrushWrapper.h:437: FAILED assert(magic == 0x00010000ul)
ceph version 0.22.1 (commit:7464f9688001aa89f9673ba14e6d075d0ee33541)
1: (OSDMap::apply_incremental(OSDMap::Incremental&)+0x12d8) [0x4a9c58]
2: (OSDMonitor::update_from_paxos()+0xf1) [0x4945d1]
3: (PaxosService::_commit()+0x25) [0x48a535]
4: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, int)+0x1b1) [0x486d01]
5: (Paxos::handle_accept(MMonPaxos*)+0x39e) [0x482fce]
6: (Paxos::dispatch(PaxosServiceMessage*)+0x1b3) [0x485c13]
7: (Monitor::_ms_dispatch(Message*)+0x8e0) [0x472760]
8: (Monitor::ms_dispatch(Message*)+0x67) [0x47f2e7]
9: (SimpleMessenger::dispatch_entry()+0x79b) [0x45ac8b]
10: (SimpleMessenger::DispatchThread::entry()+0x1f) [0x44c35f]
11: (Thread::_entry_func(void*)+0xa) [0x46165a]
12: (()+0x69ca) [0x7f94846fd9ca]
13: (clone()+0x6d) [0x7f948391c70d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:7464f9688001aa89f9673ba14e6d075d0ee33541)
1: (sigabrt_handler(int)+0xde) [0x557d4e]
2: (()+0x33af0) [0x7f9483869af0]
3: (gsignal()+0x35) [0x7f9483869a75]
4: (abort()+0x180) [0x7f948386d5c0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f948411f8e5]
6: (()+0xcad16) [0x7f948411dd16]
7: (()+0xcad43) [0x7f948411dd43]
8: (()+0xcae3e) [0x7f948411de3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x69c) [0x542b3c]
10: (CrushWrapper::decode(ceph::buffer::list::iterator&)+0x744) [0x4a6304]
11: (OSDMap::apply_incremental(OSDMap::Incremental&)+0x12d8) [0x4a9c58]
12: (OSDMonitor::update_from_paxos()+0xf1) [0x4945d1]
13: (PaxosService::_commit()+0x25) [0x48a535]
14: (finish_contexts(std::list<Context*, std::allocator<Context*> >&, int)+0x1b1) [0x486d01]
15: (Paxos::handle_accept(MMonPaxos*)+0x39e) [0x482fce]
16: (Paxos::dispatch(PaxosServiceMessage*)+0x1b3) [0x485c13]
17: (Monitor::_ms_dispatch(Message*)+0x8e0) [0x472760]
18: (Monitor::ms_dispatch(Message*)+0x67) [0x47f2e7]
19: (SimpleMessenger::dispatch_entry()+0x79b) [0x45ac8b]
20: (SimpleMessenger::DispatchThread::entry()+0x1f) [0x44c35f]
21: (Thread::_entry_func(void*)+0xa) [0x46165a]
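For illustration, the check that fails in the trace (assert(magic == 0x00010000ul)) can be approximated outside of Ceph as a quick sanity test on a file before it is injected. This is only a rough sketch: the function name looks_like_encoded_crush_map is made up for this example, and it assumes the magic is stored as a little-endian 32-bit integer at the very start of the encoded map.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Magic value taken from the failed assertion in the monitor log.
constexpr std::uint32_t kCrushMagic = 0x00010000u;

// Hypothetical helper (not part of Ceph): returns true if the first four
// bytes of the blob, read as a little-endian 32-bit integer, equal the
// magic. The little-endian layout is an assumption of this sketch.
bool looks_like_encoded_crush_map(const std::string& blob) {
    if (blob.size() < 4) {
        return false;  // too short to contain the magic at all
    }
    std::uint32_t magic = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(blob[i]);
        magic |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return magic == kCrushMagic;
}
```

A text file such as the decoded map would begin with ordinary printable characters and fail this test, while a blob starting with the bytes 00 00 01 00 would pass. Passing only means the header looks right, not that the rest of the map is valid.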
Updated by Sage Weil over 13 years ago
- Category set to Monitor
- Assignee set to Colin McCabe
- Target version set to v0.23
We need to decode the provided map in a try {} block to verify that it is valid before using it, probably in OSDMonitor::prepare_command().
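The idea can be sketched roughly as follows. This is not the actual Ceph code: decode_or_throw, MalformedInput and prepare_setcrushmap are hypothetical stand-ins, and the sketch assumes a decoder that throws on bad input instead of asserting. The point is only that a trial decode happens while the command is being prepared, so a bad blob is rejected with an error and never becomes pending state.

```cpp
#include <cerrno>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical exception type for undecodable input.
struct MalformedInput : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a real decoder: it only checks the leading
// magic (assumed little-endian) and throws instead of asserting.
void decode_or_throw(const std::string& blob) {
    if (blob.size() < 4) {
        throw MalformedInput("blob too short");
    }
    std::uint32_t magic = 0;
    for (int i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(blob[i]);
        magic |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    if (magic != 0x00010000u) {
        throw MalformedInput("bad magic");
    }
}

// Hypothetical command handler: trial-decode first, stage the blob only
// if that succeeds. Returns 0 on success and -EINVAL on rejection.
int prepare_setcrushmap(const std::string& blob,
                        std::string& pending,
                        std::string& err) {
    try {
        decode_or_throw(blob);
    } catch (const MalformedInput& e) {
        err = std::string("rejecting crush map: ") + e.what();
        return -EINVAL;  // pending state is left untouched
    }
    pending = blob;  // safe to stage for commit
    return 0;
}
```

With this shape, injecting a text file would return an error to the client instead of committing a blob that every monitor then fails to decode.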
Updated by Colin McCabe over 13 years ago
- Status changed from New to Resolved