Feature #5657
monitor: deal with bad crush maps more gracefully
Status: Resolved
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS): Monitor
Pull request ID:
Description
ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
 1: /usr/bin/ceph-mon() [0x59241a]
 2: (()+0xfcb0) [0x7f3922bbfcb0]
 3: /usr/bin/ceph-mon() [0x66bdd9]
 4: (crush_do_rule()+0x1e5) [0x66c5c5]
 5: (CrushWrapper::do_rule(int, int, std::vector<int, std::allocator<int> >&, int, std::vector<unsigned int, std::allocator<unsigned int> > const&) const+0x7a) [0x5a935a]
 6: (CrushTester::test()+0xc60) [0x5a4640]
 7: (OSDMonitor::prepare_command(MMonCommand*)+0xae5) [0x507225]
 8: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x1fb) [0x51008b]
 9: (PaxosService::dispatch(PaxosServiceMessage*)+0x969) [0x4e8439]
 10: (Monitor::handle_command(MMonCommand*)+0x40a) [0x4aa02a]
 11: (Monitor::_ms_dispatch(Message*)+0xc23) [0x4b79a3]
 12: (Monitor::handle_forward(MForward*)+0x749) [0x4b87a9]
 13: (Monitor::_ms_dispatch(Message*)+0xed3) [0x4b7c53]
 14: (Monitor::ms_dispatch(Message*)+0x32) [0x4d1bf2]
 15: (DispatchQueue::entry()+0x3f1) [0x6abc71]
 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x639a8d]
 17: (()+0x7e9a) [0x7f3922bb7e9a]
 18: (clone()+0x6d) [0x7f3921852ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Crashing in CrushTester::test() is a lot better than propagating a bad map to all the OSDs, but it would be even cooler if the monitor caught these sorts of errors and returned a "no" to the user, so that a bad map doesn't go around and crash all the monitors repeatedly.
History
#1 Updated by Joao Eduardo Luis over 6 years ago
- Project changed from Ceph to RADOS
- Category changed from Monitor to Correctness/Safety
- Status changed from New to Resolved
- Component(RADOS) Monitor added
Resolved at some point by using external crushtool to validate crushmaps.
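The general idea behind validating in an external process can be sketched as follows. This is only an illustration of the pattern, not the monitor's actual code: the function name validate_in_child, the temp-file handoff, and the timeout value are all assumptions made for the example, and it assumes a POSIX system. The point is that a segfault like the one in the backtrace above then happens in a child process, and the caller can turn it into a plain rejection.

```python
import subprocess
import tempfile


def validate_in_child(map_bytes, validator_cmd, timeout=5.0):
    """Run validator_cmd on a temp file holding map_bytes (hypothetical helper).

    Returns (ok, reason). A crash, a nonzero exit, or a timeout in the
    child is reported as a rejection instead of taking down the caller.
    """
    with tempfile.NamedTemporaryFile(suffix=".crushmap") as tmp:
        tmp.write(map_bytes)
        tmp.flush()
        try:
            # The temp file path is appended as the last argument.
            proc = subprocess.run(
                validator_cmd + [tmp.name],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "validator timed out"
        except OSError as e:
            return False, "could not run validator: %s" % e

    # On POSIX a negative return code means the child died from a signal.
    if proc.returncode < 0:
        return False, "validator killed by signal %d" % -proc.returncode
    if proc.returncode != 0:
        return False, "validator exited with status %d" % proc.returncode
    return True, "ok"
```

With crushtool installed, validator_cmd could be something like ["crushtool", "--test", "-i"], so that the map path follows -i. Those flags are taken from the crushtool manual and should be checked against the installed version. The exact invocation the monitor uses is not stated in this ticket.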