Feature #5657

monitor: deal with bad crush maps more gracefully

Added by Greg Farnum about 6 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
Start date: 07/17/2013
Due date:
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS): Monitor
Pull request ID:

Description

 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
 1: /usr/bin/ceph-mon() [0x59241a]
 2: (()+0xfcb0) [0x7f3922bbfcb0]
 3: /usr/bin/ceph-mon() [0x66bdd9]
 4: (crush_do_rule()+0x1e5) [0x66c5c5]
 5: (CrushWrapper::do_rule(int, int, std::vector<int, std::allocator<int> >&, int, std::vector<unsigned int, std::allocator<unsigned int> > const&) const+0x7a) [0x5a935a]
 6: (CrushTester::test()+0xc60) [0x5a4640]
 7: (OSDMonitor::prepare_command(MMonCommand*)+0xae5) [0x507225]
 8: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x1fb) [0x51008b]
 9: (PaxosService::dispatch(PaxosServiceMessage*)+0x969) [0x4e8439]
 10: (Monitor::handle_command(MMonCommand*)+0x40a) [0x4aa02a]
 11: (Monitor::_ms_dispatch(Message*)+0xc23) [0x4b79a3]
 12: (Monitor::handle_forward(MForward*)+0x749) [0x4b87a9]
 13: (Monitor::_ms_dispatch(Message*)+0xed3) [0x4b7c53]
 14: (Monitor::ms_dispatch(Message*)+0x32) [0x4d1bf2]
 15: (DispatchQueue::entry()+0x3f1) [0x6abc71]
 16: (DispatchQueue::DispatchThread::entry()+0xd) [0x639a8d]
 17: (()+0x7e9a) [0x7f3922bb7e9a]
 18: (clone()+0x6d) [0x7f3921852ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Crashing the monitor here is a lot better than propagating a bad map to all the OSDs, but it would be even cooler if the monitor caught these sorts of errors and returned a "no" to the user, so that a bad map doesn't go around and crash all the monitors repeatedly.

History

#1 Updated by Joao Eduardo Luis about 2 years ago

  • Project changed from Ceph to RADOS
  • Category changed from Monitor to Correctness/Safety
  • Status changed from New to Resolved
  • Component(RADOS) Monitor added

Resolved at some point by using an external crushtool to validate crush maps.
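
For readers wondering why an external tool helps: the crash in the backtrace happens inside crush_do_rule() while the monitor itself is testing the map, so the fault takes the monitor down. Running the same test in a child process turns a crash into an exit status that the parent can translate into a "no". The following is only a rough sketch of that idea in Python, not the monitor's actual code; the helper name `validate_in_child`, the timeout, and the crushtool arguments in the usage note are assumptions for illustration.

```python
import subprocess


def validate_in_child(cmd, payload=b"", timeout=5.0):
    """Run a validator command in a child process and report the outcome.

    Hypothetical helper: returns (ok, reason). A crash or hang in the child
    is reported as a rejection instead of taking down the caller.
    """
    try:
        proc = subprocess.run(
            cmd,
            input=payload,            # sent to the child's stdin
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,          # a hung validator is also a "no"
        )
    except FileNotFoundError:
        return False, "validator not found"
    except subprocess.TimeoutExpired:
        return False, "validator timed out"

    # On POSIX, a negative return code means the child died from a signal
    # (for example -11 for SIGSEGV).
    if proc.returncode < 0:
        return False, f"validator killed by signal {-proc.returncode}"
    if proc.returncode != 0:
        return False, f"validator exited with status {proc.returncode}"
    return True, "ok"
```

A caller could then reject the map rather than crash, for example with `ok, reason = validate_in_child(["crushtool", "-i", "/path/to/compiled-map", "--test"])`. The path is a placeholder, and the exact arguments the monitor passes to crushtool are not stated in this ticket.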
