Bug #7931 (closed): setcrushmap crashing monitor

Added by Luis Periquito about 10 years ago. Updated about 10 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Following the guides, I've created a new crushmap. When I submit this new crushmap, the monitor crashes, leaving some information in the log files. To test the behaviour I extracted the running crushmap binary with ceph getcrushmap and re-injected it with ceph setcrushmap, and that also crashed the monitors.
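
For reference, a minimal sketch of the round trip described above, assuming the standard ceph/crushtool CLI (the full subcommands are "ceph osd getcrushmap" / "ceph osd setcrushmap"; the file names here are just placeholders):

    ceph osd getcrushmap -o crushmap.bin                    # dump the compiled map currently in use
    crushtool -d crushmap.bin -o crushmap.txt               # decompile it to editable text
    crushtool -c crushmap.txt -o crushmap.new               # recompile after editing
    crushtool -i crushmap.new --test --show-bad-mappings    # offline sanity check before injecting
    ceph osd setcrushmap -i crushmap.new                    # inject the new map into the cluster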

I've run the same test on our test cluster, and there it worked as expected. The cluster where it crashes is the production one.

This cluster was originally installed with bobtail, upgraded to cuttlefish (0.61.4) and then to emperor (0.72.2).

ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
1: /usr/bin/ceph-mon() [0x8115da]
2: (()+0xfcb0) [0x7f2b394b7cb0]
3: /usr/bin/ceph-mon() [0x7a38a9]
4: (crush_do_rule()+0x1e5) [0x7a4095]
5: (CrushWrapper::do_rule(int, int, std::vector<int, std::allocator<int> >&, int, std::vector<unsigned int, std::allocator<unsigned int> > const&) const+0x7a) [0x67a79a]
6: (CrushTester::test()+0xc60) [0x675dd0]
7: (OSDMonitor::prepare_command(MMonCommand*)+0x9d5) [0x5b8505]
8: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x21b) [0x5c304b]
9: (PaxosService::dispatch(PaxosServiceMessage*)+0x97f) [0x5916bf]
10: (Context::complete(int)+0x9) [0x566269]
11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x568b85]
12: (Paxos::handle_last(MMonPaxos*)+0xd7a) [0x58d02a]
13: (Paxos::dispatch(PaxosServiceMessage*)+0x29b) [0x58d65b]
14: (Monitor::dispatch(MonSession*, Message*, bool)+0x558) [0x566018]
15: (Monitor::_ms_dispatch(Message*)+0x204) [0x564124]
16: (Monitor::ms_dispatch(Message*)+0x32) [0x57ea82]
17: (DispatchQueue::entry()+0x549) [0x7e8c29]
18: (DispatchQueue::DispatchThread::entry()+0xd) [0x7159bd]
19: (()+0x7e9a) [0x7f2b394afe9a]
20: (clone()+0x6d) [0x7f2b38157cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
10/10 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/internal-mon.ec02sv13.log
--- end dump of recent events ---
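
As the NOTE at the end of the trace says, the bracketed addresses only become readable against the matching binary. A sketch of how one might resolve them, assuming the 0.72.2 ceph-mon is a non-PIE build (so the bracketed addresses map directly into the binary) and its debug symbols are installed:

    objdump -rdS /usr/bin/ceph-mon > ceph-mon.asm       # full annotated disassembly, as the NOTE suggests
    addr2line -C -f -e /usr/bin/ceph-mon 0x7a4095       # or resolve a single frame, e.g. crush_do_rule()+0x1e5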

Actions #1

Updated by Greg Farnum about 10 years ago

You should do this with "debug mon = 20" set, but it appears to be crashing because your crush map is somehow invalid and the sanity tests are noticing it and failing. It's certainly not the friendliest failure mode, but it's a lot better than letting a bad map into the cluster!
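
A sketch of how one might raise that debug level, assuming the mon id ec02sv13 taken from the log path above:

    # on the fly, on a running monitor:
    ceph tell mon.ec02sv13 injectargs '--debug-mon 20'

    # or persistently in ceph.conf, then restart the monitor:
    [mon]
        debug mon = 20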

Actions #2

Updated by Luis Periquito about 10 years ago

Hi Greg,

I don't think that was the issue; however, it has since been working. The crushmap I uploaded was the one that was already running.

You may proceed to archive the bug as I can't replicate it anymore.

thanks,
Luis

Actions #3

Updated by Greg Farnum about 10 years ago

  • Status changed from New to Can't reproduce