Bug #7931 (closed): setcrushmap crashing monitor

Added by Luis Periquito about 10 years ago. Updated about 10 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Following the guides, I've created a new crushmap. When I submit this new crushmap, the monitor crashes, leaving some information in the log files. To test the behaviour I extracted the running crushmap binary with ceph getcrushmap and re-injected it with ceph setcrushmap, and that also crashed the monitors.
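
For reference, a minimal sketch of the round trip described above, assuming the standard ceph/crushtool CLI (the full subcommands are "ceph osd getcrushmap" / "ceph osd setcrushmap"; the file names here are just placeholders):

    ceph osd getcrushmap -o crushmap.bin                    # dump the compiled map currently in use
    crushtool -d crushmap.bin -o crushmap.txt               # decompile it to editable text
    crushtool -c crushmap.txt -o crushmap.new               # recompile after editing
    crushtool -i crushmap.new --test --show-bad-mappings    # offline sanity check before injecting
    ceph osd setcrushmap -i crushmap.new                    # inject the new map into the cluster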

I've run the same test on our test cluster, and there it worked as expected. The cluster where it crashes is the production one.

This cluster was originally installed with bobtail, upgraded to cuttlefish (0.61.4) and then to emperor (0.72.2).

ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
1: /usr/bin/ceph-mon() [0x8115da]
2: (()+0xfcb0) [0x7f2b394b7cb0]
3: /usr/bin/ceph-mon() [0x7a38a9]
4: (crush_do_rule()+0x1e5) [0x7a4095]
5: (CrushWrapper::do_rule(int, int, std::vector<int, std::allocator<int> >&, int, std::vector<unsigned int, std::allocator<unsigned int> > const&) const+0x7a) [0x67a79a]
6: (CrushTester::test()+0xc60) [0x675dd0]
7: (OSDMonitor::prepare_command(MMonCommand*)+0x9d5) [0x5b8505]
8: (OSDMonitor::prepare_update(PaxosServiceMessage*)+0x21b) [0x5c304b]
9: (PaxosService::dispatch(PaxosServiceMessage*)+0x97f) [0x5916bf]
10: (Context::complete(int)+0x9) [0x566269]
11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x95) [0x568b85]
12: (Paxos::handle_last(MMonPaxos*)+0xd7a) [0x58d02a]
13: (Paxos::dispatch(PaxosServiceMessage*)+0x29b) [0x58d65b]
14: (Monitor::dispatch(MonSession*, Message*, bool)+0x558) [0x566018]
15: (Monitor::_ms_dispatch(Message*)+0x204) [0x564124]
16: (Monitor::ms_dispatch(Message*)+0x32) [0x57ea82]
17: (DispatchQueue::entry()+0x549) [0x7e8c29]
18: (DispatchQueue::DispatchThread::entry()+0xd) [0x7159bd]
19: (()+0x7e9a) [0x7f2b394afe9a]
20: (clone()+0x6d) [0x7f2b38157cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
10/10 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/internal-mon.ec02sv13.log
--- end dump of recent events ---
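
As the NOTE at the end of the trace says, the bracketed addresses only become readable against the matching binary. A sketch of how one might resolve them, assuming the 0.72.2 ceph-mon is a non-PIE build (so the bracketed addresses map directly into the binary) and its debug symbols are installed:

    objdump -rdS /usr/bin/ceph-mon > ceph-mon.asm       # full annotated disassembly, as the NOTE suggests
    addr2line -C -f -e /usr/bin/ceph-mon 0x7a4095       # or resolve a single frame, e.g. crush_do_rule()+0x1e5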

Actions #1

Updated by Greg Farnum about 10 years ago

You should do this with "debug mon = 20" set, but it appears to be crashing because your crush map is somehow invalid and the sanity tests are noticing it and failing. It's certainly not the friendliest failure mode, but it's a lot better than letting a bad map into the cluster!
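
A sketch of how one might raise that debug level, assuming the mon id ec02sv13 taken from the log path above:

    # on the fly, on a running monitor:
    ceph tell mon.ec02sv13 injectargs '--debug-mon 20'

    # or persistently in ceph.conf, then restart the monitor:
    [mon]
        debug mon = 20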

Actions #2

Updated by Luis Periquito about 10 years ago

Hi Greg,

I don't think that was the issue; however, it has since been working. The crushmap I uploaded was the one that was already running.

You may proceed to archive the bug as I can't replicate it anymore.

thanks,
Luis

Actions #3

Updated by Greg Farnum about 10 years ago

  • Status changed from New to Can't reproduce