Bug #1534

closed

ceph tool failed assert(mon_addr.count(n))

Added by Josh Durgin over 12 years ago. Updated over 12 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
% Done: 0%

Description

It seems like this should produce an error message, not a crash. Backtrace:
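The alternative the description asks for can be sketched as a checked lookup. In this minimal C++ sketch, an unknown monitor name makes the lookup return `-ENOENT`, where the current code fails `assert(mon_addr.count(n))`. `MonMapSketch`, `get_addr`, and the address string format are hypothetical names chosen for illustration. They are not Ceph's `MonMap` API.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical stand-in for a monitor map: mon name -> address string.
// This is NOT Ceph's MonMap; it only illustrates the lookup pattern.
struct MonMapSketch {
  std::map<std::string, std::string> mon_addr;

  // Checked lookup: instead of assert(mon_addr.count(n)), report a
  // missing name to the caller with -ENOENT. The caller (e.g. a CLI)
  // can then print an error message and exit cleanly.
  // On failure, *out is left unchanged.
  int get_addr(const std::string& n, std::string* out) const {
    auto it = mon_addr.find(n);
    if (it == mon_addr.end())
      return -ENOENT;
    *out = it->second;
    return 0;
  }
};
```

A caller would check the return value and print something like `strerror(ENOENT)` instead of aborting. This only turns the crash into an error. It does not explain how a missing monitor name reached the lookup in the first place.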

2011-09-09T16:40:19.129 INFO:orchestra.run.err:mon/MonMap.h: In function 'entity_inst_t MonMap::get_inst(const std::string&)', in thread '0x7f3c35b60720'
2011-09-09T16:40:19.129 INFO:orchestra.run.err:mon/MonMap.h: 131: FAILED assert(mon_addr.count(n))
2011-09-09T16:40:19.156 INFO:orchestra.run.err: ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d)
2011-09-09T16:40:19.156 INFO:orchestra.run.err: 1: (MonClient::_send_mon_message(Message*, bool)+0x1b2) [0x496cf2]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 2: (send_observe_requests(CephToolCtx*)+0x1cf) [0x44b0af]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 3: (main()+0x62e) [0x4498de]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 4: (__libc_start_main()+0xfe) [0x7f3c34141d8e]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 5: /tmp/cephtest/binary/usr/local/bin/ceph() [0x448a29]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d)
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 1: (MonClient::_send_mon_message(Message*, bool)+0x1b2) [0x496cf2]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 2: (send_observe_requests(CephToolCtx*)+0x1cf) [0x44b0af]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 3: (main()+0x62e) [0x4498de]
2011-09-09T16:40:19.158 INFO:orchestra.run.err: 4: (__libc_start_main()+0xfe) [0x7f3c34141d8e]
2011-09-09T16:40:19.158 INFO:orchestra.run.err: 5: /tmp/cephtest/binary/usr/local/bin/ceph() [0x448a29]
2011-09-09T16:40:19.158 INFO:orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion'

Logs from monitors and osds are in vit:~joshd/thrash_ceph_crash. I accidentally overwrote the client log with a manual 'ceph -w'.

#1

Updated by Sage Weil over 12 years ago

This looks more like a bug to me. MonClient should never call get_inst() on a mon that doesn't exist. Any ideas how to reproduce it?

#2

Updated by Josh Durgin over 12 years ago

This was during a teuthology run of thrashosds with the bonnie workunit on cfuse. The 'ceph -s' that crashed was being run by the thrashosds task. I think I had debugging enabled on the osds, but not on the mons. The config is with the logs.

#3

Updated by Sage Weil over 12 years ago

OK. I really can't see how we got into that state (where cur_mon wasn't in the monmap). It's also weird because that assert shouldn't have been reachable unless debug_monclient was turned up. It could happen if the monmap changed, but that wasn't the case here.
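The remark that the assert should only be reachable with debug_monclient raised follows from how level-gated debug logging behaves. Everything inside the guarded log statement, including any function call that builds the message, is evaluated only when the level check passes. This minimal C++ sketch assumes that the failing lookup sat inside such a log statement. `lookup_addr` and `debug_log_send` are hypothetical, and the threshold of 10 is an arbitrary example level. Neither is taken from MonClient.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Counts how many times the hypothetical lookup actually ran.
static int lookups = 0;

// Hypothetical lookup standing in for a call such as monmap.get_inst(cur_mon).
// A real lookup could assert on a missing name; this one just counts calls.
std::string lookup_addr(const std::string& name) {
  ++lookups;
  return "addr-of-" + name;
}

// Logs the destination of a send at an (assumed) debug level of 10.
// lookup_addr() is evaluated only inside the guarded branch, so at a
// low debug level it never runs and cannot fail.
// Returns how many lookups happened during this call.
int debug_log_send(int debug_level, const std::string& name, std::ostream& os) {
  int before = lookups;
  if (debug_level >= 10) {
    os << "send to mon." << name << " at " << lookup_addr(name) << "\n";
  }
  return lookups - before;
}
```

At the default level, the lookup never runs, so a stale monitor name passes unnoticed. Raising the level makes the same code path evaluate the lookup and trip the check.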

If you see this again, we need to turn up debug_monclient and set log_file for those ceph status checks. Let's make sure we can reproduce it first, though.

#4

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.36 to v0.37

#5

Updated by Sage Weil over 12 years ago

  • Status changed from New to Can't reproduce