Bug #1534
closed
ceph tool failed assert(mon_addr.count(n))
Description
It seems like this should be something that results in an error message, not a crash. Backtrace:
2011-09-09T16:40:19.129 INFO:orchestra.run.err:mon/MonMap.h: In function 'entity_inst_t MonMap::get_inst(const std::string&)', in thread '0x7f3c35b60720'
2011-09-09T16:40:19.129 INFO:orchestra.run.err:mon/MonMap.h: 131: FAILED assert(mon_addr.count(n))
2011-09-09T16:40:19.156 INFO:orchestra.run.err: ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d)
2011-09-09T16:40:19.156 INFO:orchestra.run.err: 1: (MonClient::_send_mon_message(Message*, bool)+0x1b2) [0x496cf2]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 2: (send_observe_requests(CephToolCtx*)+0x1cf) [0x44b0af]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 3: (main()+0x62e) [0x4498de]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 4: (__libc_start_main()+0xfe) [0x7f3c34141d8e]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 5: /tmp/cephtest/binary/usr/local/bin/ceph() [0x448a29]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d)
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 1: (MonClient::_send_mon_message(Message*, bool)+0x1b2) [0x496cf2]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 2: (send_observe_requests(CephToolCtx*)+0x1cf) [0x44b0af]
2011-09-09T16:40:19.157 INFO:orchestra.run.err: 3: (main()+0x62e) [0x4498de]
2011-09-09T16:40:19.158 INFO:orchestra.run.err: 4: (__libc_start_main()+0xfe) [0x7f3c34141d8e]
2011-09-09T16:40:19.158 INFO:orchestra.run.err: 5: /tmp/cephtest/binary/usr/local/bin/ceph() [0x448a29]
2011-09-09T16:40:19.158 INFO:orchestra.run.err:terminate called after throwing an instance of 'ceph::FailedAssertion'
Logs from monitors and osds are in vit:~joshd/thrash_ceph_crash. I accidentally overwrote the client log with a manual 'ceph -w'.
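To illustrate what "an error message, not a crash" could look like, here is a minimal sketch of a checked lookup. It is not the actual Ceph code: SimpleMonMap, find_addr, and describe_target are hypothetical names, and the address is stored as a plain string rather than an entity_inst_t. The only thing taken from the report is the idea of a mon_addr map keyed by monitor name.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for the monitor map; not the real Ceph MonMap.
struct SimpleMonMap {
    std::map<std::string, std::string> mon_addr;  // monitor name -> address

    // Checked lookup: returns std::nullopt for an unknown name
    // instead of asserting that the name is present.
    std::optional<std::string> find_addr(const std::string& name) const {
        auto it = mon_addr.find(name);
        if (it == mon_addr.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};

// Hypothetical caller: turns a failed lookup into an error string
// that a command-line tool could print before exiting non-zero.
std::string describe_target(const SimpleMonMap& map, const std::string& cur_mon) {
    if (auto addr = map.find_addr(cur_mon)) {
        return "sending to mon." + cur_mon + " at " + *addr;
    }
    return "error: mon." + cur_mon + " is not in the monmap";
}
```

With a map containing only a monitor named "a", describe_target(map, "b") yields an error string rather than aborting. Whether the real tool should report an error here or treat the state as an internal bug is a separate question; the sketch only shows the non-crashing alternative.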
Updated by Sage Weil over 12 years ago
It looks more like a bug to me. MonClient should never call get_inst() on a mon that doesn't exist. Any ideas how to reproduce it?
Updated by Josh Durgin over 12 years ago
This was during a teuthology run of thrashosds with the bonnie workunit on cfuse. The 'ceph -s' that crashed was being run by the thrashosds task. I had debugging on the osds, but not the mons, I think. The config is with the logs.
Updated by Sage Weil over 12 years ago
Ok. I really can't see how we got into that state (where cur_mon wasn't in the monmap). It's also weird because that assert shouldn't have happened unless debug_monclient was turned up... It could have happened if the monmap changed, but that wasn't the case here.
If you see this again, we need to turn up debug_monclient and set log_file for those ceph status checks. Let's make sure we can reproduce it first, though.
Updated by Sage Weil over 12 years ago
- Target version changed from v0.36 to v0.37
Updated by Sage Weil over 12 years ago
- Status changed from New to Can't reproduce