Bug #11218
closed
Assertion on MDS rank `in` but without instance
Description
An unintended consequence of an MDS rank being in the 'damaged' state: a peer in a multi-MDS environment hits an assertion because the rank is 'in' but has no associated daemon. This was exposed by test_journal_repair, which will also need updating so that it no longer expects a crash.
2015-03-23 16:37:28.125925 7f92d77a9700 -1 mds/MDSMap.h: In function 'const entity_inst_t MDSMap::get_inst(mds_rank_t)' thread 7f92d77a9700 time 2015-03-23 16:37:28.122743
mds/MDSMap.h: 559: FAILED assert(up.count(m))
 ceph version 0.93-776-g0a3e47d (0a3e47d778b457ae878024f95f610b0a8c2fb490)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0x97db2f]
 2: (MDBalancer::send_heartbeat()+0x163f) [0x7641ef]
 3: (MDBalancer::tick()+0x22a) [0x76cbaa]
 4: (MDS::tick()+0x364) [0x5b4444]
 5: (MDSInternalContextBase::complete(int)+0x1db) [0x7f1b6b]
 6: (SafeTimer::timer_thread()+0x3e5) [0x96f615]
 7: (SafeTimerThread::entry()+0xd) [0x9701ad]
 8: (()+0x7e9a) [0x7f92df97ae9a]
 9: (clone()+0x6d) [0x7f92de3412ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Greg Farnum about 9 years ago
Hmm. Do we need to keep the damaged ranks as members of the "up" set, or could we do something simple to remove them from that? I like these asserts since a damaged MDS isn't participating and so isn't really "up".
Updated by John Spray about 9 years ago
- Status changed from In Progress to Fix Under Review
It turns out it was already fine for an MDS to be 'in' but have no inst (this is the case after "ceph mds fail"). It was supposed to be impossible in this function because of an is_degraded() check at the start, but that check wasn't taking damaged ranks into account.
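To illustrate the interaction described above, here is a minimal, self-contained sketch. It is not the Ceph code: `ToyMDSMap`, `is_degraded_old()`, `heartbeat_targets()` and the string daemon names are hypothetical stand-ins, and the real `is_degraded()` also considers MDS states that are left out here. The sketch only assumes that the heartbeat path bails out early on a degraded map and otherwise asserts that every 'in' peer has an instance.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model only: a simplified stand-in for MDSMap, not the real class.
using mds_rank_t = int;

struct ToyMDSMap {
    std::set<mds_rank_t> in;                // ranks that belong to the cluster
    std::map<mds_rank_t, std::string> up;   // ranks with a daemon (name is hypothetical)
    std::set<mds_rank_t> failed;            // ranks whose daemon was failed
    std::set<mds_rank_t> damaged;           // ranks marked damaged

    bool has_inst(mds_rank_t m) const { return up.count(m) > 0; }

    // Assumed shape of the check before the fix: damaged ranks are ignored.
    bool is_degraded_old() const { return !failed.empty(); }

    // Assumed shape of the check after the fix: damaged ranks count as degraded.
    bool is_degraded() const { return !failed.empty() || !damaged.empty(); }
};

// Models the start of a heartbeat round: skip entirely when the map is
// degraded, otherwise require an instance for every other 'in' rank.
std::set<mds_rank_t> heartbeat_targets(const ToyMDSMap& m, mds_rank_t self) {
    std::set<mds_rank_t> targets;
    if (m.is_degraded()) {
        return targets;  // nothing is sent while any rank is failed or damaged
    }
    for (mds_rank_t r : m.in) {
        if (r == self) continue;
        assert(m.has_inst(r));  // mirrors assert(up.count(m)) from the backtrace
        targets.insert(r);
    }
    return targets;
}
```

With rank 1 both 'in' and 'damaged' and no entry in `up`, `is_degraded_old()` returns false, so a loop guarded by it would reach the assertion. With `is_degraded()` the function returns early and no instance lookup happens.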
Updated by Greg Farnum about 9 years ago
- Status changed from Fix Under Review to Resolved
Merged to master in commit:bd1d11f6eb8225c996bfc7ca00a2083cb9423b51