Bug #23826
closed
mds: assert after daemon restart
Description
/builddir/build/BUILD/ceph-12.2.1/src/mds/MDCache.cc: 5080: FAILED assert(isolated_inodes.empty())
ceph version 12.2.1-46.el7cp (b6f6f1b141c306a43f669b974971b9ec44914cb0) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x564975ec7b40]
2: (MDCache::handle_cache_rejoin_ack(MMDSCacheRejoin*)+0x25a0) [0x564975cb4e60]
3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x213) [0x564975cc12a3]
4: (MDCache::dispatch(Message*)+0xa5) [0x564975cc6905]
5: (MDSRank::handle_deferrable_message(Message*)+0x5c4) [0x564975baf734]
6: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x564975bbcd43]
7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x564975bbdb85]
8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x564975ba7023]
9: (DispatchQueue::entry()+0x792) [0x5649761ab952]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x564975f4dfbd]
11: (()+0x7dd5) [0x7f577c615dd5]
12: (clone()+0x6d) [0x7f577b6f5b3d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Files
Updated by Patrick Donnelly almost 6 years ago
- File ceph-mds.magna058.log.gz added
Adding log from failed MDS.
Looks like it's receiving a handle_cache_rejoin_ack message while in replay.
Updated by Patrick Donnelly almost 6 years ago
- Related to Bug #21777: src/mds/MDCache.cc: 4332: FAILED assert(mds->is_rejoin()) added
Updated by Patrick Donnelly almost 6 years ago
Here's one possible way this could happen, I think:
1. All MDS are in rejoin or later.
2. An up:rejoin MDS goes through the following call sequence:
   - handle_mds_map
   - MDCache::rejoin_start
   - MDCache::process_imported_caps
   - open_ino(p->first, (int64_t)-1, new C_MDC_RejoinOpenInoFinish(this, p->first), false);
   - finisher calls mdcache->rejoin_open_ino_finish(ino, r);
   - MDCache::rejoin_gather_finish();
   - MDCache::rejoin_send_acks(); which sends the ACKs
I don't see this protected anywhere by MDSMap::is_rejoining().
Updated by Zheng Yan almost 6 years ago
Checking MDSMap::is_rejoining() is not required here. If there are recovering MDSs that haven't entered the rejoin state, the MDCache::rejoin_gather set cannot be empty.
Updated by Patrick Donnelly almost 6 years ago
- Priority changed from High to Urgent
Updated by Patrick Donnelly almost 6 years ago
- Assignee changed from Zheng Yan to Patrick Donnelly
- Target version changed from v13.0.0 to v13.2.0
Updated by Zheng Yan almost 6 years ago
The finish context of MDCache::open_undef_inodes_dirfrags() calls rejoin_gather_finish() without checking rejoin_gather. I think that can explain this crash.
https://github.com/ceph/ceph/pull/21883/commits/0a38a499b86c0ee13aa0e783a8359bcce0876088
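To illustrate the failure mode described in this comment, here is a minimal toy model in C++. It is a sketch under stated assumptions and is not Ceph code: ToyRejoinState, its members, and acks_while_peer_outstanding are hypothetical names, and the guard shown (proceed only when the gather set is empty) is an assumption about what "checking rejoin_gather" means, not a copy of the linked commit.

```cpp
// Toy model only: these types are hypothetical and are NOT Ceph's MDCache.
#include <cassert>
#include <set>

struct ToyRejoinState {
    std::set<int> rejoin_gather;  // ranks we still expect a rejoin message from
    int acks_sent = 0;            // number of times the ACK step has run

    // Stand-in for the step that ends up sending rejoin ACKs.
    void gather_finish() { ++acks_sent; }

    // Unguarded finish context: proceeds regardless of outstanding peers.
    void on_open_finished_unguarded() { gather_finish(); }

    // Guarded finish context: proceeds only once every peer has reported.
    void on_open_finished_guarded() {
        if (rejoin_gather.empty())
            gather_finish();
    }

    // A peer's rejoin message arrives; returns true if nothing is outstanding.
    bool peer_reported(int rank) {
        rejoin_gather.erase(rank);
        return rejoin_gather.empty();
    }
};

// Runs one finish context while rank 1 is still outstanding and
// returns how many ACK rounds were sent as a result.
int acks_while_peer_outstanding(bool guarded) {
    ToyRejoinState s;
    s.rejoin_gather = {1};
    assert(!s.rejoin_gather.empty());  // precondition: a peer is outstanding
    if (guarded)
        s.on_open_finished_guarded();
    else
        s.on_open_finished_unguarded();
    return s.acks_sent;
}
```

In this model the unguarded path sends ACKs while rank 1 has not yet reported, which corresponds to a peer receiving a rejoin ACK before it is ready for one. The guarded path defers until the gather set drains.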
Updated by Zheng Yan almost 6 years ago
- Status changed from New to Duplicate
Checked again, it's likely fixed by https://github.com/ceph/ceph/pull/21883/commits/0a38a499b86c0ee13aa0e783a8359bcce0876088
Updated by Patrick Donnelly almost 6 years ago
- Is duplicate of Bug #24047: MDCache.cc: 5317: FAILED assert(mds->is_rejoin()) added