Bug #21722
mds: no assertion on inode being purging in find_ino_peers()
Description
Recently we hit an assertion failure on the MDS. It occurred only a few times, when the MDS was very busy.
2017-09-25 11:18:49.474327 7f4810f12700 -1 mds/MDCache.cc: In function 'void MDCache::find_ino_peers(inodeno_t, MDSInternalContextBase*, mds_rank_t)' thread 7f4810f12700 time 2017-09-25 11:18:49.472228
mds/MDCache.cc: 8788: FAILED assert(!have_inode(ino))
 ceph version 10.2.7-g6b4c97f (6b4c97f2df734729efcd150d988b9727acea7d5b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f4816d64ed5]
 2: (MDCache::find_ino_peers(inodeno_t, MDSInternalContextBase*, int)+0x296) [0x7f4816a36be6]
 3: (Server::rdlock_path_pin_ref(std::shared_ptr<MDRequestImpl>&, int, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, bool, bool, file_layout_t**, bool)+0x785) [0x7f48169b81e5]
 4: (Server::handle_client_getattr(std::shared_ptr<MDRequestImpl>&, bool)+0x160) [0x7f48169b87b0]
 5: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xb28) [0x7f48169eaac8]
 6: (Server::handle_client_request(MClientRequest*)+0x808) [0x7f48169eb368]
 7: (Server::dispatch(Message*)+0x3eb) [0x7f48169ef7db]
 8: (MDSRank::handle_deferrable_message(Message*)+0x82f) [0x7f481696dddf]
 9: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f48169784d7]
 10: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f4816979675]
 11: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f481695ec83]
 12: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x7f4816ed4d67]
 13: (C_handle_dispatch::do_request(int)+0x11) [0x7f4816ed54b1]
 14: (EventCenter::process_events(int)+0x90a) [0x7f4816e75f9a]
 15: (Worker::entry()+0x1f0) [0x7f4816e4b6c0]
 16: (()+0x7dc5) [0x7f4815b3ddc5]
 17: (clone()+0x6d) [0x7f481460974d]
We found that MDCache::path_traverse() returns ESTALE in two cases: either the inode is not found in inode_map, or the inode is found but is in the purging state. The MDS then tries to locate the inode on its peers via MDCache::find_ino_peers(), but the assertion there only checks that the inode is not in inode_map; it does not account for the case where the inode is present but being purged. An inode that is being purged is removed from inode_map only after it has been purged from stray and the purge has been logged, in StrayManager::_purge_stray_logged().
We therefore suggest that the assertion in MDCache::find_ino_peers() skip inodes that are being purged and let them proceed to the lookup on MDS peers, since the MDS will still end up returning ESTALE to the client for those inodes.
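To illustrate the idea, here is a minimal, self-contained sketch of the relaxed check. It is not the Ceph code: ToyCache, ToyInode, lookup(), purge_logged() and is_purging() are hypothetical stand-ins for MDCache, CInode, path_traverse(), StrayManager::_purge_stray_logged() and the purging state test, and the peer lookup is reduced to simply returning ESTALE on the assumption that no peer has the inode.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for illustration only; not the real Ceph types.
using inodeno_t = std::uint64_t;

struct ToyInode {
  bool purging = false;  // models the "being purged" inode state
};

class ToyCache {
public:
  void add_inode(inodeno_t ino, bool purging = false) {
    inode_map[ino] = ToyInode{purging};
  }

  // Models the point where a purged inode finally leaves inode_map.
  void purge_logged(inodeno_t ino) { inode_map.erase(ino); }

  bool have_inode(inodeno_t ino) const { return inode_map.count(ino) > 0; }

  bool is_purging(inodeno_t ino) const {
    auto it = inode_map.find(ino);
    return it != inode_map.end() && it->second.purging;
  }

  // Models the two ESTALE cases described above:
  // inode missing from inode_map, or present but being purged.
  int lookup(inodeno_t ino) const {
    if (!have_inode(ino) || is_purging(ino))
      return -ESTALE;
    return 0;
  }

  // Relaxed check: an inode may still be in inode_map while it is being
  // purged, so only assert for inodes that are present and NOT purging.
  // The peer query itself is not modelled; we assume no peer has the inode.
  int find_ino_peers(inodeno_t ino) const {
    assert(!have_inode(ino) || is_purging(ino));
    return -ESTALE;
  }

private:
  std::unordered_map<inodeno_t, ToyInode> inode_map;
};
```

With the original assert(!have_inode(ino)), calling find_ino_peers() for an inode that is still in the map but being purged would abort; with the relaxed condition the call falls through and the caller still sees ESTALE.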
Updated by Patrick Donnelly over 6 years ago
- Status changed from New to Pending Backport
- Backport set to luminous
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #21952: luminous: mds: no assertion on inode being purging in find_ino_peers() added
Updated by Patrick Donnelly about 6 years ago
- Status changed from Pending Backport to Resolved