Bug #21722

closed

mds: no assertion on inode being purging in find_ino_peers()

Added by Zhi Zhang over 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Recently we hit an assertion failure on the MDS a few times, only when the MDS was very busy.

2017-09-25 11:18:49.474327 7f4810f12700 -1 mds/MDCache.cc: In function 'void MDCache::find_ino_peers(inodeno_t, MDSInternalContextBase*, mds_rank_t)' thread 7f4810f12700 time 2017-09-25 11:18:49.472228
mds/MDCache.cc: 8788: FAILED assert(!have_inode(ino))

 ceph version 10.2.7-g6b4c97f (6b4c97f2df734729efcd150d988b9727acea7d5b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f4816d64ed5]
 2: (MDCache::find_ino_peers(inodeno_t, MDSInternalContextBase*, int)+0x296) [0x7f4816a36be6]
 3: (Server::rdlock_path_pin_ref(std::shared_ptr<MDRequestImpl>&, int, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, bool, bool, file_layout_t**, bool)+0x785) [0x7f48169b81e5]
 4: (Server::handle_client_getattr(std::shared_ptr<MDRequestImpl>&, bool)+0x160) [0x7f48169b87b0]
 5: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xb28) [0x7f48169eaac8]
 6: (Server::handle_client_request(MClientRequest*)+0x808) [0x7f48169eb368]
 7: (Server::dispatch(Message*)+0x3eb) [0x7f48169ef7db]
 8: (MDSRank::handle_deferrable_message(Message*)+0x82f) [0x7f481696dddf]
 9: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f48169784d7]
 10: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f4816979675]
 11: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f481695ec83]
 12: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x7f4816ed4d67]
 13: (C_handle_dispatch::do_request(int)+0x11) [0x7f4816ed54b1]
 14: (EventCenter::process_events(int)+0x90a) [0x7f4816e75f9a]
 15: (Worker::entry()+0x1f0) [0x7f4816e4b6c0]
 16: (()+0x7dc5) [0x7f4815b3ddc5]
 17: (clone()+0x6d) [0x7f481460974d]

We found that MDCache::path_traverse() returns ESTALE under two conditions: either the inode is not found in inode_map, or the inode is found but its state is purging. The MDS then tries to find the inode on its peers in MDCache::find_ino_peers(), but the assertion there only checks that the inode is not in inode_map and does not account for the other condition, where the inode is being purged. An inode that is being purged is removed from inode_map only after it has been purged from stray and the purge has been logged, in StrayManager::_purge_stray_logged().
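The two ESTALE conditions described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Ceph code: `ToyInode`, `ToyInodeMap`, `toy_lookup` and `toy_strict_precondition` are hypothetical names invented for illustration, and the real MDCache and CInode types are considerably more involved.

```cpp
#include <cerrno>
#include <cstdint>
#include <unordered_map>

// Toy stand-in for an inode; NOT the real CInode.
struct ToyInode {
  bool purging;  // true while the inode is being purged
};

// Toy stand-in for the cache's inode_map.
using ToyInodeMap = std::unordered_map<std::uint64_t, ToyInode>;

// Returns 0 when the inode is usable, -ESTALE otherwise.
int toy_lookup(const ToyInodeMap& inode_map, std::uint64_t ino) {
  auto it = inode_map.find(ino);
  if (it == inode_map.end())
    return -ESTALE;  // condition 1: inode is not in the map
  if (it->second.purging)
    return -ESTALE;  // condition 2: inode is in the map but being purged
  return 0;
}

// Predicate equivalent to the strict assertion: holds only if the inode
// is absent from the map.
bool toy_strict_precondition(const ToyInodeMap& inode_map, std::uint64_t ino) {
  return inode_map.find(ino) == inode_map.end();
}
```

In this model, an inode that is present but purging makes `toy_lookup` return -ESTALE while `toy_strict_precondition` is false, which is the combination that corresponds to the failed assertion in the backtrace.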

So we think the assertion in MDCache::find_ino_peers() could skip inodes that are being purged and let them go on to be checked on the MDS peers, because the MDS will ultimately still return ESTALE to the client for those inodes.
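A minimal sketch of the relaxed check proposed above, assuming a simplified cache. `SketchInode`, `SketchCache`, `may_ask_peers` and `find_on_peers` are hypothetical names for illustration only; this is not the patch that was merged and not the real MDCache interface.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an inode; not the real CInode.
struct SketchInode {
  bool purging;  // true while the inode is being purged
};

// Hypothetical stand-in for the cache; not the real MDCache.
struct SketchCache {
  std::unordered_map<std::uint64_t, SketchInode> inode_map;

  // Returns the cached inode, or nullptr if it is not in the map.
  const SketchInode* get_inode(std::uint64_t ino) const {
    auto it = inode_map.find(ino);
    return it == inode_map.end() ? nullptr : &it->second;
  }

  // Relaxed precondition: the inode is either absent or being purged.
  bool may_ask_peers(std::uint64_t ino) const {
    const SketchInode* in = get_inode(ino);
    return in == nullptr || in->purging;
  }

  // Stand-in for the peer lookup entry point. Returns true once the
  // request would have been handed to the peers.
  bool find_on_peers(std::uint64_t ino) const {
    // The strict form would assert that the inode is not in the map at all.
    assert(may_ask_peers(ino));
    return true;
  }
};
```

With this predicate, a lookup for an inode that is still in the map but marked as purging no longer trips the assertion, while a lookup for a live cached inode still does.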


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #21952: luminous: mds: no assertion on inode being purging in find_ino_peers() (Resolved)
Actions #2

Updated by Patrick Donnelly over 6 years ago

  • Status changed from New to Pending Backport
  • Backport set to luminous
Actions #3

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21952: luminous: mds: no assertion on inode being purging in find_ino_peers() added
Actions #4

Updated by Patrick Donnelly about 6 years ago

  • Status changed from Pending Backport to Resolved