Bug #21722

mds: no assertion on inode being purging in find_ino_peers()

Added by Zhi Zhang over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 10/09/2017
Due date:
% Done: 0%
Source: Community (dev)
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

We recently hit the following assertion on an MDS. It has occurred only a few times, each time when the MDS was very busy.

2017-09-25 11:18:49.474327 7f4810f12700 -1 mds/MDCache.cc: In function 'void MDCache::find_ino_peers(inodeno_t, MDSInternalContextBase*, mds_rank_t)' thread 7f4810f12700 time 2017-09-25 11:18:49.472228
mds/MDCache.cc: 8788: FAILED assert(!have_inode(ino))

 ceph version 10.2.7-g6b4c97f (6b4c97f2df734729efcd150d988b9727acea7d5b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f4816d64ed5]
 2: (MDCache::find_ino_peers(inodeno_t, MDSInternalContextBase*, int)+0x296) [0x7f4816a36be6]
 3: (Server::rdlock_path_pin_ref(std::shared_ptr<MDRequestImpl>&, int, std::set<SimpleLock*, std::less<SimpleLock*>, std::allocator<SimpleLock*> >&, bool, bool, file_layout_t**, bool)+0x785) [0x7f48169b81e5]
 4: (Server::handle_client_getattr(std::shared_ptr<MDRequestImpl>&, bool)+0x160) [0x7f48169b87b0]
 5: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xb28) [0x7f48169eaac8]
 6: (Server::handle_client_request(MClientRequest*)+0x808) [0x7f48169eb368]
 7: (Server::dispatch(Message*)+0x3eb) [0x7f48169ef7db]
 8: (MDSRank::handle_deferrable_message(Message*)+0x82f) [0x7f481696dddf]
 9: (MDSRank::_dispatch(Message*, bool)+0x207) [0x7f48169784d7]
 10: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f4816979675]
 11: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f481695ec83]
 12: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x7f4816ed4d67]
 13: (C_handle_dispatch::do_request(int)+0x11) [0x7f4816ed54b1]
 14: (EventCenter::process_events(int)+0x90a) [0x7f4816e75f9a]
 15: (Worker::entry()+0x1f0) [0x7f4816e4b6c0]
 16: (()+0x7dc5) [0x7f4815b3ddc5]
 17: (clone()+0x6d) [0x7f481460974d]

We found that MDCache::path_traverse() returns ESTALE in two cases: either the inode is not found in inode_map, or the inode is found but its state is purging. The MDS then tries to find this inode on its peers in MDCache::find_ino_peers(), but the assertion there only checks that the inode is not in inode_map and does not account for the case where the inode is still present and being purged. An inode being purged is removed from inode_map only after it has been purged from stray and the purge has been logged in StrayManager::_purge_stray_logged().

So we think the assertion in MDCache::find_ino_peers() could skip inodes that are being purged and let them go through the check on MDS peers, because the MDS will still end up returning ESTALE to the client for those inodes.
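
To illustrate the mismatch, here is a minimal toy model in C++. It is only a sketch and not the Ceph code: ToyCache, ToyInode, lookup(), strict_precondition() and relaxed_precondition() are hypothetical names made up for this example. It shows that a lookup can return ESTALE for an inode that is still in the map, so a precondition of "inode not in map" can fail, while a precondition that also accepts purging inodes holds.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>

// Toy stand-ins for illustration only; these are not the real Ceph types.
using ToyIno = std::uint64_t;

struct ToyInode {
  bool purging = false;  // models the "purging" inode state
};

struct ToyCache {
  std::map<ToyIno, ToyInode> inode_map;

  bool have_inode(ToyIno ino) const { return inode_map.count(ino) != 0; }

  bool is_purging(ToyIno ino) const {
    auto it = inode_map.find(ino);
    return it != inode_map.end() && it->second.purging;
  }

  // Models the traversal result described above: ESTALE if the inode is
  // missing from the map, or present but being purged.
  int lookup(ToyIno ino) const {
    if (!have_inode(ino) || is_purging(ino)) return -ESTALE;
    return 0;
  }

  // Precondition equivalent to the failing assert: inode must not be cached.
  bool strict_precondition(ToyIno ino) const { return !have_inode(ino); }

  // Proposed precondition: also tolerate an inode that is cached but purging.
  bool relaxed_precondition(ToyIno ino) const {
    return !have_inode(ino) || is_purging(ino);
  }
};
```

In this model, an inode that is cached with purging set makes lookup() return -ESTALE while strict_precondition() is false, which corresponds to the crash above. relaxed_precondition() is true for every inode for which lookup() returns -ESTALE.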


Related issues

Copied to fs - Backport #21952: luminous: mds: no assertion on inode being purging in find_ino_peers() Resolved

History

#2 Updated by Patrick Donnelly over 1 year ago

  • Status changed from New to Pending Backport
  • Backport set to luminous

#3 Updated by Nathan Cutler over 1 year ago

  • Copied to Backport #21952: luminous: mds: no assertion on inode being purging in find_ino_peers() added

#4 Updated by Patrick Donnelly over 1 year ago

  • Status changed from Pending Backport to Resolved
