Bug #22610
MDS: assert failure when the inode for the cap_export from other MDS happened not in MDCache
Description
We run two active MDS daemons in our production environment. Recently mds.1 restarted, and during its rejoin phase mds.0 hit an assert failure while processing the weak rejoin request from mds.1. Below is a snippet of the log:
-2> 2018-01-04 20:50:50.638943 7f9fb9cfb700 5 mds.mmcommcephsz11 handle_mds_map epoch 694747 from mds.1
-1> 2018-01-04 20:50:50.638952 7f9fb9cfb700 5 mds.mmcommcephsz11 old map epoch 694747 <= 694747, discarding
0> 2018-01-04 20:50:50.652715 7f9fb9cfb700 -1 mds/MDCache.cc: In function 'void MDCache::handle_cache_rejoin_weak(MMDSCacheRejoin*)' thread 7f9fb9cfb700 time 2018-01-04 20:50:50.650286
mds/MDCache.cc: 4325: FAILED assert(in && in->is_auth())
ceph version 10.2.9-102-g820619c (820619cc59a3790ab36be1945a135eb826c558f1)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f9fc0981205]
2: (MDCache::handle_cache_rejoin_weak(MMDSCacheRejoin*)+0x63e) [0x7f9fc068759e]
3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x25b) [0x7f9fc068d04b]
4: (MDCache::dispatch(Message*)+0xa5) [0x7f9fc069d975]
5: (MDSRank::handle_deferrable_message(Message*)+0x5ef) [0x7f9fc058792f]
6: (MDSRank::_dispatch(Message*, bool)+0x1e0) [0x7f9fc0592250]
7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x7f9fc05933e5]
8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x7f9fc0578a03]
9: (Messenger::ms_deliver_dispatch(Message*)+0x77) [0x7f9fc0af2017]
10: (C_handle_dispatch::do_request(int)+0x11) [0x7f9fc0af2761]
11: (EventCenter::process_events(int)+0x90a) [0x7f9fc0a92aba]
12: (Worker::entry()+0x1f0) [0x7f9fc0a68170]
13: (()+0x7dc5) [0x7f9fbf755dc5]
14: (clone()+0x6d) [0x7f9fbe22129d]
After checking the related code, it seems that assert(in && in->is_auth()) is too strict: the inode referenced by this cap_export may already have expired from the cache, in which case in is NULL and the assert fires. Changing the assert to assert(!in || in->is_auth()) seems more reasonable.
History
#1 Updated by Jianyu Li about 6 years ago
Filed a pull request: https://github.com/ceph/ceph/pull/19836
#2 Updated by Patrick Donnelly about 6 years ago
- Status changed from New to In Progress
- Assignee set to Jianyu Li
#3 Updated by Zheng Yan about 6 years ago
- Status changed from In Progress to Fix Under Review
#4 Updated by Patrick Donnelly about 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to jewel,luminous
#5 Updated by Nathan Cutler about 6 years ago
- Copied to Backport #22867: luminous: MDS: assert failure when the inode for the cap_export from other MDS happened not in MDCache added
#6 Updated by Nathan Cutler about 6 years ago
- Copied to Backport #22868: jewel: MDS: assert failure when the inode for the cap_export from other MDS happened not in MDCache added
#7 Updated by Patrick Donnelly about 6 years ago
- Backport changed from jewel,luminous to luminous
#8 Updated by Nathan Cutler about 6 years ago
- Backport changed from luminous to luminous jewel
Re-adding rejected jewel backport to appease backport tooling.
#9 Updated by Nathan Cutler about 6 years ago
- Status changed from Pending Backport to Resolved