Bug #21777
opensrc/mds/MDCache.cc: 4332: FAILED assert(mds->is_rejoin())
Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash, multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
An MDS may send an MMDSCacheRejoin (MMDSCacheRejoin::OP_WEAK) message to an MDS that is not in the rejoin, active, or stopping state. When that MDS receives the message, it fails an assertion:
ceph version 12.2.1-2.el7cp (965390e1785cd23ffde159014f25f9490e479668) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55e9f4316390]
 2: (MDCache::handle_cache_rejoin_weak(MMDSCacheRejoin*)+0x16cf) [0x55e9f410ae9f]
 3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x24b) [0x55e9f410f75b]
 4: (MDCache::dispatch(Message*)+0xa5) [0x55e9f4114d85]
 5: (MDSRank::handle_deferrable_message(Message*)+0x5c4) [0x55e9f3ffd624]
 6: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x55e9f400af13]
 7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55e9f400bd55]
 8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55e9f3ff4f33]
 9: (DispatchQueue::entry()+0x792) [0x55e9f45f9c12]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x55e9f439c5fd]
 11: (()+0x7e25) [0x7fd43d712e25]
 12: (clone()+0x6d) [0x7fd43c7f534d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Version-Release number of selected component (if applicable): ceph version 12.2.1-2.el7cp (965390e1785cd23ffde159014f25f9490e479668) luminous (stable)
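The following C++ sketch models the sender-side condition described above: it selects as OP_WEAK recipients only those peers whose last known state is rejoin, active, or stopping. `RankState`, `accepts_weak_rejoin`, and `weak_rejoin_targets` are hypothetical names for illustration, not Ceph's MDCache API, and the enum is a simplified subset of the real MDS states.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical, simplified subset of MDS rank states (not Ceph's MDSMap enum).
enum class RankState { Replay, Resolve, Reconnect, Rejoin, Active, Stopping };

// Per the bug description, a peer can safely receive an OP_WEAK
// cache-rejoin message only in the rejoin, active, or stopping state.
bool accepts_weak_rejoin(RankState s) {
  return s == RankState::Rejoin || s == RankState::Active ||
         s == RankState::Stopping;
}

// Hypothetical sender-side guard: given the last known state of each
// peer rank, return only the ranks that should receive OP_WEAK now.
std::set<int> weak_rejoin_targets(const std::map<int, RankState>& peers) {
  std::set<int> out;
  for (const auto& p : peers) {
    if (accepts_weak_rejoin(p.second)) {
      out.insert(p.first);
    }
  }
  return out;
}
```

Because a sender learns peer states from the MDS map, which can lag behind a peer's actual state, a guard like this may narrow the window without fully closing it.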
The MDS should be more tolerant of these messages when it is not active, rather than failing an assertion.
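One way to read "more tolerant" is to park OP_WEAK messages that arrive too early instead of asserting, then process them once the rank reaches a state that can handle them. The toy C++ class below sketches that policy under that assumption. `ToyRank`, `WeakRejoinMsg`, and their methods are hypothetical and do not correspond to Ceph's MDSRank or MDCache interfaces; this report does not settle whether deferring (versus dropping) is the correct behavior for real MDS recovery.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical, simplified subset of MDS rank states.
enum class RankState { Replay, Resolve, Reconnect, Rejoin, Active, Stopping };

// Hypothetical stand-in for an MMDSCacheRejoin OP_WEAK message.
struct WeakRejoinMsg {
  int from_rank;
};

// Toy receiver: instead of assert(is_rejoin()), it parks OP_WEAK messages
// that arrive before it can handle them and replays them later.
class ToyRank {
 public:
  explicit ToyRank(RankState s) : state_(s) {}

  void handle_weak(const WeakRejoinMsg& m) {
    if (can_process()) {
      processed_.push_back(m.from_rank);
    } else {
      deferred_.push_back(m);  // tolerate the early message instead of crashing
    }
  }

  void set_state(RankState s) {
    state_ = s;
    if (!can_process()) return;
    // Drain messages that were parked while this rank was not ready.
    while (!deferred_.empty()) {
      processed_.push_back(deferred_.front().from_rank);
      deferred_.pop_front();
    }
  }

  const std::vector<int>& processed() const { return processed_; }
  std::size_t deferred_count() const { return deferred_.size(); }

 private:
  bool can_process() const {
    return state_ == RankState::Rejoin || state_ == RankState::Active ||
           state_ == RankState::Stopping;
  }

  RankState state_;
  std::deque<WeakRejoinMsg> deferred_;
  std::vector<int> processed_;
};
```

For example, a `ToyRank` constructed in `RankState::Reconnect` holds an incoming message in its deferred queue and processes it only after `set_state(RankState::Rejoin)`.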