Project

General

Profile

Actions

Bug #21777

open

src/mds/MDCache.cc: 4332: FAILED assert(mds->is_rejoin())

Added by Patrick Donnelly over 6 years ago. Updated over 3 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash, multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

MDS may send a MMDSCacheRejoin(MMDSCacheRejoin::OP_WEAK) message to an MDS which is not rejoin/active/stopping. Once that MDS receives the message it will fail:

 ceph version 12.2.1-2.el7cp (965390e1785cd23ffde159014f25f9490e479668) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55e9f4316390]
 2: (MDCache::handle_cache_rejoin_weak(MMDSCacheRejoin*)+0x16cf) [0x55e9f410ae9f]
 3: (MDCache::handle_cache_rejoin(MMDSCacheRejoin*)+0x24b) [0x55e9f410f75b]
 4: (MDCache::dispatch(Message*)+0xa5) [0x55e9f4114d85]
 5: (MDSRank::handle_deferrable_message(Message*)+0x5c4) [0x55e9f3ffd624]
 6: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x55e9f400af13]
 7: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55e9f400bd55]
 8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55e9f3ff4f33]
 9: (DispatchQueue::entry()+0x792) [0x55e9f45f9c12]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x55e9f439c5fd]
 11: (()+0x7e25) [0x7fd43d712e25]
 12: (clone()+0x6d) [0x7fd43c7f534d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Version-Release number of selected component (if applicable):
ceph version 12.2.1-2.el7cp (965390e1785cd23ffde159014f25f9490e479668) luminous (stable)

The MDS should be more tolerant of these messages when it's not active.


Files

mds_assert_rejoin.rar (404 KB) mds_assert_rejoin.rar haitao chen, 08/28/2020 07:55 AM

Related issues 1 (0 open1 closed)

Related to CephFS - Bug #23826: mds: assert after daemon restartDuplicatePatrick Donnelly04/23/2018

Actions
Actions

Also available in: Atom PDF