Bug #23658

closed

MDSMonitor: crash after assigning standby-replay daemon in multifs setup

Added by Patrick Donnelly about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Urgent
Assignee: Zheng Yan
Category: Correctness/Safety
Target version: v13.0.0
% Done: 0%
Source: Community (dev)
Tags:
Backport: luminous,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor, qa-suite
Labels (FS): crash, multifs
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From: https://github.com/rook/rook/issues/1027

2017-09-29 21:55:06.978169 I | rook-ceph-mon0:      0> 2017-09-29 21:55:06.961413 7f55aba29700 -1 /build/ceph/src/mds/FSMap.cc: In function 'void FSMap::assign_standby_replay(mds_gid_t, fs_cluster_id_t, mds_rank_t)' thread 7f55aba29700 time 2017-09-29 21:55:06.957486
2017-09-29 21:55:06.978179 I | rook-ceph-mon0: /build/ceph/src/mds/FSMap.cc: 870: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)

It would appear there is an issue with how the mon assigns the standby-replay daemon after a third filesystem is added. The configuration of the filesystems in the cluster was:

    myfs: two mds active, two mds on standby-replay
    yourfs: three mds active, three mds on standby
    jaredsfs: one mds active, one mds on standby-replay

After the first two were created, ceph status showed the following mds status:

 mds: myfs-2/2/2 up yourfs-3/3/3 up  {[myfs:0]=msdfdx=up:active,[myfs:1]=m88104=up:active,[yourfs:0]=m739m0=up:active,[yourfs:1]=mdv8k2=up:active,[yourfs:2]=m6ktsw=up:active}, 2 up:standby-replay, 3 up:standby
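For context, the failed check at FSMap.cc:870 asserts that the daemon GID being promoted to standby-replay is not yet assigned to any filesystem (mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE); the crash therefore suggests the monitor picked a GID that was already following another filesystem. Below is a minimal, self-contained C++ sketch of that invariant; FSMapSketch, its simplified typedefs, and the parameter names are illustrative only, not the real Ceph FSMap code — only the assert condition mirrors the log above.

    // Illustrative sketch of the invariant behind the failed assert.
    // FSMapSketch and the simplified typedefs are hypothetical; only the
    // assert condition itself mirrors the crash log above.
    #include <cassert>
    #include <cstdint>
    #include <iostream>
    #include <map>

    using mds_gid_t = uint64_t;        // MDS daemon GID
    using fs_cluster_id_t = int64_t;   // filesystem cluster ID
    using mds_rank_t = int32_t;        // rank the standby-replay daemon follows
    constexpr fs_cluster_id_t FS_CLUSTER_ID_NONE = -1;

    struct FSMapSketch {
      // Which filesystem (if any) each daemon GID is currently assigned to.
      std::map<mds_gid_t, fs_cluster_id_t> mds_roles;

      void assign_standby_replay(mds_gid_t standby_gid, fs_cluster_id_t fscid,
                                 mds_rank_t /*rank*/) {
        // Mirrors the FSMap.cc:870 assertion: the daemon must be unassigned.
        // If the mon picks a GID already assigned to another filesystem,
        // this fires and the monitor aborts, as in the report above.
        assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE);
        mds_roles[standby_gid] = fscid;
      }
    };

    int main() {
      FSMapSketch fsmap;
      fsmap.mds_roles[42] = FS_CLUSTER_ID_NONE;     // daemon 42 starts unassigned
      fsmap.assign_standby_replay(42, 1, 0);        // OK: becomes standby-replay for fs 1
      // fsmap.assign_standby_replay(42, 2, 0);     // would trip the assert, as in this crash
      std::cout << "daemon 42 assigned to fs " << fsmap.mds_roles[42] << std::endl;
      return 0;
    }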


Related issues 3 (0 open, 3 closed)

Blocks CephFS - Feature #22477: multifs: remove multifs experimental warnings (Resolved, Patrick Donnelly)

Copied to CephFS - Backport #23833: luminous: MDSMonitor: crash after assigning standby-replay daemon in multifs setup (Resolved, Patrick Donnelly)
Copied to CephFS - Backport #23834: jewel: MDSMonitor: crash after assigning standby-replay daemon in multifs setup (Rejected)
Actions #1

Updated by Patrick Donnelly about 6 years ago

  • Labels (FS) crash added
Actions #2

Updated by Patrick Donnelly about 6 years ago

  • Blocks Feature #22477: multifs: remove multifs experimental warnings added
Actions #3

Updated by Patrick Donnelly about 6 years ago

  • Priority changed from Normal to Urgent
Actions #4

Updated by Zheng Yan about 6 years ago

  • Backport set to luminous, jewel
Actions #5

Updated by Zheng Yan about 6 years ago

  • Status changed from New to Fix Under Review
Actions #6

Updated by Patrick Donnelly almost 6 years ago

  • Assignee set to Zheng Yan
  • Target version changed from v14.0.0 to v13.0.0
Actions #7

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from luminous, jewel to luminous,jewel
Actions #8

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #23833: luminous: MDSMonitor: crash after assigning standby-replay daemon in multifs setup added
Actions #9

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #23834: jewel: MDSMonitor: crash after assigning standby-replay daemon in multifs setup added
Actions #10

Updated by Travis Nielsen almost 6 years ago

When this issue hits, is there a way to recover? For example, is there a way to forcefully remove the filesystems that are causing the crash? With the mons crashing, the cluster is just down.

Actions #11

Updated by Patrick Donnelly over 5 years ago

  • Status changed from Pending Backport to Resolved