Bug #23658
MDSMonitor: crash after assigning standby-replay daemon in multifs setup
Status:
Resolved
Priority:
Urgent
Assignee:
Zheng Yan
Category:
Correctness/Safety
Target version:
v13.0.0
% Done:
0%
Source:
Community (dev)
Tags:
Backport:
luminous,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDSMonitor, qa-suite
Labels (FS):
crash, multifs
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
From: https://github.com/rook/rook/issues/1027
2017-09-29 21:55:06.978169 I | rook-ceph-mon0:      0> 2017-09-29 21:55:06.961413 7f55aba29700 -1 /build/ceph/src/mds/FSMap.cc: In function 'void FSMap::assign_standby_replay(mds_gid_t, fs_cluster_id_t, mds_rank_t)' thread 7f55aba29700 time 2017-09-29 21:55:06.957486
2017-09-29 21:55:06.978179 I | rook-ceph-mon0: /build/ceph/src/mds/FSMap.cc: 870: FAILED assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE)

It would appear there is an issue with the standby being assigned by the mon after adding a third filesystem. The configuration of the file systems in the cluster was:

myfs: two mds active, two mds on standby-replay
yourfs: three mds active, three mds on standby
jaredsfs: one mds active, one mds on standby-replay

After the first two were created, ceph status showed the following mds status:

mds: myfs-2/2/2 up yourfs-3/3/3 up {[myfs:0]=msdfdx=up:active,[myfs:1]=m88104=up:active,[yourfs:0]=m739m0=up:active,[yourfs:1]=mdv8k2=up:active,[yourfs:2]=m6ktsw=up:active}, 2 up:standby-replay, 3 up:standby
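To make the failed assertion concrete, below is a minimal, self-contained sketch (not the actual Ceph code) of the invariant FSMap::assign_standby_replay guards: a daemon GID may only be promoted to standby-replay while its recorded role is FS_CLUSTER_ID_NONE, i.e. it is not yet attached to any filesystem. The type aliases approximate Ceph's; FSMapSketch, the GID value 42, and the map contents are hypothetical.

// Sketch of the invariant behind the FAILED assert above; assumptions noted.
#include <cassert>
#include <cstdint>
#include <map>

using mds_gid_t = uint64_t;          // approximates Ceph's MDS daemon GID
using fs_cluster_id_t = int32_t;     // approximates Ceph's filesystem ID
using mds_rank_t = int32_t;

constexpr fs_cluster_id_t FS_CLUSTER_ID_NONE = -1; // "not attached to any fs"

struct FSMapSketch {                 // hypothetical stand-in for FSMap
  // Current role of every known MDS daemon, keyed by GID.
  std::map<mds_gid_t, fs_cluster_id_t> mds_roles;

  void assign_standby_replay(mds_gid_t standby_gid,
                             fs_cluster_id_t leader_ns,
                             mds_rank_t /*leader_rank*/) {
    // The precondition that fires in this bug: the chosen GID must
    // still be unassigned before it can follow a rank as standby-replay.
    assert(mds_roles.at(standby_gid) == FS_CLUSTER_ID_NONE);
    mds_roles.at(standby_gid) = leader_ns;
  }
};

int main() {
  FSMapSketch fsmap;
  fsmap.mds_roles[42] = FS_CLUSTER_ID_NONE; // an unattached standby daemon
  fsmap.assign_standby_replay(42, 1, 0);    // fine: GID 42 was free
  fsmap.assign_standby_replay(42, 2, 0);    // aborts: GID 42 already taken
}

Read this way, the crash reported above suggests the monitor selected a standby GID that the FSMap already considered attached to a filesystem, consistent with the problem surfacing only once a third filesystem competes for standby daemons.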
Updated by Patrick Donnelly about 6 years ago
- Blocks Feature #22477: multifs: remove multifs experimental warnings added
Updated by Patrick Donnelly about 6 years ago
- Priority changed from Normal to Urgent
Updated by Zheng Yan about 6 years ago
- Backport set to luminous, jewel
Updated by Zheng Yan about 6 years ago
- Status changed from New to Fix Under Review
Updated by Patrick Donnelly almost 6 years ago
- Assignee set to Zheng Yan
- Target version changed from v14.0.0 to v13.0.0
Updated by Patrick Donnelly almost 6 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from luminous, jewel to luminous,jewel
Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #23833: luminous: MDSMonitor: crash after assigning standby-replay daemon in multifs setup added
Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #23834: jewel: MDSMonitor: crash after assigning standby-replay daemon in multifs setup added
Updated by Travis Nielsen almost 6 years ago
When this issue hits, is there a way to recover? For example, can the multiple filesystems that are causing the crash be forcefully removed? With the mons crashing, the cluster is simply down.
Updated by Patrick Donnelly over 5 years ago
- Status changed from Pending Backport to Resolved