Bug #52094 (closed)

Tried out Quincy: All MDS Standby

Added by Joshua West over 2 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Category: Testing
Target version:
% Done: 0%
Source: Development
Tags: mds standby, quincy
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, MDSMonitor, mgr/mds_autoscaler
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm running Ceph on Proxmox and have been suffering from #51445 (https://tracker.ceph.com/issues/51445).
Being the sort of person who knows just enough to be dangerous, I attempted to apply https://github.com/ceph/ceph/pull/42345 myself, and in the process learned a lesson about accidentally upgrading oneself to the dev branch.


ceph version 17.0.0-6673-g313be835f7a (313be835f7a5eb5b2e43365d044bf20fd3fd1b2d) quincy (dev)

For the most part things went smoothly, but I believe I may have identified a bug: none of my four MDS daemons come out of standby.
In a perfect world I would either get past the bug(?) or figure out how to revert to Pacific without the mons refusing to start due to "changes to the on disk structure".
health: HEALTH_ERR
        1 filesystem is degraded
        1 filesystem has a failed mds daemon
        2 large omap objects
        1 filesystem is offline
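In case it helps with reproduction, the summary above is what I'd expect the standard status commands to show (nothing custom on my end; listing them here as a rough sketch rather than a verbatim capture):

    ceph -s               # cluster summary, including the HEALTH_ERR lines above
    ceph health detail    # expands each health warning/error individually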

/var/log/ceph/ceph-mon.server.log:flags 32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
/var/log/ceph/ceph-mon.server.log:compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
/var/log/ceph/ceph-mon.server.log:max_mds 1
/var/log/ceph/ceph-mon.server.log:[mds.rd240{-1:136865003} state up:standby seq 1 addr [v2:192.168.2.20:6864/2702233770,v1:192.168.2.20:6865/2702233770] compat {c=[1],r=[1],i=[7ff]}]
/var/log/ceph/ceph-mon.server.log:[mds.server{ffffffff:8337f9f} state up:standby seq 1 addr [v2:192.168.2.2:1a90/7b467d7,v1:192.168.2.2:1a91/7b467d7] compat {c=[1],r=[1],i=[7ff]}]
/var/log/ceph/ceph-mon.server.log:[mds.dl380g7{ffffffff:8338b3b} state up:standby seq 1 addr [v2:192.168.2.4:1a90/514c93ee,v1:192.168.2.4:1a91/514c93ee] compat {c=[1],r=[1],i=[7ff]}]
/var/log/ceph/ceph-mon.server.log:[mds.rog{ffffffff:8338ebf} state up:standby seq 1 addr [v2:192.168.2.6:1a90/5b0fbd32,v1:192.168.2.6:1a91/5b0fbd32] compat {c=[1],r=[1],i=[7ff]}]
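Those lines came from grepping the mon log; the exact pattern below is approximate (reconstructed from memory), but something like this should reproduce the filename-prefixed output:

    grep -H -E 'flags |compat |max_mds |up:standby' /var/log/ceph/ceph-mon.server.log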

MDS logs are attached for review. This is the first issue I've opened at any tracker, so please let me know if I've missed key details or anything else I should have included (my apologies if that's the case).
I've marked this as minor severity since it's Quincy, where issues are expected during development.
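For completeness, these are the standard commands I'd use to capture the filesystem and MDS state, in case the attached logs aren't enough on their own:

    ceph fs status    # per-filesystem summary with ranks and standby counts
    ceph mds stat     # one-line fsmap summary
    ceph fs dump      # full fsmap, including the standby daemon list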


Files

ceph-mds.rog.log (854 KB), Joshua West, 08/07/2021 12:05 AM
ceph-mds.rog.log2.gz (101 KB), Joshua West, 08/07/2021 10:41 AM

Related issues: 1 (0 open, 1 closed)

Is duplicate of CephFS - Bug #52975: MDSMonitor: no active MDS after cluster deployment (Resolved, Patrick Donnelly)

Actions #1

Updated by Joshua West over 2 years ago


e1095528
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 7

Filesystem 'cephfs' (7)
fs_name    cephfs
epoch    1095528
flags    32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created    2020-10-27T10:52:01.629171-0600
modified    2021-08-08T09:34:02.509077-0600
tableserver    0
root    0
session_timeout    60
session_autoclose    300
max_file_size    5000000000000
required_client_features    {8=mimic}
last_failure    0
last_failure_osd_epoch    1093846
compat    compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds    2
in    0,1
up    {}
failed    0,1
damaged    
stopped    2
data_pools    [21]
metadata_pool    22
inline_data    disabled
balancer    
standby_count_wanted    1

Standby daemons:

[mds.server{-1:138437405} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.2:6808/2491038724,v1:192.168.2.2:6809/2491038724] compat {c=[1],r=[1],i=[7ff]}]
[mds.rog{ffffffff:84087bb} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.6:1a90/d2617aac,v1:192.168.2.6:1a91/d2617aac] compat {c=[1],r=[1],i=[7ff]}]
[mds.dl380g7{ffffffff:84087ef} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.4:1ac8/abd490c,v1:192.168.2.4:1ac9/abd490c] compat {c=[1],r=[1],i=[7ff]}]
[mds.rd240{ffffffff:840883b} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.20:1a90/2cf0e9ea,v1:192.168.2.20:1a91/2cf0e9ea] compat {c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 1095528
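For what it's worth, the dump above (which should match plain ceph fs dump output) shows ranks 0 and 1 both in the failed set with up {} empty, while all four daemons sit in up:standby with join_fscid=7 for this filesystem, so none of them is being promoted. If the compat sets turn out to be relevant, a quick way to pull just those fields out for comparison would be something like:

    ceph fs dump | grep -i compat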

Actions #2

Updated by Joshua West over 2 years ago

Hmm, I'm not sure what typical turnaround times are, but neither the mailing list, a Proxmox forum post, nor this ticket has received a response.

I suspect I may be coming off poorly, and admit I've made some mistakes leading to this point.

Is there a better way to figure this out than the methods I've already tried?

Joshua West wrote:

[...]
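If more detail from the daemons would help, I'm happy to bump debug levels and re-capture logs; I assume something along these lines would do it (standard debug options, exact levels open to suggestion):

    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1
    ceph config set mon debug_mon 20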

Actions #3

Updated by Patrick Donnelly over 2 years ago

  • Assignee set to Patrick Donnelly
Actions #4

Updated by Patrick Donnelly over 2 years ago

  • Is duplicate of Bug #52975: MDSMonitor: no active MDS after cluster deployment added
Actions #5

Updated by Patrick Donnelly over 2 years ago

  • Status changed from New to Duplicate
