Bug #52094
Closed
Tried out Quincy: All MDS Standby
Description
Running on Proxmox, and suffering from #51445 (https://tracker.ceph.com/issues/51445).
Like any good "knows enough to be dangerous" sort of folk, I attempted to implement (https://github.com/ceph/ceph/pull/42345), but foolishly learned a lesson about accidentally upgrading oneself to the dev branch.
ceph version 17.0.0-6673-g313be835f7a (313be835f7a5eb5b2e43365d044bf20fd3fd1b2d) quincy (dev)
For the most part, things went smoothly, but I believe I may have identified a bug: none of my four MDS daemons come out of standby.
In a perfect world, I would either get past the bug(?), or figure out how to revert to Pacific without the mons refusing to start due to "changes to the on disk structure".
health: HEALTH_ERR
1 filesystem is degraded
1 filesystem has a failed mds daemon
2 large omap objects
1 filesystem is offline
/var/log/ceph/ceph-mon.server.log:flags 32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
/var/log/ceph/ceph-mon.server.log:compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
/var/log/ceph/ceph-mon.server.log:max_mds 1
/var/log/ceph/ceph-mon.server.log:[mds.rd240{-1:136865003} state up:standby seq 1 addr [v2:192.168.2.20:6864/2702233770,v1:192.168.2.20:6865/2702233770] compat {c=[1],r=[1],i=[7ff]}]
/var/log/ceph/ceph-mon.server.log:[mds.server{-1:137592735} state up:standby seq 1 addr [v2:192.168.2.2:6800/129263575,v1:192.168.2.2:6801/129263575] compat {c=[1],r=[1],i=[7ff]}]
/var/log/ceph/ceph-mon.server.log:[mds.dl380g7{-1:137595707} state up:standby seq 1 addr [v2:192.168.2.4:6800/1363973102,v1:192.168.2.4:6801/1363973102] compat {c=[1],r=[1],i=[7ff]}]
/var/log/ceph/ceph-mon.server.log:[mds.rog{-1:137596607} state up:standby seq 1 addr [v2:192.168.2.6:6800/1527758130,v1:192.168.2.6:6801/1527758130] compat {c=[1],r=[1],i=[7ff]}]
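As an aside for anyone decoding the compat sets in those log lines: the `i=[7ff]` each standby reports is a hex feature bitmask, while the fsmap lists its required incompat features by id. A minimal sketch of comparing the two — the bit-to-id mapping and the reserved base bit are my assumptions for illustration, not taken from the Ceph source:

```python
# Illustrative sketch: decode a CompatSet-style hex mask like the "i=[7ff]"
# shown in the MDS beacon lines above, and check it against the feature ids
# the fsmap lists as required (incompat {1..6, 8..10}).
# Assumption (not from the report): bit n of the mask corresponds to
# feature id n, and bit 0 is a reserved base bit.

def decode_mask(mask: int) -> set[int]:
    """Return the set of feature ids encoded in a compat bitmask."""
    return {bit for bit in range(1, mask.bit_length()) if mask & (1 << bit)}

# Feature ids from the fsmap's incompat line in the mon log above.
required = {1, 2, 3, 4, 5, 6, 8, 9, 10}

daemon_features = decode_mask(0x7FF)  # i=[7ff] from each standby MDS
missing = required - daemon_features

print(f"daemon advertises: {sorted(daemon_features)}")
print(f"missing vs fsmap:  {sorted(missing) or 'none'}")
```

Read naively, the standbys advertise every feature the fsmap requires, which fits the eventual resolution of this ticket as a duplicate of an MDSMonitor bug rather than a genuine compat mismatch.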
MDS logs are attached for review. This is the first issue I've opened at any tracker, so please let me know if I've missed any key details or anything I should have included (my apologies if that is the case).
Marked as minor since this is Quincy, where issues are expected during development.
Files
Updated by Joshua West almost 3 years ago
e1095528
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 7

Filesystem 'cephfs' (7)
fs_name	cephfs
epoch	1095528
flags	32 joinable allow_snaps allow_multimds_snaps allow_standby_replay
created	2020-10-27T10:52:01.629171-0600
modified	2021-08-08T09:34:02.509077-0600
tableserver	0
root	0
session_timeout	60
session_autoclose	300
max_file_size	5000000000000
required_client_features	{8=mimic}
last_failure	0
last_failure_osd_epoch	1093846
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds	2
in	0,1
up	{}
failed	0,1
damaged
stopped	2
data_pools	[21]
metadata_pool	22
inline_data	disabled
balancer
standby_count_wanted	1

Standby daemons:

[mds.server{-1:138437405} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.2:6808/2491038724,v1:192.168.2.2:6809/2491038724] compat {c=[1],r=[1],i=[7ff]}]
[mds.rog{-1:138446779} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.6:6800/3529603756,v1:192.168.2.6:6801/3529603756] compat {c=[1],r=[1],i=[7ff]}]
[mds.dl380g7{-1:138446831} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.4:6856/180177164,v1:192.168.2.4:6857/180177164] compat {c=[1],r=[1],i=[7ff]}]
[mds.rd240{-1:138446907} state up:standby seq 1 join_fscid=7 addr [v2:192.168.2.20:6800/753986026,v1:192.168.2.20:6801/753986026] compat {c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 1095528
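To make the failure mode in that dump explicit: `max_mds 2` with `in 0,1`, `up {}`, and `failed 0,1` means both wanted ranks are marked failed and no rank is up, even though four standbys are available. A small illustrative parser over those fields (the helper name and output shape are mine, not a Ceph API):

```python
import re

# Excerpt of the rank-state fields from the fs dump above.
dump = """\
max_mds 2
in 0,1
up {}
failed 0,1
stopped 2
"""

def rank_summary(text: str) -> dict:
    """Summarize rank state from an 'fs dump'-style excerpt (illustrative only)."""
    fields = dict(re.findall(r"^(max_mds|in|up|failed|stopped)\s+(.*)$", text, re.M))
    return {
        "wanted_ranks": fields.get("in", ""),
        "up_ranks": fields.get("up", "{}"),
        "failed_ranks": fields.get("failed", ""),
        "offline": fields.get("up", "{}").strip() in ("{}", ""),
    }

print(rank_summary(dump))
# With up {} and failed 0,1, the filesystem is offline despite healthy standbys.
```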
Updated by Joshua West over 2 years ago
Hmm, I'm not sure about typical turnaround times, but neither the mailing list, a Proxmox forum post, nor this ticket has received a response.
I suspect I may be coming off poorly, and admit I've made some mistakes leading to this point.
Is there a better way to figure this out than the methods I've already tried?
Joshua West wrote:
[...]
Updated by Patrick Donnelly over 2 years ago
- Is duplicate of Bug #52975: MDSMonitor: no active MDS after cluster deployment added
Updated by Patrick Donnelly over 2 years ago
- Status changed from New to Duplicate