What I know so far:
mds.b is the active MDS with rank 0:
260477:2023-12-07T00:19:18.208+0000 7fa052b6a700 7 mon.a@0(leader).log v300 update_from_paxos applying incremental log 300 2023-12-07T00:19:17.276369+0000 mon.a (mon.0) 995 : cluster [INF] daemon mds.b is now active in filesystem cephfs as rank 0
This is the fsmap (showing only the MDS data):
max_mds 3
in 0,1,2
up {0=24512,1=24479,2=24451}
failed
damaged
stopped
data_pools [3]
metadata_pool 2
inline_data disabled
balancer
bal_rank_mask -1
standby_count_wanted 0
[mds.b{0:24512} state up:active seq 3 addr [v2:172.21.15.136:6838/2030257219,v1:172.21.15.136:6839/2030257219] compat {c=[1],r=[1],i=[fff]}]
[mds.i{0:24469} state up:standby-replay seq 1 addr [v2:172.21.15.181:6838/824276348,v1:172.21.15.181:6839/824276348] compat {c=[1],r=[1],i=[fff]}]
[mds.h{1:24479} state up:active seq 6 addr [v2:172.21.15.136:6836/3363137736,v1:172.21.15.136:6837/3363137736] compat {c=[1],r=[1],i=[fff]}]
[mds.e{1:24455} state up:standby-replay seq 1 addr [v2:172.21.15.136:6834/3499967199,v1:172.21.15.136:6835/3499967199] compat {c=[1],r=[1],i=[fff]}]
[mds.c{2:24451} state up:active seq 7 addr [v2:172.21.15.181:6836/3909565630,v1:172.21.15.181:6837/3909565630] compat {c=[1],r=[1],i=[fff]}]
Standby daemons:
[mds.j{-1:14574} state up:standby seq 1 addr [v2:172.21.15.105:6834/462434151,v1:172.21.15.105:6835/462434151] compat {c=[1],r=[1],i=[fff]}]
[mds.d{-1:14586} state up:standby seq 1 addr [v2:172.21.15.105:6836/2965410210,v1:172.21.15.105:6837/2965410210] compat {c=[1],r=[1],i=[fff]}]
[mds.g{-1:14649} state up:standby seq 1 addr [v2:172.21.15.105:6838/913925081,v1:172.21.15.105:6839/913925081] compat {c=[1],r=[1],i=[fff]}]
[mds.a{-1:14661} state up:standby seq 1 addr [v2:172.21.15.105:6840/2502623746,v1:172.21.15.105:6841/2502623746] compat {c=[1],r=[1],i=[fff]}]
[mds.l{-1:24391} state up:standby seq 1 addr [v2:172.21.15.181:6832/482210475,v1:172.21.15.181:6833/482210475] compat {c=[1],r=[1],i=[fff]}]
[mds.k{-1:24431} state up:standby seq 1 addr [v2:172.21.15.136:6832/572280608,v1:172.21.15.136:6833/572280608] compat {c=[1],r=[1],i=[fff]}]
[mds.f{-1:24439} state up:standby seq 1 addr [v2:172.21.15.181:6834/3575479535,v1:172.21.15.181:6835/3575479535] compat {c=[1],r=[1],i=[fff]}]
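For reference, the fsmap above is the structure the mons hand out; it can be pulled live with something like the following (a minimal sketch, assuming only this cluster's fs name "cephfs"):

ceph fs dump            # full fsmap: epoch, ranks, states, standbys
ceph fs status cephfs   # condensed per-rank view via the mgr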
Then mds.b's state becomes null:
2023-12-07T00:26:45.444+0000 7f76427c6700 10 mds.b my gid is 24512
2023-12-07T00:26:45.444+0000 7f76427c6700 10 mds.b map says I am mds.-1.-1 state null
2023-12-07T00:26:45.444+0000 7f76427c6700 10 mds.b msgr says I am [v2:172.21.15.136:6838/2030257219,v1:172.21.15.136:6839/2030257219]
2023-12-07T00:26:45.444+0000 7f76427c6700 1 mds.b Map removed me [mds.b{0:24512} state up:active seq 3 export targets 1,2 addr [v2:172.21.15.136:6838/2030257219,v1:172.21.15.136:6839/2030257219] compat {c=[1],r=[1],i=[fff]}] from cluster; respawning! See cluster/monitor logs for details.
2023-12-07T00:26:45.444+0000 7f76427c6700 1 mds.b respawn!
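To figure out why the mon dropped gid 24512 from the map, the leader mon's log around that timestamp is the place to look. A hedged sketch (the fsid-based unit name matches this cephadm deployment; the grep pattern is just illustrative):

journalctl -u 'ceph-274c4118-9495-11ee-95a2-87774f69a715@mon.a' \
  --since '2023-12-07 00:26:40' --until '2023-12-07 00:26:50' \
  | grep -E '24512|mds\.b|damaged'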
Immediately, the mon reports an MDS daemon as damaged:
2023-12-07T00:26:45.722 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-274c4118-9495-11ee-95a2-87774f69a715-mon-a[98911]: 2023-12-07T00:26:45.416+0000 7fa05536f700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 mds daemon damaged (MDS_DAMAGE)
2023-12-07T00:26:45.722 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-mon[98935]: pgmap v484: 97 pgs: 97 active+clean; 13 GiB data, 134 GiB used, 939 GiB / 1.0 TiB avail; 19 KiB/s rd, 113 MiB/s wr, 19.36k op/s
2023-12-07T00:26:45.722 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-mon[98935]: Error loading MDS rank 0: (22) Invalid argument
2023-12-07T00:26:45.722 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-mon[98935]: Health check failed: 1 filesystem is degraded (FS_DEGRADED)
2023-12-07T00:26:45.722 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-mon[98935]: Health check failed: 1 mds daemon damaged (MDS_DAMAGE)
2023-12-07T00:26:45.722 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-mon[98935]: osdmap e86: 12 total, 12 up, 12 in
2023-12-07T00:26:45.722 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-mon[98935]: mds.? [v2:172.21.15.181:6838/824276348,v1:172.21.15.181:6839/824276348] down:damaged
2023-12-07T00:26:45.723 INFO:journalctl@ceph.mon.a.smithi105.stdout:Dec 07 00:26:45 smithi105 ceph-mon[98935]: fsmap cephfs:2/3 {1=h=up:active,2=c=up:active} 2 up:standby-replay 6 up:standby, 1 damaged
2023-12-07T00:26:45.729 INFO:journalctl@ceph.mds.i.smithi181.stdout:Dec 07 00:26:45 smithi181 ceph-274c4118-9495-11ee-95a2-87774f69a715-mds-i[141832]: 2023-12-07T00:26:45.410+0000 7f40f5a1f700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
2023-12-07T00:26:45.729 INFO:journalctl@ceph.mds.i.smithi181.stdout:Dec 07 00:26:45 smithi181 ceph-274c4118-9495-11ee-95a2-87774f69a715-mds-i[141832]: -14> 2023-12-07T00:26:45.410+0000 7f40f5a1f700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
mds.i (the standby-replay daemon for rank 0) is marked as damaged:
2023-12-07T00:26:45.410+0000 7f40f5a1f700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
2023-12-07T00:26:45.410+0000 7f40f5a1f700 5 mds.beacon.i set_want_state: up:standby-replay -> down:damaged
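So the standby-replay daemon hit an error replaying rank 0's journal and asked the mon to mark the rank damaged. With the rank down, the journal can be inspected offline; a sketch using cephfs-journal-tool against this cluster's fs name:

cephfs-journal-tool --rank=cephfs:0 header get       # journal header: write/expire/trimmed positions
cephfs-journal-tool --rank=cephfs:0 journal inspect  # object-level integrity check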
And the MDS status in the fsmap is now:
up {1=24479,2=24451}
failed
damaged 0
stopped
data_pools [3]
metadata_pool 2
inline_data disabled
balancer
bal_rank_mask -1
standby_count_wanted 0
[mds.h{1:24479} state up:active seq 6 addr [v2:172.21.15.136:6836/3363137736,v1:172.21.15.136:6837/3363137736] compat {c=[1],r=[1],i=[fff]}]
[mds.e{1:24455} state up:standby-replay seq 1 addr [v2:172.21.15.136:6834/3499967199,v1:172.21.15.136:6835/3499967199] compat {c=[1],r=[1],i=[fff]}]
[mds.c{2:24451} state up:active seq 7 export targets 0,1 addr [v2:172.21.15.181:6836/3909565630,v1:172.21.15.181:6837/3909565630] compat {c=[1],r=[1],i=[fff]}]
[mds.f{2:24439} state up:standby-replay seq 1 addr [v2:172.21.15.181:6834/3575479535,v1:172.21.15.181:6835/3575479535] compat {c=[1],r=[1],i=[fff]}]
Standby daemons:
[mds.j{-1:14574} state up:standby seq 1 addr [v2:172.21.15.105:6834/462434151,v1:172.21.15.105:6835/462434151] compat {c=[1],r=[1],i=[fff]}]
[mds.d{-1:14586} state up:standby seq 1 addr [v2:172.21.15.105:6836/2965410210,v1:172.21.15.105:6837/2965410210] compat {c=[1],r=[1],i=[fff]}]
[mds.g{-1:14649} state up:standby seq 1 addr [v2:172.21.15.105:6838/913925081,v1:172.21.15.105:6839/913925081] compat {c=[1],r=[1],i=[fff]}]
[mds.a{-1:14661} state up:standby seq 1 addr [v2:172.21.15.105:6840/2502623746,v1:172.21.15.105:6841/2502623746] compat {c=[1],r=[1],i=[fff]}]
[mds.l{-1:24391} state up:standby seq 1 addr [v2:172.21.15.181:6832/482210475,v1:172.21.15.181:6833/482210475] compat {c=[1],r=[1],i=[fff]}]
[mds.k{-1:24431} state up:standby seq 1 addr [v2:172.21.15.136:6832/572280608,v1:172.21.15.136:6833/572280608] compat {c=[1],r=[1],i=[fff]}]
And it stays like this: there are active MDSs, but none at rank 0, i.e. no rank failover takes place, which is strange. And since the ceph tell command targets rank 0, it always fails.
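The exact tell command the test runs isn't shown here, but anything addressed to rank 0 will fail while the rank has no daemon, e.g. (the subcommand is illustrative):

ceph tell mds.cephfs:0 flush journal

And if the journal turns out to be intact, the damaged flag can be cleared manually so a standby can claim rank 0:

ceph mds repaired cephfs:0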
The reason the active MDS at rank 0 goes silent is not yet clear from the MDS logs.
However, there is this trace in mds.i, the standby-replay daemon:
2023-12-07T00:26:44.960+0000 7f40f7222700 0 mds.24469.journaler.mdlog(ro) _finish_read got less than expected (4149413)
and its logs are filled with this line:
2023-12-07T00:26:25.387+0000 7f4100234700 1 -- [v2:172.21.15.181:6838/824276348,v1:172.21.15.181:6839/824276348] <== osd.8 v2:172.21.15.181:6816/602100546 173 ==== osd_op_reply(1141 200.00000003 [stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (crc 0 0 0) 0x555d6c9aafc0 con 0x555d6d7d0400
This is a stat on object 200.00000003 (one of rank 0's journal objects) returning -2 (ENOENT), which might indicate that something is wrong with the on-disk journal, and would explain the short read ("got less than expected") above. Can this lead to the MDS crashing?
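If the on-disk journal is the suspect, one hedged way to confirm from the rados side: rank 0's mdlog lives in inode 0x200, so its backing objects are named 200.<offset> in the metadata pool (pool id 2 per the fsmap; the pool name below is an assumption, substitute the real one):

rados -p cephfs_metadata stat 200.00000003          # ENOENT here confirms the hole the MDS hit
rados -p cephfs_metadata ls | grep '^200\.' | sort  # look for gaps in the object sequence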