Bug #47881
closed
mon/MDSMonitor: stop all MDS processes in the cluster at the same time. Some MDS cannot enter the "failed" state
Description
Stop all MDS processes in the cluster at the same time. After all MDS processes have exited, the "ceph fs status" command still shows some MDS daemons in the "active(laggy)" or "resolve(laggy)" state instead of "failed".
Logs as follows:
2020-10-16 16:14:27.629 7f1f7ac52700 5 mon.host-192-168-9-2@0(leader).mds e962 preprocess_beacon mdsbeacon(48335776/host-192-168-9-4-9 down:dne seq 10044 v961) v7 from mds.? [v2:100.100.8.4:6842/2091715094,v1:100.100.8.4:6843/2091715094] compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
2020-10-16 16:14:27.629 7f1f7ac52700 10 mon.host-192-168-9-2@0(leader).mds e962 preprocess_beacon: GID exists in map: 48335776
2020-10-16 16:14:27.629 7f1f7ac52700 10 mon.host-192-168-9-2@0(leader).mds e962 mds_beacon mdsbeacon(48335776/host-192-168-9-4-9 down:dne seq 10044 v961) v7 ignoring requested state, because mds hasn't seen latest map
2020-10-16 16:14:27.629 7f1f7ac52700 5 mon.host-192-168-9-2@0(leader).mds e962 _note_beacon mdsbeacon(48335776/host-192-168-9-4-9 down:dne seq 10044 v961) v7 noting time
2020-10-16 16:14:27.629 7f1f7ac52700 2 mon.host-192-168-9-2@0(leader) e1 send_reply 0x55c91e02e410 0x55c91e524000 mdsbeacon(48335776/host-192-168-9-4-9 down:dne seq 10044 v962) v7
2020-10-16 16:14:27.629 7f1f7ac52700 15 mon.host-192-168-9-2@0(leader) e1 send_reply routing reply to v2:100.100.8.4:6842/2091715094 via v2:100.100.8.3:3300/0 for request mdsbeacon(48335776/host-192-168-9-4-9 down:dne seq 10044 v961) v7
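To make the beacon fields in the log lines above easier to compare, here is a minimal Python sketch that extracts them with a regular expression. The helper name parse_beacon and the returned keys are hypothetical and are not part of Ceph. Reading "v961" as the map epoch carried by the beacon and "e962" as the monitor's current epoch is an interpretation of these logs, not a confirmed restatement of the MDSMonitor logic.

```python
import re

# Beacon summary as printed in the monitor log:
#   mdsbeacon(<gid>/<name> <state> seq <seq> v<epoch>)
BEACON_RE = re.compile(
    r"mdsbeacon\((?P<gid>\d+)/(?P<name>\S+) (?P<state>\S+) "
    r"seq (?P<seq>\d+) v(?P<epoch>\d+)\)"
)
# Epoch printed by the monitor's mds service, e.g. ".mds e962 "
MON_EPOCH_RE = re.compile(r"\.mds e(?P<epoch>\d+) ")


def parse_beacon(line):
    """Return the beacon fields found in one log line, or None if absent.

    Hypothetical helper for log inspection only.
    """
    m = BEACON_RE.search(line)
    if m is None:
        return None
    mon = MON_EPOCH_RE.search(line)
    return {
        "gid": int(m["gid"]),
        "name": m["name"],
        "state": m["state"],
        "seq": int(m["seq"]),
        # Assumption: the trailing "vNNN" is the epoch reported by the beacon.
        "beacon_epoch": int(m["epoch"]),
        # None when the line does not come from the ".mds" service prefix.
        "mon_epoch": int(mon["epoch"]) if mon else None,
    }


# Line copied from the log excerpt in this report.
sample = (
    "2020-10-16 16:14:27.629 7f1f7ac52700 10 "
    "mon.host-192-168-9-2@0(leader).mds e962 mds_beacon "
    "mdsbeacon(48335776/host-192-168-9-4-9 down:dne seq 10044 v961) v7 "
    "ignoring requested state, because mds hasn't seen latest map"
)
info = parse_beacon(sample)
# True when the beacon reports an older epoch than the monitor's.
behind = info["beacon_epoch"] < info["mon_epoch"]
print(info, behind)
```

Under that reading, the beacon requesting down:dne carries epoch 961 while the monitor is already at 962, which would match the "ignoring requested state, because mds hasn't seen latest map" message.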