client: fails to reconnect to MDS
...
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758 wait_sync_caps want 7 (last is 7, 2 total flushing)
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758 waiting on mds.0 tid 6 (want 7)
...
MDS gave up:
2020-05-21T00:52:18.654+0000 7f0c5233d700 7 mds.0.server reconnect timed out, 1 clients have not reconnected in time
2020-05-21T00:52:18.654+0000 7f0c5233d700 1 mds.0.server reconnect gives up on client.8758 192.168.0.1:0/75408247
#3 Updated by Xiubo Li about 1 month ago
If the ceph-fuse client needs to flush caps and waits synchronously for the acks, umount() still returns successfully. The netns container is then destroyed and the network becomes unreachable, but the ceph-fuse daemon is still stuck waiting for the flush caps ack.
This causes the ceph-fuse daemon to stay stuck forever. If the MDS daemons are restarted, they will try to reconnect the clients, but the stuck ceph-fuse daemon won't reply.
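
For illustration only, here is a minimal Python sketch of the difference between an unbounded wait for a flush ack and a bounded one. It is not Ceph code and not a proposed fix: `FlushTracker`, `ack`, `wait_sync`, and the timeout value are all hypothetical names chosen for this example.

```python
import threading


class FlushTracker:
    """Hypothetical stand-in for a client's record of acked cap-flush tids."""

    def __init__(self):
        self._cond = threading.Condition()
        self._last_acked = 0

    def ack(self, tid):
        # Would be called when a flush ack for `tid` arrives from the MDS.
        with self._cond:
            self._last_acked = max(self._last_acked, tid)
            self._cond.notify_all()

    def wait_sync(self, want, timeout=None):
        # Block until tid `want` has been acked.
        # timeout=None waits forever, which mirrors the hang described
        # above when the network is gone and no ack can ever arrive.
        # With a timeout, the call returns False once the deadline passes.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._last_acked >= want, timeout=timeout
            )


if __name__ == "__main__":
    tracker = FlushTracker()
    tracker.ack(5)
    # No ack for tid 7 ever arrives, so the bounded wait gives up.
    print(tracker.wait_sync(7, timeout=0.1))
```

With `timeout=None` the same call would never return in this scenario, which is the behaviour the stuck ceph-fuse daemon shows.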