Bug #45665
closed
client: fails to reconnect to MDS
Description
...
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758 wait_sync_caps want 7 (last is 7, 2 total flushing)
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758 waiting on mds.0 tid 6 (want 7)
...
From: /ceph/teuthology-archive/pdonnell-2020-05-20_20:43:43-fs-wip-pdonnell-testing-20200520.182104-distro-basic-smithi/5073922/remote/smithi102/log/ceph-client.0.64364.log.gz
MDS gave up:
2020-05-21T00:52:18.654+0000 7f0c5233d700 7 mds.0.server reconnect timed out, 1 clients have not reconnected in time
2020-05-21T00:52:18.654+0000 7f0c5233d700 1 mds.0.server reconnect gives up on client.8758 192.168.0.1:0/75408247
From: /ceph/teuthology-archive/pdonnell-2020-05-20_20:43:43-fs-wip-pdonnell-testing-20200520.182104-distro-basic-smithi/5073922/remote/smithi071/log/ceph-mds.b.log.gz
Updated by Patrick Donnelly almost 4 years ago
- Status changed from New to Triaged
- Assignee set to Xiubo Li
Updated by Xiubo Li almost 4 years ago
This happens when the MDS daemon is restarting or trying to reconnect the client while the client is trying to umount and is waiting for the sync caps flush.
Updated by Xiubo Li almost 4 years ago
If the ceph-fuse client needs to flush the caps and does a sync wait, umount() will just return successfully. The netns container will then be destroyed and the network will no longer be reachable, but the ceph-fuse daemon is still stuck waiting for the flush caps ack.
This causes the ceph-fuse daemon to get stuck forever. If the MDS daemons are restarted, they will try to reconnect the clients, but the stuck ceph-fuse daemon won't reply.
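The hang described above can be illustrated with a toy model. This is only a sketch, not Ceph code: `FlushWaiter`, `deliver_ack`, `wait_sync` and `simulate` are hypothetical names, and the bounded timeout is there only so the example terminates. It says nothing about how the actual fix works.

```python
import threading


class FlushWaiter:
    """Toy model of a client waiting for a caps-flush ack (hypothetical, not Ceph code)."""

    def __init__(self):
        self._ack = threading.Event()

    def deliver_ack(self):
        # Models the flush ack from the MDS arriving over the network.
        self._ack.set()

    def wait_sync(self, timeout=None):
        # timeout=None models the unbounded wait from the report:
        # if the ack never arrives, the caller blocks forever.
        return self._ack.wait(timeout)


def simulate(network_up, timeout=0.2):
    """Return True if the ack arrived, False if the wait timed out."""
    waiter = FlushWaiter()
    if network_up:
        # The ack can only be delivered while the network is still reachable.
        threading.Timer(0.05, waiter.deliver_ack).start()
    # With the network torn down (netns destroyed), nothing ever sets the event.
    return waiter.wait_sync(timeout)


if __name__ == "__main__":
    print("network up:  ", simulate(True))
    print("network down:", simulate(False))
```

With `timeout=None` and the network down, `wait_sync` would never return, which corresponds to the stuck ceph-fuse daemon that cannot answer the MDS reconnect.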
Updated by Xiubo Li almost 4 years ago
- Status changed from Triaged to Fix Under Review
Updated by Patrick Donnelly almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45887: nautilus: client: fails to reconnect to MDS added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45888: octopus: client: fails to reconnect to MDS added
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".