Bug #45665

client: fails to reconnect to MDS

Added by Patrick Donnelly about 2 months ago. Updated about 1 month ago.

Status: Pending Backport
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature:

Description

...
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758 wait_sync_caps want 7 (last is 7, 2 total flushing)
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758  waiting on mds.0 tid 6 (want 7)
...

From: /ceph/teuthology-archive/pdonnell-2020-05-20_20:43:43-fs-wip-pdonnell-testing-20200520.182104-distro-basic-smithi/5073922/remote/smithi102/log/ceph-client.0.64364.log.gz

MDS gave up:

2020-05-21T00:52:18.654+0000 7f0c5233d700  7 mds.0.server reconnect timed out, 1 clients have not reconnected in time
2020-05-21T00:52:18.654+0000 7f0c5233d700  1 mds.0.server reconnect gives up on client.8758 192.168.0.1:0/75408247

From: /ceph/teuthology-archive/pdonnell-2020-05-20_20:43:43-fs-wip-pdonnell-testing-20200520.182104-distro-basic-smithi/5073922/remote/smithi071/log/ceph-mds.b.log.gz
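For anyone triaging similar runs, here is a rough sketch of how the "gives up" lines could be pulled out of an MDS log. It is not part of any Ceph tooling; the function name `find_abandoned_clients` and the regular expression are assumptions based only on the two log lines quoted above.

```python
import re

# Pattern guessed from the quoted MDS log line; other Ceph versions may
# format this message differently.
GIVE_UP_RE = re.compile(r"reconnect gives up on (client\.\d+) (\S+)")


def find_abandoned_clients(lines):
    """Return (client, address) pairs for clients the MDS gave up on."""
    found = []
    for line in lines:
        match = GIVE_UP_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


sample = [
    "2020-05-21T00:52:18.654+0000 7f0c5233d700  7 mds.0.server reconnect timed out, 1 clients have not reconnected in time",
    "2020-05-21T00:52:18.654+0000 7f0c5233d700  1 mds.0.server reconnect gives up on client.8758 192.168.0.1:0/75408247",
]
abandoned = find_abandoned_clients(sample)
```

To scan a real log, the same function could be fed the lines of a file opened with `gzip.open(path, "rt")`.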


Related issues

Copied to fs - Backport #45887: nautilus: client: fails to reconnect to MDS Resolved
Copied to fs - Backport #45888: octopus: client: fails to reconnect to MDS Resolved

History

#1 Updated by Patrick Donnelly about 2 months ago

  • Status changed from New to Triaged
  • Assignee set to Xiubo Li

#2 Updated by Xiubo Li about 2 months ago

This happens when the MDS daemon restarts and tries to reconnect the client while the client is unmounting and waiting for the synchronous cap flush to complete.

#3 Updated by Xiubo Li about 1 month ago

If the ceph-fuse client needs to flush caps and waits synchronously, umount() still returns successfully. The netns container is then destroyed and the network becomes unreachable, but the ceph-fuse daemon is still stuck waiting for the flush caps ack.

As a result, the ceph-fuse daemon stays stuck forever. If the MDS daemons are restarted, they will try to reconnect the clients, but the stuck ceph-fuse daemon won't reply.
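To illustrate the hang in isolation, here is a toy model of waiting for a flush ack. It is not Ceph code and not the change in the pull request; the class `FlushWaiter` and its methods are hypothetical names that only mimic the `wait_sync_caps` behaviour seen in the client log. It shows that an unbounded wait never returns once the ack can no longer arrive, whereas a bounded wait does.

```python
import threading


class FlushWaiter:
    """Toy stand-in for a client waiting on a cap-flush ack (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._acked_tid = 0

    def ack(self, tid):
        # Called when an ack for flush `tid` arrives from the MDS.
        with self._cond:
            self._acked_tid = max(self._acked_tid, tid)
            self._cond.notify_all()

    def wait_sync_caps(self, want, timeout=None):
        # Block until flush `want` is acked. With timeout=None this waits
        # forever if the ack never arrives, which mirrors the hang above.
        # Returns True if the ack was seen, False if the wait timed out.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._acked_tid >= want, timeout=timeout
            )


waiter = FlushWaiter()

# Network is gone: no ack ever arrives, so only a bounded wait returns.
timed_out = not waiter.wait_sync_caps(want=7, timeout=0.1)

# Once an ack for tid 7 is recorded, the same wait completes at once.
waiter.ack(7)
completed = waiter.wait_sync_caps(want=7, timeout=0.1)
```

In this model, calling `wait_sync_caps(want=7)` without a timeout before any ack would block indefinitely, which corresponds to the stuck ceph-fuse daemon described above.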

#4 Updated by Xiubo Li about 1 month ago

  • Pull request ID set to 35328

#5 Updated by Xiubo Li about 1 month ago

  • Status changed from Triaged to Fix Under Review

#6 Updated by Patrick Donnelly about 1 month ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Nathan Cutler about 1 month ago

  • Copied to Backport #45887: nautilus: client: fails to reconnect to MDS added

#8 Updated by Nathan Cutler about 1 month ago

  • Copied to Backport #45888: octopus: client: fails to reconnect to MDS added
