Bug #45665

closed

client: fails to reconnect to MDS

Added by Patrick Donnelly almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

...
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758 wait_sync_caps want 7 (last is 7, 2 total flushing)
2020-05-21T00:51:21.903+0000 7faaa0e031c0 10 client.8758  waiting on mds.0 tid 6 (want 7)
...

From: /ceph/teuthology-archive/pdonnell-2020-05-20_20:43:43-fs-wip-pdonnell-testing-20200520.182104-distro-basic-smithi/5073922/remote/smithi102/log/ceph-client.0.64364.log.gz

MDS gave up:

2020-05-21T00:52:18.654+0000 7f0c5233d700  7 mds.0.server reconnect timed out, 1 clients have not reconnected in time
2020-05-21T00:52:18.654+0000 7f0c5233d700  1 mds.0.server reconnect gives up on client.8758 192.168.0.1:0/75408247

From: /ceph/teuthology-archive/pdonnell-2020-05-20_20:43:43-fs-wip-pdonnell-testing-20200520.182104-distro-basic-smithi/5073922/remote/smithi071/log/ceph-mds.b.log.gz
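When triaging a run like this, it can help to pull the abandoned clients out of the MDS log automatically. The following is only a minimal sketch: the regular expression is inferred from the single "reconnect gives up" line quoted above, and the function name `clients_given_up` is made up for illustration, not part of any Ceph tool.

```python
import re

# Pattern inferred from the one sample line in this report (an assumption,
# not a documented log format): "mds.<rank>.server reconnect gives up on
# client.<id> <address>".
GIVE_UP = re.compile(
    r"mds\.(\d+)\.server reconnect gives up on (client\.\d+) (\S+)"
)


def clients_given_up(lines):
    """Return (mds_rank, client_name, address) for each give-up line."""
    found = []
    for line in lines:
        match = GIVE_UP.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2), match.group(3)))
    return found


# The two MDS log lines quoted in the description.
sample = [
    "2020-05-21T00:52:18.654+0000 7f0c5233d700  7 mds.0.server reconnect "
    "timed out, 1 clients have not reconnected in time",
    "2020-05-21T00:52:18.654+0000 7f0c5233d700  1 mds.0.server reconnect "
    "gives up on client.8758 192.168.0.1:0/75408247",
]
print(clients_given_up(sample))
```

The client name it reports (client.8758) can then be matched against the client-side log, where the same client is seen waiting in wait_sync_caps.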


Related issues 2 (0 open, 2 closed)

Copied to CephFS - Backport #45887: nautilus: client: fails to reconnect to MDS (Resolved, Nathan Cutler)
Copied to CephFS - Backport #45888: octopus: client: fails to reconnect to MDS (Resolved, Wei-Chung Cheng)
Actions #1

Updated by Patrick Donnelly almost 4 years ago

  • Status changed from New to Triaged
  • Assignee set to Xiubo Li
Actions #2

Updated by Xiubo Li almost 4 years ago

This happens when the MDS daemon is restarting or trying to reconnect the client while the client is trying to umount and is waiting for the synchronous cap flush to complete.

Actions #3

Updated by Xiubo Li almost 4 years ago

If the ceph-fuse client needs to flush caps and waits synchronously for the acks, umount() still just returns successfully. The netns container is then destroyed and the network becomes unreachable, while the ceph-fuse daemon is still stuck waiting for the cap flush ack.

As a result the ceph-fuse daemon stays stuck forever. If the MDS daemons are restarted, they try to reconnect their clients, but the stuck ceph-fuse daemon never replies.
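The race described here is an ordering problem: the network goes away before the client daemon has finished. One way a test harness could avoid it is to wait for the daemon process to exit before destroying the network namespace. The sketch below illustrates that ordering only; it is not taken from the pull request attached to this issue, and `unmount_then_teardown`, `daemon`, and `teardown_network` are hypothetical names.

```python
import subprocess
import sys


def unmount_then_teardown(daemon, teardown_network, timeout=60.0):
    """Tear down the network only after the client daemon has exited.

    daemon: subprocess.Popen handle for the client daemon (hypothetical;
            a real harness may track the daemon differently).
    teardown_network: hypothetical callable that destroys the netns.
    Returns True if the daemon exited and the network was torn down.
    """
    try:
        daemon.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The daemon is still running, possibly waiting for a cap flush
        # ack. Keep the network up so it can still reach the MDS.
        return False
    teardown_network()
    return True


if __name__ == "__main__":
    # Stand-in for a daemon that exits promptly after umount.
    proc = subprocess.Popen([sys.executable, "-c", "pass"])
    done = unmount_then_teardown(proc, lambda: print("netns removed"))
    print("torn down:", done)
```

Returning False instead of forcing the teardown leaves the decision to the caller, which can retry, or kill the daemon and report the failure.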

Actions #4

Updated by Xiubo Li almost 4 years ago

  • Pull request ID set to 35328
Actions #5

Updated by Xiubo Li almost 4 years ago

  • Status changed from Triaged to Fix Under Review
Actions #6

Updated by Patrick Donnelly almost 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45887: nautilus: client: fails to reconnect to MDS added
Actions #8

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45888: octopus: client: fails to reconnect to MDS added
Actions #9

Updated by Nathan Cutler almost 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
