Bug #44947

Hung ops for evicted CephFS clients do not get cleaned up fully

Added by David Piper about 4 years ago. Updated almost 4 years ago.

Status: Need More Info
Priority: High
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

After noticing some hung CephFS operations on my client, I rebooted the client. Ceph evicted and blacklisted this client, and the hung operations progressed to the "cleaned up request" event, but they are still listed by dump_ops_in_flight and are preventing the rebooted client (which was assigned a new client ID on remounting) from accessing the same inode. New attempts to access this inode result in additional hung operations. The only way I found to clear the hung ops completely and restore access to the inode was to restart my MDS.

I would have expected Ceph to terminate all operations for a client when that client is evicted. Is this behaviour configurable? Are there additional diagnostics I can collect if this recurs?
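
For reference, this kind of information can be gathered with commands along these lines (mds.<name> is a placeholder for the local daemon name):

    # on the MDS host: list in-flight MDS operations via the admin socket
    ceph daemon mds.<name> dump_ops_in_flight

    # list the client sessions currently known to rank 0
    ceph tell mds.0 client ls

    # list blacklisted client addresses
    ceph osd blacklist ls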

Details of the current setup:
• ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
• We're using the CephFS kernel driver; kernel: 5.5.7-1.el7.elrepo.x86_64
• The client server has 38 separate directories mounted, all from the same CephFS filesystem.
• All 38 directories are mounted with the same config by three separate clients.
• Mount config (in fstab; shown here as reported by mount, with an illustrative fstab form sketched after this list): 10.225.44.236,10.225.44.237,10.225.44.238:6789:/albacore/system/deploy on /opt/dcl/deploy type ceph (rw,noatime,name=albacore,secret=<hidden>,acl,wsize=32768,rsize=32768,_netdev)
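
For illustration only, an fstab line equivalent to the mount above might look like the following; the secretfile path is a made-up placeholder (the real mounts use an inline secret= option, hidden above):

    10.225.44.236,10.225.44.237,10.225.44.238:/albacore/system/deploy /opt/dcl/deploy ceph name=albacore,secretfile=/etc/ceph/albacore.secret,acl,noatime,wsize=32768,rsize=32768,_netdev 0 0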

Timeline:

1) 2020-03-28 21:38:58 - a CephFS op from client:366380 on inode .tmp_depl_license_status.svr01 gets stuck at "failed to wrlock, waiting" (see dump_ops_in_flight). Other ops for the same inode over the course of the next few days get stuck in the "dispatched" state (again see dump_ops_in_flight). Ceph health reports multiple slow ops.

2) 2020-03-30 11:11:44.582 - the client server is rebooted (with a "reboot" command from the shell). The Ceph MDS logs show the MDS evicting client session 366380. The client no longer appears in the output of `ceph tell mds.0 client ls`.

3) 2020-03-30 11:11:44.664068 onwards - all the existing ops_in_flight for this client progress through the events "failed to wrlock, waiting", "killing request", "cleaned up request", but the ops remain in the ops_in_flight list and still count towards Ceph's slow ops count. The client no longer records these ops under /sys/kernel/debug/ceph/*/mdsc.

4) 2020-03-30 11:18:37 - the client server comes back online and remounts the directory from CephFS, getting a new client session ID: 877605.

5) 2020-03-30 11:21:06 - client:877605 tries to access the inode in question (.tmp_depl_license_status.svr01) and gets stuck in "failed to wrlock, waiting". More ops get caught behind it in the "dispatched" state, as before. These ops appear under /sys/kernel/debug/ceph/*/mdsc (see the sketch after this list).
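
The client-side view mentioned in steps 3 and 5 comes from the kernel driver's debugfs; something along these lines (run on the client and on an admin node respectively) shows the pending requests and the blacklist entry:

    # on the client: pending MDS requests as tracked by the kernel driver
    cat /sys/kernel/debug/ceph/*/mdsc

    # on an admin node: confirm the evicted client's address was blacklisted
    ceph osd blacklist ls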

Kind regards,

Dave


Files

dump_ops_in_flight.txt (51.3 KB) - David Piper, 04/06/2020 08:27 AM
mds_2.txt (123 KB) - David Piper, 04/06/2020 08:31 AM
#1

Updated by Greg Farnum about 4 years ago

  • Project changed from Ceph to CephFS
  • Category set to Correctness/Safety
  • Priority changed from Normal to High
  • Component(FS) MDS added

This is quite odd — the only way for a request to get marked as cleaned up like that is after it does what should be all of the cleanup, which involves dropping the locks.

The request can be kept around if something keeps a reference to it, which I would guess is why it's still showing up, but I'm not sure how that could be blocking ongoing IO or holding on to locks...

#2

Updated by Zheng Yan about 4 years ago

It's an MDS bug. If you can compile Ceph from source, please try https://github.com/ceph/ceph/pull/34338
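
For anyone wanting to test that fix, a rough sketch of one way to build the PR from source, assuming the standard Ceph build scripts (the wip-34338 branch name is just a local label):

    git clone --recursive https://github.com/ceph/ceph.git
    cd ceph
    # fetch the proposed fix into a local branch
    git fetch origin pull/34338/head:wip-34338
    git checkout wip-34338
    ./install-deps.sh
    ./do_cmake.sh
    cd build && make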

#3

Updated by Zheng Yan about 4 years ago

https://github.com/ceph/ceph/pull/32073 could also explain this. Please try 14.2.8.
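
After upgrading, something like the following confirms what each daemon is actually running:

    # summarise running daemon versions across the cluster
    ceph versions

    # or query a specific MDS over its admin socket
    ceph daemon mds.<name> version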

#4

Updated by Patrick Donnelly almost 4 years ago

  • Status changed from New to Need More Info
#5

Updated by David Piper almost 4 years ago

We haven't seen this again since I raised the ticket. We've upgraded to 14.2.9 recently; I'll keep an eye out for this happening again.
