Bug #55411

closed

BUG: Dentry XXXXXX still in use (1) [unmount of ceph ceph]

Added by Jeff Layton about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

I've been able to reproduce, fairly reliably, a problem where dentries are still busy at unmount time. I do this by bouncing (restarting) the MDSs regularly while running fsstress against the filesystem. I'm testing against a 3-node cephadm cluster. The filesystem has 3 MDSs and no standbys. On all 3 nodes, I run a script like this to restart the local MDSs one at a time:

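#!/bin/bash

CEPHADM="/path/to/cephadm"

# Stagger the start randomly (0-29 seconds) so the nodes don't restart their MDSs in lockstep
sleep $((RANDOM % 30))

while true; do
    # List the daemons on this host and keep only the MDS systemd units
    "$CEPHADM" ls --no-detail | jq -r '.[] | .systemd_unit' | grep -E '@mds' |
    while read -r unit; do
        systemctl stop "$unit"
        sleep 1
        # Clear any failed state so systemd will start the unit again right away
        systemctl reset-failed "$unit"
        systemctl start "$unit"
    done
    sleep 60
done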

On the client, I then run xfstests generic/013 (an fsstress test). Eventually, when the test completes and unmounts ceph, the kernel logs messages like these (along with some stack traces that aren't particularly helpful):

[ 2083.434109] BUG: Dentry 0000000005edbd24{i=2000007d09a,n=d94XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}  still in use (1) [unmount of ceph ceph]
[ 2083.693821] BUG: Dentry 000000003f412ac2{i=3000004ff7f,n=l26}  still in use (1) [unmount of ceph ceph]
[ 2083.991207] BUG: Dentry 00000000bdbd2251{i=2000007cd23,n=f2XX}  still in use (1) [unmount of ceph ceph]
[ 2084.290827] BUG: Dentry 00000000e5f35b7f{i=2000007cfbe,n=c68X}  still in use (1) [unmount of ceph ceph]
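The messages above share a single format. A small filter can therefore pull out the inode number, reference count and dentry name of each busy dentry. That is handy for comparing runs, or for checking whether a reproduction attempt hit the bug. The sketch below is minimal and rests on two assumptions: `sed` supports `-E`, and the kernel prints the message exactly as shown above. The function name `busy_dentries` is hypothetical and not part of any Ceph or kernel tooling.

```shell
#!/bin/bash

# busy_dentries (hypothetical helper): read kernel log text on stdin and print
# one line per "BUG: Dentry ... still in use" message, in the form
#   <inode number (hex)> <reference count> <dentry name>
# The name is printed last so that a name containing spaces stays intact.
# Lines that don't match this message format produce no output.
busy_dentries() {
    sed -nE 's/.*BUG: Dentry [0-9a-f]+[{]i=([0-9a-f]+),n=(.*)[}][[:space:]]+still in use [(]([0-9]+)[)].*/\1 \3 \2/p'
}
```

For example, running `dmesg | busy_dentries` after the unmount lists the dentries that were still busy. Piping that into `wc -l` counts them. Reading the kernel log may require root, depending on the system's `dmesg` restrictions.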

I suspect that some of the kernel client's error-handling or retransmission code paths are leaking dentry references. This was seen with the current "testing" branch kernel, which doesn't contain any of the fscrypt changes.


Related issues: 1 (0 open, 1 closed)

Related to Linux kernel client - Bug #55284: kclient: filesystem sync will stuck for around 5 seconds sometimes (Resolved; assignee: Xiubo Li)
