Bug #55411
Status: Closed
BUG: Dentry XXXXXX still in use (1) [unmount of ceph ceph]
Description
I can fairly reliably reproduce a "dentry still in use at unmount" problem with ceph by regularly bouncing (restarting) the MDSs while running fsstress against the filesystem. I'm testing against a 3-node cephadm cluster; the fs has 3 MDSs and no standbys. On each of the 3 nodes, I run a script like this to bounce that node's MDS daemons in turn:
#!/bin/bash
CEPHADM="/path/to/cephadm"

# stagger them randomly
sleep $((RANDOM % 30))

while true; do
    # restart each MDS daemon on this host in turn
    "$CEPHADM" ls --no-detail | jq -r '.[] | .systemd_unit' | grep -E '@mds' |
    while read -r unit; do
        systemctl stop "$unit"
        sleep 1
        systemctl reset-failed "$unit"
        systemctl start "$unit"
    done
    sleep 60
done
On the client, I then run xfstests test generic/013 (an fsstress test). Eventually, when the test completes and unmounts ceph, the kernel logs messages like these (along with some stack traces that aren't particularly helpful):
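For anyone trying to reproduce this, a minimal xfstests local.config for a CephFS kernel mount might look roughly like the sketch below. It is an assumption about the setup, not the reporter's actual config: `<mon-addr>` is a placeholder for a monitor address, `/test` is an assumed directory in the CephFS namespace, and `name=admin` is an assumed cephx user. generic/013 runs fsstress under the test directory, so only the TEST_* variables are set here.

```bash
# local.config for xfstests (sketch; addresses, paths and user are placeholders)
export FSTYP=ceph
# <mon-addr> is a placeholder for one of the cluster's monitor addresses;
# /test is an assumed directory inside the CephFS namespace
export TEST_DEV="<mon-addr>:/test"
export TEST_DIR=/mnt/test
# assumed cephx user; adjust credentials to match your cluster
export TEST_FS_MOUNT_OPTS="-o name=admin"
```

With that file in the top-level xfstests directory, the test is run with `./check generic/013`.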
[ 2083.434109] BUG: Dentry 0000000005edbd24{i=2000007d09a,n=d94XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX} still in use (1) [unmount of ceph ceph]
[ 2083.693821] BUG: Dentry 000000003f412ac2{i=3000004ff7f,n=l26} still in use (1) [unmount of ceph ceph]
[ 2083.991207] BUG: Dentry 00000000bdbd2251{i=2000007cd23,n=f2XX} still in use (1) [unmount of ceph ceph]
[ 2084.290827] BUG: Dentry 00000000e5f35b7f{i=2000007cfbe,n=c68X} still in use (1) [unmount of ceph ceph]
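When triaging many runs, it can help to pull the leaked dentries out of the kernel log. As far as I can tell, these messages come from umount_check() in fs/dcache.c, which prints the dentry's address, the inode number in hex (i=), the dentry name (n=) and its remaining reference count. The sketch below assumes a sed that supports -E; the function name parse_dentry_bugs is hypothetical and not part of any existing tool.

```bash
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and print one line per
# "still in use" dentry as "ino=<hex inode> name=<dentry name> refs=<count>".
# Lines that don't match the BUG message format are ignored.
parse_dentry_bugs() {
    # \{ \} and \( \) are literal braces/parens in ERE; [^}]* stops at the
    # closing brace, so long fsstress names are captured whole.
    sed -nE 's/.*BUG: Dentry ([0-9a-f]+)\{i=([0-9a-f]+),n=([^}]*)\} still in use \(([0-9]+)\).*/ino=\2 name=\3 refs=\4/p'
}
```

For example, `dmesg | parse_dentry_bugs` lists the leaked dentries, and piping that through `wc -l` counts them.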
I suspect that some of the kernel client's error-handling or retransmission code paths are leaking dentry references. This was seen with the current "testing" branch kernel (which doesn't contain any of the fscrypt changes).