Bug #23289
openmds: xfstest generic/089 hangs in rename syscall in luminous
Description
Running this test on a multi-MDS Luminous cluster results in a stalled (or very slow?) client. I'm using the kernel client, but I was able to reproduce the issue with the FUSE client as well, so I don't think this is a client issue. I also wasn't able to reproduce it on a Mimic (master branch) cluster.
On the client, here's the kernel stack of the stalled test process:
cat /proc/1065/stack
[<0>] ceph_mdsc_do_request+0xef/0x2e0
[<0>] ceph_rename+0x125/0x1e0
[<0>] vfs_rename+0x335/0x810
[<0>] SyS_renameat2+0x47a/0x530
[<0>] do_syscall_64+0x60/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff
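To spot other client tasks stuck the same way, one option is to scan /proc for stacks that contain ceph_mdsc_do_request, the frame at the top of the trace above. The following is only a minimal sketch, not part of Ceph. It assumes the kernel exposes /proc/<pid>/stack and that the script runs as root, because reading other tasks' stacks usually requires elevated privileges. The helper names (parse_stack, is_waiting_on_mds, find_waiting_tasks) are hypothetical.

```python
import os


def parse_stack(text):
    """Return the symbol names from a /proc/<pid>/stack dump, most recent call first."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        # Each frame looks like: [<0>] ceph_rename+0x125/0x1e0
        if not line.startswith("[<"):
            continue
        _, _, sym = line.partition("] ")
        name = sym.split("+", 1)[0]
        # The trailing 0xffffffffffffffff sentinel carries no symbol name
        if name and not name.startswith("0x"):
            frames.append(name)
    return frames


def is_waiting_on_mds(text):
    """True if the stack shows the task inside ceph_mdsc_do_request (waiting on an MDS reply)."""
    return "ceph_mdsc_do_request" in parse_stack(text)


def find_waiting_tasks(proc="/proc"):
    """Yield (pid, frames) for every task whose stack contains ceph_mdsc_do_request."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return  # no procfs on this system
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stack")) as f:
                text = f.read()
        except OSError:
            continue  # task exited, or we lack permission
        if is_waiting_on_mds(text):
            yield int(entry), parse_stack(text)


if __name__ == "__main__":
    for pid, frames in find_waiting_tasks():
        print(pid, " <- ".join(frames))
```

Matching on the symbol name rather than a fixed frame position keeps the check working if a kernel shows an extra wait frame above ceph_mdsc_do_request.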
I'm able to reproduce this very easily with a vstart cluster:
MON=1 OSD=2 MDS=3 ../src/vstart.sh -x -n -i 192.168.155.1 --multimds 2 -b
If I reduce the number of MDSs while the test is stalled, the test proceeds and finishes. I'm still going through the MDS logs (attached), but I'm not very proficient at reading them, so it may take me a while to spot something that may be obvious to more trained eyes.
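When narrowing this down, it can help to measure individual rename() latencies on the mount and see whether they spike while the test is stalled. The following is a hypothetical probe that times each rename and flags slow calls. It is not generic/089 and is not claimed to reproduce the hang. The function name, file names, and the 1-second threshold are all illustrative assumptions. Passing a separate dst_dir exercises cross-directory renames, which is likewise an assumption about what may be worth testing on a multi-MDS setup.

```python
import os
import sys
import tempfile
import time


def time_renames(directory, iterations=100, slow_threshold=1.0, dst_dir=None):
    """Rename a probe file back and forth, timing each rename() call.

    Returns (latencies, slow): latencies holds every call's duration in seconds,
    slow lists (iteration, seconds) for calls slower than slow_threshold.
    """
    src = os.path.join(directory, "rename_probe.a")
    dst = os.path.join(dst_dir or directory, "rename_probe.b")
    with open(src, "w") as f:
        f.write("probe\n")
    latencies, slow = [], []
    for i in range(iterations):
        # Even iterations move src -> dst, odd iterations move it back
        a, b = (src, dst) if i % 2 == 0 else (dst, src)
        start = time.monotonic()
        os.rename(a, b)
        elapsed = time.monotonic() - start
        latencies.append(elapsed)
        if elapsed > slow_threshold:
            slow.append((i, elapsed))
    # After an odd number of renames the file sits at dst, otherwise at src
    os.remove(dst if iterations % 2 == 1 else src)
    return latencies, slow


if __name__ == "__main__":
    # Pass a directory on the CephFS mount; without one, do a local dry run
    if len(sys.argv) > 1:
        lat, slow = time_renames(sys.argv[1], iterations=1000)
    else:
        with tempfile.TemporaryDirectory() as d:
            lat, slow = time_renames(d, iterations=1000)
    print(f"max rename latency: {max(lat):.3f}s, slow calls: {len(slow)}")
```

Running it against the mount while the test is stalled, and again after the MDS count is reduced, would show whether plain renames are affected as well.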
Files