Bug #63471
openclient: error code inconsistency when accessing a mount of a deleted dir
Description
Accessing a FUSE mount of a volume that has been deleted in the meantime results in ENOENT, and in a malformed entry when listing the containing directory:
root# ls -l
ls: cannot access 'mnt': No such file or directory
total 0
...
d?????????? ? ? ? ? ? mnt
This is inconsistent with the kernel client's error code, EACCES.
ENOENT breaks automation tools (namely ceph-csi), as it leads them to think that the mountpoint is already gone, when in fact a user mistakenly deleted the CephFS directory with an external tool before stopping the workload that accesses the mount. The automation tools could indeed be fixed to accommodate this case; however, it would be nice if the FUSE and kernel clients returned error codes consistently.
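For illustration, one way a tool could avoid mistaking ENOENT for "already unmounted" is to consult the mount table instead of the stat() result. This is only a minimal sketch, not how ceph-csi actually works: the helper names (`mount_points`, `is_mountpoint_listed`, `read_mountinfo`) and the sample mountinfo line are hypothetical, and it assumes a Linux host exposing /proc/self/mountinfo.

```python
import os
import re

# mountinfo escapes space, tab, newline and backslash as 3-digit octal (\040 etc.)
_OCTAL = re.compile(r"\\([0-7]{3})")

# Hypothetical sample line, made up for this sketch (not taken from the report).
SAMPLE = (
    "97 28 0:52 / /mnt rw,nosuid,nodev,relatime shared:50 - "
    "fuse.ceph-fuse ceph-fuse rw,user_id=0,group_id=0,allow_other\n"
)


def mount_points(mountinfo_text):
    """Return the set of mount points listed in mountinfo-formatted text."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split(" ")
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        # The fifth field is the mount point, relative to the process root.
        points.add(_OCTAL.sub(lambda m: chr(int(m.group(1), 8)), fields[4]))
    return points


def is_mountpoint_listed(path, mountinfo_text):
    """True if path is still listed as a mount point, whatever stat() says."""
    return os.path.normpath(path) in mount_points(mountinfo_text)


def read_mountinfo(path="/proc/self/mountinfo"):
    """Read the live mount table (Linux only)."""
    with open(path) as f:
        return f.read()
```

With a check like this, a mount whose backing directory was deleted would still be reported as mounted, so the tool would know that an unmount is still required even though stat() fails.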
Thanks!
Steps to reproduce:
1. Create a subvol
2. Mount it
3. Delete the subvol
4. Try to access the mount, observe the errors returned by ceph-fuse and the kernel client
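For step 4, a small probe such as the following can make the comparison between the two clients explicit by printing the symbolic errno name that stat() returns for each mountpoint. It is a minimal sketch; the function name `probe` is hypothetical, and the mountpoint paths are passed on the command line.

```python
import errno
import os
import sys


def probe(path):
    """Return the symbolic errno name from stat(path), or "OK" on success."""
    try:
        os.stat(path)
    except OSError as exc:
        # Map the numeric errno (e.g. 2) to its name (e.g. "ENOENT").
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "OK"


if __name__ == "__main__":
    # Print one "path result" line per mountpoint given as an argument.
    for p in sys.argv[1:]:
        print(p, probe(p))
```

Running it against the ceph-fuse mountpoint and the kernel-client mountpoint after deleting the subvolume should show the two differing codes described above.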
Attaching logs:
ceph-fuse
[root@rvasek-1-27-6-2-qqbsjsnaopix-master-0 tmp]# ceph-fuse -d -f -k pvc-2385dfc3-fb5d-4c7a-aebe-d7e9100f8f1a.cephx.keyring --id pvc-2385dfc3-fb5d-4c7a-aebe-d7e9100f8f1a -m 188.185.66.208:6790,188.184.94.56:6790,188.184.86.25:6790 -r /volumes/_nogroup/e78a0069-f781-46f5-b674-cce4ed0c57ef/ffe5424c-8fdb-44b6-8d19-9524c1b6f7be /mnt
2023-11-07T16:43:58.145+0000 7f21b5450700 -1 --2- 188.185.120.133:0/3239993688 >> v2:188.184.94.56:6790/0 conn(0x55ce41c93a00 0x55ce41c988b0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner peer v2:188.184.94.56:6790/0 is using msgr V1 protocol
2023-11-07T16:43:58.145+0000 7f21b5c51700 -1 --2- 188.185.120.133:0/3239993688 >> v2:188.184.86.25:6790/0 conn(0x55ce41c930d0 0x55ce41c934c0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner peer v2:188.184.86.25:6790/0 is using msgr V1 protocol
2023-11-07T16:43:58.154+0000 7f21be585580 0 ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable), process ceph-fuse, pid 4167065
2023-11-07T16:43:58.156+0000 7f21be585580 -1 init, newargv = 0x55ce41ca0950 newargc=16
ceph-fuse[4167065]: starting ceph client
FUSE library version: 2.9.7
ceph-fuse[4167065]: starting fuse
unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.38
flags=0x73fffffb
max_readahead=0x00020000
INIT: 7.19
flags=0x0000043b
max_readahead=0x00020000
max_write=0x00020000
max_background=0
congestion_threshold=0
unique: 2, success, outsize: 40
unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167095
unique: 4, success, outsize: 120
unique: 6, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167095
unique: 6, success, outsize: 120
unique: 8, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167306
unique: 8, success, outsize: 120
unique: 10, opcode: GETXATTR (22), nodeid: 1, insize: 72, pid: 4167306
unique: 10, error: -95 (Operation not supported), outsize: 16
unique: 12, opcode: GETXATTR (22), nodeid: 1, insize: 64, pid: 4167306
unique: 12, error: -95 (Operation not supported), outsize: 16
unique: 14, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167411
unique: 14, success, outsize: 120
unique: 16, opcode: GETXATTR (22), nodeid: 1, insize: 72, pid: 4167411
unique: 16, error: -95 (Operation not supported), outsize: 16
unique: 18, opcode: GETXATTR (22), nodeid: 1, insize: 64, pid: 4167411
unique: 18, error: -95 (Operation not supported), outsize: 16
unique: 20, opcode: OPENDIR (27), nodeid: 1, insize: 48, pid: 4167411
unique: 20, success, outsize: 32
unique: 22, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167411
unique: 22, success, outsize: 120
unique: 24, opcode: READDIR (28), nodeid: 1, insize: 80, pid: 4167411
unique: 24, success, outsize: 80
unique: 26, opcode: READDIR (28), nodeid: 1, insize: 80, pid: 4167411
unique: 26, success, outsize: 16
unique: 28, opcode: RELEASEDIR (29), nodeid: 1, insize: 64, pid: 0
unique: 28, success, outsize: 16
<<< DELETING THE SUBVOL NOW >>>
2023-11-07T16:44:52.089+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
unique: 30, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167759
2023-11-07T16:45:05.112+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.114+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.116+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.118+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.120+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.122+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.125+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.126+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.128+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.130+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.132+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.135+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.137+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.139+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.141+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.143+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.145+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.147+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.150+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.151+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.154+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.156+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.158+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.159+0000 7f219d7fa700 0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
unique: 30, error: -2 (No such file or directory), outsize: 16
<<< umount >>>
unique: 30, error: -2 (No such file or directory), outsize: 16
unique: 32, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4168557
unique: 32, error: -2 (No such file or directory), outsize: 16
ceph-fuse[4167065]: fuse finished with error 0 and tester_r 0
Kernel logs:
[root@rvasek-1-27-6-2-qqbsjsnaopix-node-0 core]# dmesg | grep -i ceph
[432918.048321] libceph: mon1 (1)188.184.94.56:6790 session established
[432918.050011] libceph: client1695439645 fsid dd535a7e-4647-4bee-853d-f34112615f81
[433038.193633] libceph: mds0 (1)188.184.83.152:6801 socket closed (con state OPEN)
[433039.056883] libceph: mds0 (1)188.184.83.152:6801 session reset
[433039.056890] ceph: mds0 closed our session
[433039.056891] ceph: mds0 reconnect start
[433039.058421] ceph: mds0 reconnect denied
[433040.732088] libceph: mds0 (1)188.184.83.152:6801 socket closed (con state V1_CONNECT_MSG)
Host:
# uname -a
Linux rvasek-1-27-6-2-qqbsjsnaopix-node-0 6.4.15-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep 7 00:25:01 UTC 2023 x86_64 GNU/Linux
Cheers,
Robert Vasek
Updated by Venky Shankar 6 months ago
- Category set to Correctness/Safety
- Assignee set to Kotresh Hiremath Ravishankar
- Target version set to v19.0.0
- Backport set to quincy,reef
Updated by Venky Shankar 6 months ago
- Priority changed from Normal to High
- Severity changed from 3 - minor to 2 - major
Updated by Kotresh Hiremath Ravishankar 5 months ago
On the latest main, I am seeing a RADOS timeout error instead. I will dig further into this.
kotresh:build$ sudo bin/ceph-fuse -c ./ceph.conf -r /volumes/_nogroup/sub_0/ded5fa4a-e949-4e40-9be6-d20ddee80bff /mnt
2023-11-30T18:51:38.974+0530 7eff6da634c0 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-30T18:51:38.979+0530 7eff6da634c0 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-30T18:51:38.982+0530 7eff6da634c0 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-30T18:51:38.983+0530 7eff6da634c0 -1 init, newargv = 0x55632e0df320 newargc=15
ceph-fuse[38852]: starting ceph client
ceph-fuse[38852]: starting fuse
kotresh:ceph$ bin/ceph fs subvolume rm a sub_0
bash: bin/ceph: No such file or directory
kotresh:ceph$ upceph
kotresh:ceph$ cd build
kotresh:build$ bin/ceph fs subvolume rm a sub_0
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2023-11-30T18:53:09.182+0530 7fa1edc586c0 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-30T18:58:09.187+0530 7fa1edc586c0 0 monclient(hunting): authenticate timed out after 300
[errno 110] RADOS timed out (error connecting to the cluster)