Bug #63471

open

client: error code inconsistency when accessing a mount of a deleted dir

Added by Robert Vasek 6 months ago. Updated 5 months ago.

Status: New
Priority: High
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags:
Backport: quincy,reef
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Accessing a FUSE mount of a volume that has since been deleted results in ENOENT, and listing the parent directory shows incomplete information for the mountpoint:

root# ls -l
ls: cannot access 'mnt': No such file or directory
total 0
...
d??????????   ? ?    ?       ?            ? mnt

This is inconsistent with the kernel client, which returns EACCES.

ENOENT breaks automation tools (namely ceph-csi): it leads them to think the mountpoint is already gone, when in fact a user mistakenly deleted the CephFS directory with an external tool before stopping the workload that accesses the mount. The automation tools could be fixed to accommodate this case, but it would be nice if the FUSE and kernel clients returned error codes consistently.
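
To illustrate why the differing error code matters, here is a minimal sketch of a mountpoint check that branches on errno. It is not ceph-csi's actual logic (ceph-csi is written in Go); the function name `mount_state` and the returned labels are hypothetical and chosen only for this example.

```python
import errno
import os


def mount_state(path):
    """Classify a mountpoint by the errno that stat() reports for it.

    Hypothetical helper for illustration only; not taken from ceph-csi.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Ambiguous: the path may really be gone, or (with ceph-fuse,
            # as reported in this bug) the mounted directory was deleted
            # on the cluster while the mount is still in place.
            return "missing"
        if exc.errno == errno.EACCES:
            # What the kernel client reports in the same situation,
            # according to this report.
            return "stale"
        raise
    return "ok"


if __name__ == "__main__":
    print(mount_state("/mnt"))
```

A tool that treats "missing" as "nothing left to unmount" would skip cleanup on a ceph-fuse mount in this state, whereas the kernel client's EACCES would send it down the "stale" branch.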

Thanks!

Steps to reproduce:
1. Create a subvol
2. Mount it
3. Delete the subvol
4. Try to access the mount, observe the errors returned by ceph-fuse and the kernel client
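
The steps above can be sketched as a command sequence. This is only an outline under assumptions: the volume name, subvolume name, and mountpoint are placeholders, authentication options are omitted, and only the ceph-fuse side is shown (the kernel mount would be repeated with `mount -t ceph`). The subvolume path passed to `-r` is assumed to come from `ceph fs subvolume getpath`.

```python
import shlex


def repro_commands(volume, subvol, subvol_path, mountpoint):
    """Return the reproduction steps as argv lists (nothing is executed).

    All arguments are placeholders; subvol_path is assumed to be the
    output of `ceph fs subvolume getpath <volume> <subvol>`.
    """
    return [
        # 1. Create a subvolume.
        ["ceph", "fs", "subvolume", "create", volume, subvol],
        # 2. Mount it with ceph-fuse, rooted at the subvolume path.
        ["ceph-fuse", "-r", subvol_path, mountpoint],
        # 3. Delete the subvolume while the mount is still active.
        ["ceph", "fs", "subvolume", "rm", volume, subvol],
        # 4. Access the mount and observe the error that comes back.
        ["stat", mountpoint],
    ]


if __name__ == "__main__":
    for argv in repro_commands("a", "sub_0", "/volumes/_nogroup/sub_0/<uuid>", "/mnt"):
        print(shlex.join(argv))
```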

Attaching logs:

ceph-fuse

[root@rvasek-1-27-6-2-qqbsjsnaopix-master-0 tmp]# ceph-fuse -d -f -k pvc-2385dfc3-fb5d-4c7a-aebe-d7e9100f8f1a.cephx.keyring --id pvc-2385dfc3-fb5d-4c7a-aebe-d7e9100f8f1a -m 188.185.66.208:6790,188.184.94.56:6790,188.184.86.25:6790 -r /volumes/_nogroup/e78a0069-f781-46f5-b674-cce4ed0c57ef/ffe5424c-8fdb-44b6-8d19-9524c1b6f7be /mnt
2023-11-07T16:43:58.145+0000 7f21b5450700 -1 --2- 188.185.120.133:0/3239993688 >> v2:188.184.94.56:6790/0 conn(0x55ce41c93a00 0x55ce41c988b0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner peer v2:188.184.94.56:6790/0 is using msgr V1 protocol
2023-11-07T16:43:58.145+0000 7f21b5c51700 -1 --2- 188.185.120.133:0/3239993688 >> v2:188.184.86.25:6790/0 conn(0x55ce41c930d0 0x55ce41c934c0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner peer v2:188.184.86.25:6790/0 is using msgr V1 protocol
2023-11-07T16:43:58.154+0000 7f21be585580  0 ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable), process ceph-fuse, pid 4167065
2023-11-07T16:43:58.156+0000 7f21be585580 -1 init, newargv = 0x55ce41ca0950 newargc=16
ceph-fuse[4167065]: starting ceph client
FUSE library version: 2.9.7
ceph-fuse[4167065]: starting fuse
unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
INIT: 7.38
flags=0x73fffffb
max_readahead=0x00020000
   INIT: 7.19
   flags=0x0000043b
   max_readahead=0x00020000
   max_write=0x00020000
   max_background=0
   congestion_threshold=0
   unique: 2, success, outsize: 40
unique: 4, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167095
   unique: 4, success, outsize: 120
unique: 6, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167095
   unique: 6, success, outsize: 120
unique: 8, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167306
   unique: 8, success, outsize: 120
unique: 10, opcode: GETXATTR (22), nodeid: 1, insize: 72, pid: 4167306
   unique: 10, error: -95 (Operation not supported), outsize: 16
unique: 12, opcode: GETXATTR (22), nodeid: 1, insize: 64, pid: 4167306
   unique: 12, error: -95 (Operation not supported), outsize: 16
unique: 14, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167411
   unique: 14, success, outsize: 120
unique: 16, opcode: GETXATTR (22), nodeid: 1, insize: 72, pid: 4167411
   unique: 16, error: -95 (Operation not supported), outsize: 16
unique: 18, opcode: GETXATTR (22), nodeid: 1, insize: 64, pid: 4167411
   unique: 18, error: -95 (Operation not supported), outsize: 16
unique: 20, opcode: OPENDIR (27), nodeid: 1, insize: 48, pid: 4167411
   unique: 20, success, outsize: 32
unique: 22, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167411
   unique: 22, success, outsize: 120
unique: 24, opcode: READDIR (28), nodeid: 1, insize: 80, pid: 4167411
   unique: 24, success, outsize: 80
unique: 26, opcode: READDIR (28), nodeid: 1, insize: 80, pid: 4167411
   unique: 26, success, outsize: 16
unique: 28, opcode: RELEASEDIR (29), nodeid: 1, insize: 64, pid: 0
   unique: 28, success, outsize: 16
<<< DELETING THE SUBVOL NOW >>>
2023-11-07T16:44:52.089+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
unique: 30, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4167759
2023-11-07T16:45:05.112+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.114+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.116+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.118+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.120+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.122+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.125+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.126+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.128+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.130+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.132+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.135+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.137+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.139+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.141+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.143+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.145+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.147+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.150+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.151+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.154+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.156+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.158+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
2023-11-07T16:45:05.159+0000 7f219d7fa700  0 client.1712074284 ms_handle_remote_reset on v2:188.184.83.152:6800/665755794
   unique: 30, error: -2 (No such file or directory), outsize: 16
<<< umount >>>
   unique: 30, error: -2 (No such file or directory), outsize: 16
unique: 32, opcode: GETATTR (3), nodeid: 1, insize: 56, pid: 4168557
   unique: 32, error: -2 (No such file or directory), outsize: 16
ceph-fuse[4167065]: fuse finished with error 0 and tester_r 0
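
To pull the failing requests out of a ceph-fuse debug log like the one above, a small parser along these lines may help. It is a sketch that assumes the reply-line format seen in this log (`unique: N, error: -E (...)`); the names `FUSE_REPLY` and `failed_requests` are made up for this example.

```python
import errno
import re

# Matches reply lines such as:
#    unique: 30, error: -2 (No such file or directory), outsize: 16
FUSE_REPLY = re.compile(r"^\s*unique: (\d+), error: (-\d+) \(")


def failed_requests(log_text):
    """Return (request id, errno name) pairs for failed FUSE replies."""
    failures = []
    for line in log_text.splitlines():
        match = FUSE_REPLY.match(line)
        if match:
            code = -int(match.group(2))
            # Fall back to the number if the platform has no name for it.
            name = errno.errorcode.get(code, str(code))
            failures.append((int(match.group(1)), name))
    return failures


if __name__ == "__main__":
    sample = (
        "   unique: 28, success, outsize: 16\n"
        "   unique: 30, error: -2 (No such file or directory), outsize: 16\n"
        "   unique: 32, error: -2 (No such file or directory), outsize: 16\n"
    )
    print(failed_requests(sample))  # [(30, 'ENOENT'), (32, 'ENOENT')]
```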

Kernel logs:

[root@rvasek-1-27-6-2-qqbsjsnaopix-node-0 core]# dmesg | grep -i ceph
[432918.048321] libceph: mon1 (1)188.184.94.56:6790 session established
[432918.050011] libceph: client1695439645 fsid dd535a7e-4647-4bee-853d-f34112615f81
[433038.193633] libceph: mds0 (1)188.184.83.152:6801 socket closed (con state OPEN)
[433039.056883] libceph: mds0 (1)188.184.83.152:6801 session reset
[433039.056890] ceph: mds0 closed our session
[433039.056891] ceph: mds0 reconnect start
[433039.058421] ceph: mds0 reconnect denied
[433040.732088] libceph: mds0 (1)188.184.83.152:6801 socket closed (con state V1_CONNECT_MSG)

Host:

# uname -a
Linux rvasek-1-27-6-2-qqbsjsnaopix-node-0 6.4.15-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep  7 00:25:01 UTC 2023 x86_64 GNU/Linux

Cheers,
Robert Vasek

Actions #1

Updated by Venky Shankar 6 months ago

  • Category set to Correctness/Safety
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v19.0.0
  • Backport set to quincy,reef
Actions #2

Updated by Venky Shankar 6 months ago

  • Priority changed from Normal to High
  • Severity changed from 3 - minor to 2 - major
Actions #3

Updated by Kotresh Hiremath Ravishankar 5 months ago

On the latest main, I am seeing a RADOS timeout error instead. I will dig further into this.

kotresh:build$ sudo bin/ceph-fuse -c ./ceph.conf -r /volumes/_nogroup/sub_0/ded5fa4a-e949-4e40-9be6-d20ddee80bff /mnt
2023-11-30T18:51:38.974+0530 7eff6da634c0 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-30T18:51:38.979+0530 7eff6da634c0 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-30T18:51:38.982+0530 7eff6da634c0 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-30T18:51:38.983+0530 7eff6da634c0 -1 init, newargv = 0x55632e0df320 newargc=15
ceph-fuse[38852]: starting ceph client
ceph-fuse[38852]: starting fuse

kotresh:ceph$ bin/ceph fs subvolume rm a sub_0
bash: bin/ceph: No such file or directory
kotresh:ceph$ upceph
kotresh:ceph$ cd build
kotresh:build$ bin/ceph fs subvolume rm a sub_0
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2023-11-30T18:53:09.182+0530 7fa1edc586c0 -1 WARNING: all dangerous and experimental features are enabled.

2023-11-30T18:58:09.187+0530 7fa1edc586c0  0 monclient(hunting): authenticate timed out after 300
[errno 110] RADOS timed out (error connecting to the cluster)
