Feature #59601
Provide way to abort kernel mount after lazy umount
Status: open
% Done: 0%
Description
In some situations, e.g. when changing monitor IPs during an emergency network reconfiguration, CephFS kernel mounts (kclient) can get stuck trying to talk to IP addresses that do not speak the Ceph protocol.
In `dmesg` it can look like this:
[312094.140219] libceph: osd1 (2)10.0.0.1:6800 read processing error
[312094.650106] libceph: mon1 (2)10.0.0.1:3300 socket closed (con state V2_BANNER_PREFIX)
[312099.771370] libceph: auth protocol 'cephx' authorization to mds failed: -13
For this or other reasons, the user may need to use `umount --lazy` / `umount -l` to get their mount point back and start a new mount.
However, the old mount remains active in the background and apparently cannot be ejected.
For example
/sys/kernel/debug/ceph/478b062f-a6e4-4ddf-96e0-7cdad91816e4.client38260/
still exists, and may show ops such as:
# cat /sys/kernel/debug/ceph/478b062f-a6e4-4ddf-96e0-7cdad91816e4.client38260/osdc
REQUESTS 1 homeless 0
910 osd1 1.671086fe 1.3e [1,0,2]/1 [1,0,2]/1 e254 100000002c3.00000000 0x400024 1 write
LINGER REQUESTS
BACKOFFS
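To find such leftover client instances, something like the following sketch might help. It only enumerates instances and does not eject anything. The function name `list_ceph_clients` is hypothetical, and the parsing assumes the first line of `osdc` has the `REQUESTS <n> homeless <m>` form shown above, which may differ between kernel versions.

```shell
#!/bin/sh
# Sketch: list CephFS kernel client instances still present in debugfs and
# report how many OSD requests each one has in flight.
# Assumption: debugfs is mounted at /sys/kernel/debug; pass another root as $1.
list_ceph_clients() {
    root="${1:-/sys/kernel/debug/ceph}"
    for dir in "$root"/*/; do
        # Skip when the glob matched nothing or the entry has no osdc file.
        [ -f "${dir}osdc" ] || continue
        # Assumed format of the first osdc line: "REQUESTS <n> homeless <m>".
        inflight=$(awk '$1 == "REQUESTS" { print $2; exit }' "${dir}osdc")
        printf '%s inflight_osd_requests=%s\n' \
            "$(basename "$dir")" "${inflight:-unknown}"
    done
}
```

Run as root, `list_ceph_clients` prints one line per instance. An instance that still appears after `umount -l`, especially with a non-zero request count, is a candidate for the force-eject mechanism requested here.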
There should be a way for admins to force-eject such detached mounts.
Based on this thread, https://www.spinics.net/lists/ceph-devel/msg00555.html, it seems that the developers agreed:
Sage Weil:
'umount -l' should do a lazy unmount (detach from namespace), but the actual unmount code may currently hang.
The thread further discusses that there should be either a configurable timeout or an admin hook to kick out the mount.
In this ticket I'm asking for the latter.
(If this already exists, please consider it a documentation bug; it should be added to e.g. https://docs.ceph.com/en/quincy/cephfs/mount-using-kernel-driver/#unmounting-cephfs)