Feature #59601: Provide way to abort kernel mount after lazy umount - CephFS - Ceph

Feature #59601

open

Provide way to abort kernel mount after lazy umount

Added by Niklas Hambuechen 12 months ago. Updated 12 months ago.

Status: New
Priority: Normal
Assignee:
Category:
Target version:
% Done:
Source:
Tags:
Backport:
Reviewed:
Affected Versions: Ceph - v16.2.7
Component(FS): kceph
Labels (FS):
Pull request ID:

Description

In some situations, e.g. when changing monitor IPs during an emergency network reconfiguration, CephFS kernel mounts (kclient) can get stuck trying to talk to IPs that do not speak Ceph.

In `dmesg` it can look like this:

[312094.140219] libceph: osd1 (2)10.0.0.1:6800 read processing error
[312094.650106] libceph: mon1 (2)10.0.0.1:3300 socket closed (con state V2_BANNER_PREFIX)
[312099.771370] libceph: auth protocol 'cephx' authorization to mds failed: -13
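When triaging such logs, it can help to pull out which peers the kernel client is still trying to reach. The following is a minimal sketch, assuming the `libceph` line format shown above with IPv4 addresses; the regex and the names `parse_libceph_line` and `peers_seen` are hypothetical helpers for illustration, not part of any Ceph tool.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumed format, taken from the sample lines above:
# [312094.140219] libceph: osd1 (2)10.0.0.1:6800 read processing error
LIBCEPH_PEER_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+libceph:\s+"
    r"(?P<entity>(?:osd|mon|mds)\d+)\s+"
    r"\((?P<addr_type>\d+)\)(?P<ip>\S+?):(?P<port>\d+)\s+"
    r"(?P<message>.*)$"
)


def parse_libceph_line(line: str) -> Optional[Dict[str, object]]:
    """Return the peer fields of a libceph dmesg line, or None if it names no peer."""
    match = LIBCEPH_PEER_RE.match(line.strip())
    if match is None:
        return None
    fields: Dict[str, object] = match.groupdict()
    fields["ts"] = float(match.group("ts"))
    fields["port"] = int(match.group("port"))
    return fields


def peers_seen(lines: Iterable[str]) -> Set[Tuple[str, str, int]]:
    """Collect the distinct (entity, ip, port) triples mentioned in the log lines."""
    peers = set()
    for line in lines:
        parsed = parse_libceph_line(line)
        if parsed is not None:
            peers.add((parsed["entity"], parsed["ip"], parsed["port"]))
    return peers
```

Feeding it the output of `dmesg` (for example via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()`) would list the stale addresses. Lines that do not name a peer, such as the `auth protocol` line, are skipped.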

For this or other reasons, the user may need to use `umount --lazy` / `umount -l` to get their mount point back and start a new mount.

However, the old mount is still going on in the background, and apparently cannot be ejected.

For example

/sys/kernel/debug/ceph/478b062f-a6e4-4ddf-96e0-7cdad91816e4.client38260/

still exists, and may show ops such as:

# cat /sys/kernel/debug/ceph/478b062f-a6e4-4ddf-96e0-7cdad91816e4.client38260/osdc          
REQUESTS 1 homeless 0
910    osd1    1.671086fe    1.3e    [1,0,2]/1    [1,0,2]/1    e254    100000002c3.00000000    0x400024    1    write
LINGER REQUESTS
BACKOFFS
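To spot detached mounts that still hold in-flight OSD requests, the `osdc` files under the debugfs directory can be scanned. This is a rough sketch, assuming the `osdc` layout shown above (a `REQUESTS` header, one line per request, then `LINGER REQUESTS` and `BACKOFFS`); only the first, second and last columns are interpreted, and treating them as request id, target OSD and operation is an assumption based on the sample. The names `OsdcRequest`, `parse_osdc` and `pending_by_client` are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class OsdcRequest:
    tid: int   # first column (assumed: request id)
    osd: str   # second column (assumed: target OSD)
    op: str    # last column (assumed: operation name)
    raw: str   # the untouched line, for anything not interpreted here


def parse_osdc(text: str) -> List[OsdcRequest]:
    """Extract the lines of the REQUESTS section from an osdc debugfs dump."""
    requests: List[OsdcRequest] = []
    in_requests = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("REQUESTS"):
            in_requests = True
            continue
        if line.startswith("LINGER REQUESTS") or line.startswith("BACKOFFS"):
            in_requests = False
            continue
        if in_requests:
            fields = line.split()
            if len(fields) >= 3 and fields[0].isdigit():
                requests.append(OsdcRequest(int(fields[0]), fields[1], fields[-1], line))
    return requests


def pending_by_client(debug_root: str = "/sys/kernel/debug/ceph") -> Dict[str, List[OsdcRequest]]:
    """Map each <fsid>.client<id> directory to its pending requests (usually needs root)."""
    root = Path(debug_root)
    result: Dict[str, List[OsdcRequest]] = {}
    if not root.is_dir():
        return result
    for osdc in sorted(root.glob("*/osdc")):
        result[osdc.parent.name] = parse_osdc(osdc.read_text())
    return result
```

A client directory that still appears here with pending requests after `umount --lazy` is the kind of detached mount this ticket is about. The sketch only reports such mounts; it cannot abort them.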

There should be a way for admins to force-eject such detached mounts.

Based on this thread https://www.spinics.net/lists/ceph-devel/msg00555.html it seemed that developers agreed:

Sage Weil:

'umount -l' should do a lazy unmount (detach from namespace), but the actual unmount code may currently hang.

The thread goes on to discuss adding either a configurable timeout or an admin hook to kick out the mount.

In this ticket I'm asking for the latter.

(If this already exists, please consider it a documentation bug; it should be added on e.g. https://docs.ceph.com/en/quincy/cephfs/mount-using-kernel-driver/#unmounting-cephfs)

Updated by Venky Shankar 12 months ago

Have you tried force unmounting the mount (`umount -f`)?

Updated by Niklas Hambuechen 12 months ago

Venky Shankar wrote:

Have you tried force unmounting the mount (`umount -f`)?

After umount --lazy, the mount point is already empty.

In that case, what could I run umount -f on?

Updated by Venky Shankar 12 months ago

Niklas Hambuechen wrote:

Venky Shankar wrote:

Have you tried force unmounting the mount (`umount -f`)?

After umount --lazy, the mount point is already empty.

In that case, what could I run umount -f on?

What I meant was to use `umount -f` instead of lazy unmount. Since a bunch of addresses changed and the IOs are not going anywhere,
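As an illustration of trying `umount -f` on a still-attached mount point before resorting to a lazy unmount, here is a minimal sketch. The function name `force_unmount`, the 30-second default timeout and the injectable `run` parameter are assumptions made for this example; whether `umount -f` actually succeeds against unreachable monitors is not confirmed in this thread.

```python
import subprocess
from typing import Callable


def force_unmount(
    mountpoint: str,
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
    timeout: float = 30.0,
) -> bool:
    """Run `umount -f <mountpoint>`; return True only if it exits with status 0.

    The timeout is a guard against the command hanging. If the umount process
    is stuck in uninterruptible sleep, killing it on timeout may itself block,
    so this is not a guarantee.
    """
    try:
        result = run(
            ["umount", "-f", mountpoint],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A caller could use `force_unmount("/mnt/cephfs")` (the path is a placeholder) and only fall back to `umount --lazy` when it returns `False`, keeping in mind that the lazy variant leaves the old client running in the background as described above.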
