Bug #13160
ceph-disk: duplicate osd mount points (Closed)
Description
- ceph-disk prepare /dev/vdb
- ceph-disk list
$ grep vdb /proc/mounts
/dev/vdb1 /var/lib/ceph/tmp/mnt.y9HL5a xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0
/dev/vdb1 /var/lib/ceph/osd/ceph-0 xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0
This happens when ceph-disk activate mounts /var/lib/ceph/osd/ceph-0 while ceph-disk mounts the same partition at /var/lib/ceph/tmp/mnt.y9HL5a to get information about the data partition. If ceph-disk is interrupted or fails for any reason, it does not unmount the temporary mount point and both mounts stay in place.
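As a rough shell sketch of that pattern (the temporary directory template and the whoami file are only illustrative, not the actual ceph-disk code), the data partition is mounted on a temporary directory, read, then unmounted; an interruption between the mount and the umount is what leaves the extra mount point behind:

# Illustrative sketch only, not the ceph-disk source.
tmp=$(mktemp -d /var/lib/ceph/tmp/mnt.XXXXXX)
mount /dev/vdb1 "$tmp"
cat "$tmp/whoami"          # read information about the data partition
umount -- "$tmp"           # never runs if the process is interrupted above
rmdir "$tmp"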
Updated by Loïc Dachary over 8 years ago
It probably is a leftover of
command_check_call(
    [
        '/bin/umount',
        '-l',   # lazy, in case someone else is peeking at the
                # wrong moment
        '--',
        path,
    ],
)
It does not unmount because ceph-disk list happens to be looking at it at that moment. And since the device is still in use after that, the umount never takes effect.
Updated by Loïc Dachary over 8 years ago
- Status changed from In Progress to Won't Fix
The scenario creating duplicate mount points is when the osd is destroyed while it is activating. This is a corner case that only happens during testing and is probably not worth fixing. Instead the test case must wait for the osd activation to complete.
Updated by Loïc Dachary over 8 years ago
- Status changed from Won't Fix to In Progress
Updated by Loïc Dachary over 8 years ago
The ceph-disk activate that was run via systemctl is interrupted before it can finish, which leaves the file system mounted and, in some cases, incompletely set up. The following is a mix of the output of udevadm monitor and /var/log/messages. Two consecutive udev add events for vdb1 are received and the second one kills the first one.
UDEV [57419.754877] add /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
UDEV [57419.776352] remove /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:30 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0x1b776e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
....
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/local/sbin/init exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-conf exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph-detect-init exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-detect-init exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-detect-init --default sysvinit
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Marking with init system systemd
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Authorizing OSD key...
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /var/lib/ceph/tmp/mnt.GJfmAN/keyring osd allow * mon allow profile osd
Sep 18 22:46:36 target230089 systemd: Stopping Ceph disk activation: /dev/vdb1...
Sep 18 22:46:36 target230089 systemd: Starting Ceph disk activation: /dev/vdb1...
UDEV [57425.650501] add /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:36 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0xa946e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
Sep 18 22:46:36 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?
Updated by Loïc Dachary over 8 years ago
diff --git a/systemd/ceph-disk@.service b/systemd/ceph-disk@.service
index 88e4aef..cff7e9f 100644
--- a/systemd/ceph-disk@.service
+++ b/systemd/ceph-disk@.service
@@ -3,6 +3,6 @@ Description=Ceph disk activation: %f
 
 [Service]
 Type=oneshot
-RemainAfterExit=yes
-ExecStart=/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f
+KillMode=none
+ExecStart=/bin/flock /var/lock/ceph-disk -c '/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
 TimeoutSec=0
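A small shell sketch of what the blocking flock in the new ExecStart line does (the echoed commands are placeholders, only the lock path comes from the patch): a second trigger started while the first still holds the lock simply waits its turn instead of running concurrently.

# Hedged illustration of blocking flock: the second command waits for the lock.
flock /var/lock/ceph-disk -c 'echo first: start; sleep 5; echo first: done' &
flock /var/lock/ceph-disk -c 'echo second: ran only after the lock was released'
wait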
Updated by Joe Julian over 8 years ago
I think it would be better to move flock outside of the systemd service, into the udev action, with the -n option. This would prevent the systemd script from being interrupted, prevent multiple simultaneous starts, and prevent the service from ending up in a failed state due to a race. It should still "RemainAfterExit": once it has run, it is complete and should not run again if activated again.
Honestly, for a oneshot service that hasn't finished, multiple starts should not result in the service being started multiple times. This seems like a systemd bug.
Updated by Loïc Dachary over 8 years ago
With the current approach, we at least cover all cases, at the expense of running activate for each and every udev add event. If we flock -n instead, we open the following race condition:
- udev add -> flock -n takes the lock and runs ceph-disk activate
- ceph-disk activate does not have enough information to run because ceph-disk prepare has not finished preparing the device; it starts to shut down but has not stopped yet
- ceph-disk prepare finishes preparing the device and fires a udev add to announce it to the world
- udev add -> flock -n sees the lock is held and gives up
- ceph-disk activate finishes shutting down
There won't be another udev add event after that and the device won't activate. Although it is more expensive, I think running ceph-disk activate on every event addresses all possible concurrency scenarios. This is assuming ceph-disk activate
- is idempotent
- is robust enough to not try to activate a partially prepared device
What do you think?
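For comparison, a small shell sketch of the flock -n behaviour in the race above (lock path and sleep are only illustrative): unlike the blocking flock in the patch, a second -n invocation gives up immediately while the lock is held, which is how the second udev add event would get dropped.

# Hedged sketch of flock -n: the second caller exits non-zero right away.
flock /var/lock/ceph-disk -c 'sleep 10' &
sleep 1
flock -n /var/lock/ceph-disk -c 'echo second: ran' \
    || echo 'second: lock already held, giving up'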
Updated by Loïc Dachary over 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to Resolved
Updated by Loïc Dachary about 8 years ago
- Has duplicate Bug #14451: ceph-disk: duplicate osd mount points and umount fails to unmount temporary mount added
Updated by Loïc Dachary about 8 years ago
- Status changed from Resolved to Pending Backport
- Backport set to hammer
Updated by Loïc Dachary about 8 years ago
- Copied to Backport #14491: hammer: ceph-disk: duplicate osd mount points added
Updated by Nathan Cutler over 6 years ago
- Status changed from Pending Backport to Resolved