Bug #13160

ceph-disk: duplicate osd mount points

Added by Loic Dachary over 1 year ago. Updated about 1 year ago.

Status:
Pending Backport
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
09/18/2015
Due date:
% Done:

0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

  • ceph-disk prepare /dev/vdb
  • ceph-disk list
$ grep vdb /proc/mounts
/dev/vdb1 /var/lib/ceph/tmp/mnt.y9HL5a xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0
/dev/vdb1 /var/lib/ceph/osd/ceph-0 xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0

This happens when ceph-disk activate mounts /var/lib/ceph/osd/ceph-0 while ceph-disk list mounts /var/lib/ceph/tmp/mnt.y9HL5a to get information about the data partition. If ceph-disk is interrupted or fails for any reason, it does not umount the temporary mount point and both mounts stay in place.
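
A quick way to see the leftover is to check whether a device appears more than once in /proc/mounts. For illustration only (this is not part of ceph-disk), a small Python sketch that reports such duplicates:

# Sketch: list devices that appear more than once in /proc/mounts,
# e.g. /dev/vdb1 mounted under both the tmp and osd directories.
from collections import defaultdict

def duplicate_mounts(mounts='/proc/mounts'):
    by_device = defaultdict(list)
    with open(mounts) as f:
        for line in f:
            device, mountpoint = line.split()[:2]
            if device.startswith('/dev/'):
                by_device[device].append(mountpoint)
    return dict((dev, mps) for dev, mps in by_device.items() if len(mps) > 1)

for dev, mountpoints in duplicate_mounts().items():
    print('%s mounted %d times: %s' % (dev, len(mountpoints), ' '.join(mountpoints)))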


Related issues

Duplicated by Bug #14451: ceph-disk: duplicate osd mount points and umount fails to unmount temporary mount Duplicate 01/21/2016
Copied to Backport #14491: hammer: ceph-disk: duplicate osd mount points Need More Info

Associated revisions

Revision f0a47578 (diff)
Added by Loic Dachary over 1 year ago

ceph-disk: systemd must not kill a running ceph-disk

When activating a device, ceph-disk trigger restarts the ceph-disk
systemd service. Two consecutive udev add events on the same device will
restart the ceph-disk systemd service and the second one may kill the
first one, leaving the device half activated.

The ceph-disk systemd service is instructed not to kill an existing
process when restarting. The second run waits (via flock) for the first
one to complete before running so that they do not overlap.

http://tracker.ceph.com/issues/13160 Fixes: #13160

Signed-off-by: Loic Dachary <>

History

#1 Updated by Loic Dachary over 1 year ago

It probably is a leftover of

    command_check_call(
        [
            '/bin/umount',
            '-l',   # lazy, in case someone else is peeking at the
                    # wrong moment
            '--',
            path,
            ],
        )


It does not umount because ceph-disk list happens to be looking at it at that moment, and since the device is in use after that, the umount is not executed.
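
As an aside, here is a minimal sketch (illustrative only, with hypothetical helpers, not the actual ceph-disk code path) of the try/finally pattern that keeps a temporary mount from being leaked when the code between mount and umount fails; note it only helps for exceptions, not if the process is killed outright:

import subprocess
import tempfile

def with_temporary_mount(device, inspect):
    # Mount the device on a throwaway directory, run inspect(path),
    # and always lazily umount, even if inspect raises.
    # Assumes /var/lib/ceph/tmp already exists.
    path = tempfile.mkdtemp(prefix='mnt.', dir='/var/lib/ceph/tmp')
    subprocess.check_call(['/bin/mount', '--', device, path])
    try:
        return inspect(path)
    finally:
        # lazy, in case someone else is peeking at the wrong moment
        subprocess.check_call(['/bin/umount', '-l', '--', path])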

#2 Updated by Loic Dachary over 1 year ago

  • Status changed from In Progress to Won't Fix

The scenario creating duplicate mount points is when the osd is destroyed while it is activating. This is a corner case that only happens during testing and it probably is not worth fixing. Instead the test case must wait for the osd activation to complete.

#3 Updated by Loic Dachary over 1 year ago

  • Status changed from Won't Fix to In Progress

#4 Updated by Loic Dachary over 1 year ago

The ceph-disk activate that was run via systemctl is interrupted before it can finish, which leaves the file system mounted and, in some cases, incompletely set up. The following is a mix of the output of udevadm monitor and /var/log/messages. Two consecutive udev add events for vdb1 are received and the second one kills the first one.

UDEV  [57419.754877] add      /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
UDEV  [57419.776352] remove   /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:30 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0x1b776e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
....
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/local/sbin/init exists ?

Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-conf exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph-detect-init exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-detect-init exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-detect-init --default sysvinit
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Marking with init system systemd
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Authorizing OSD key...
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /var/lib/ceph/tmp/mnt.GJfmAN/keyring osd allow * mon allow profile osd
Sep 18 22:46:36 target230089 systemd: Stopping Ceph disk activation: /dev/vdb1...
Sep 18 22:46:36 target230089 systemd: Starting Ceph disk activation: /dev/vdb1...
UDEV  [57425.650501] add      /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:36 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0xa946e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
Sep 18 22:46:36 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?

#5 Updated by Loic Dachary over 1 year ago

  • Priority changed from Normal to Urgent

#6 Updated by Loic Dachary over 1 year ago

diff --git a/systemd/ceph-disk@.service b/systemd/ceph-disk@.service
index 88e4aef..cff7e9f 100644
--- a/systemd/ceph-disk@.service
+++ b/systemd/ceph-disk@.service
@@ -3,6 +3,6 @@ Description=Ceph disk activation: %f

 [Service]
 Type=oneshot
-RemainAfterExit=yes
-ExecStart=/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f
+KillMode=none
+ExecStart=/bin/flock /var/lock/ceph-disk -c '/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
 TimeoutSec=0
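
The change above does two things: KillMode=none keeps systemd from killing a ceph-disk run that is still in flight when the unit is restarted, and wrapping ExecStart in flock serializes consecutive runs on /var/lock/ceph-disk. For illustration only, roughly the same serialization expressed in Python with fcntl.flock (the lock path is the one from the unit file):

import fcntl
import subprocess

def trigger_serialized(dev, lock_path='/var/lock/ceph-disk'):
    # Block until the previous trigger run releases the lock, so two
    # consecutive udev add events on the same device never overlap.
    with open(lock_path, 'w') as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # waits, like plain `flock`
        subprocess.check_call(
            ['/usr/sbin/ceph-disk', '--verbose', '--log-stdout',
             'trigger', '--sync', dev])
    # the lock is released when the file is closed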

#7 Updated by Joe Julian over 1 year ago

I think it would be better to move flock outside of the systemd service, into the udev action, with the -n option. This would prevent the systemd unit from being interrupted, prevent multiple simultaneous starts, and prevent the service from ending up in a failed state due to a race. It should still set RemainAfterExit since, once it has run, it is complete and should not run again if activated again.

Honestly, for a oneshot service that hasn't finished, multiple starts should not result in the service being started multiple times. This seems like a systemd bug.

#8 Updated by Loic Dachary over 1 year ago

With the current approach, we cover all cases, at the expense of running activate for each and every udev add event. If we use flock -n instead, we open the following race condition (sketched in code at the end of this comment):

  • udev add -> flock -n takes the lock and runs ceph-disk activate
  • ceph-disk activate does not have enough information to run because ceph-disk prepare has not finished preparing the device; it starts to shut down but has not stopped yet
  • ceph-disk prepare finishes preparing the device and fires a udev add to notify the world
  • udev add -> flock -n sees the lock is held and gives up
  • ceph-disk activate finishes shutting down

There won't be another udev add event after that and the device won't activate. Although it is more expensive, I think running ceph-disk activate on every event addresses all possible concurrency scenarios. This assumes ceph-disk activate

  • is idempotent
  • is robust enough to not try to activate a partially prepared device

What do you think ?
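
To make the difference concrete, a minimal sketch (illustrative only) of the flock -n behaviour using fcntl: the non-blocking attempt gives up immediately when the lock is held, which is exactly the step where the second udev add would be lost.

import errno
import fcntl

def try_lock_nonblocking(lock_file):
    # Rough equivalent of `flock -n`: return False instead of waiting
    # when another trigger run already holds the lock.
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError as e:
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return False  # the second udev add gives up here
        raise
    return True

# With the blocking variant (fcntl.LOCK_EX alone), the second udev add
# would instead wait for the first run to finish and then re-run activate.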

#9 Updated by Loic Dachary over 1 year ago

  • Status changed from In Progress to Need Review

#10 Updated by Loic Dachary over 1 year ago

  • Status changed from Need Review to Resolved

#11 Updated by Loic Dachary about 1 year ago

  • Duplicated by Bug #14451: ceph-disk: duplicate osd mount points and umount fails to unmount temporary mount added

#12 Updated by Loic Dachary about 1 year ago

  • Status changed from Resolved to Pending Backport
  • Backport set to hammer

#14 Updated by Loic Dachary about 1 year ago

  • Copied to Backport #14491: hammer: ceph-disk: duplicate osd mount points added
