Bug #13160
ceph-disk: duplicate osd mount points (Closed)
Description
- ceph-disk prepare /dev/vdb
- ceph-disk list
$ grep vdb /proc/mounts
/dev/vdb1 /var/lib/ceph/tmp/mnt.y9HL5a xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0
/dev/vdb1 /var/lib/ceph/osd/ceph-0 xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0
This happens when ceph-disk activate mounts /var/lib/ceph/osd/ceph-0 while ceph-disk mounts the same partition at /var/lib/ceph/tmp/mnt.y9HL5a to get information about the data partition. If ceph-disk is interrupted or fails for any reason, it does not unmount the temporary mount point and both mounts stay in place.
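As a rough shell sketch of that pattern (the temporary directory template and the whoami file are only illustrative, not the actual ceph-disk code), the data partition is mounted on a temporary directory, read, then unmounted; an interruption between the mount and the umount is what leaves the extra mount point behind:

# Illustrative sketch only, not the ceph-disk source.
tmp=$(mktemp -d /var/lib/ceph/tmp/mnt.XXXXXX)
mount /dev/vdb1 "$tmp"
cat "$tmp/whoami"          # read information about the data partition
umount -- "$tmp"           # never runs if the process is interrupted above
rmdir "$tmp"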
Updated by Loïc Dachary over 8 years ago
It probably is a leftover of
command_check_call(
    [
        '/bin/umount',
        '-l',   # lazy, in case someone else is peeking at the
                # wrong moment
        '--',
        path,
    ],
)
It does not unmount because ceph-disk list happens to be looking at it at that moment. And since the device is still in use after that, the umount never takes effect.
Updated by Loïc Dachary over 8 years ago
- Status changed from In Progress to Won't Fix
The scenario creating duplicate mount points is when the osd is destroyed while it is activating. This is a corner case that only happens during testing and is probably not worth fixing. Instead the test case must wait for the osd activation to complete.
Updated by Loïc Dachary over 8 years ago
- Status changed from Won't Fix to In Progress
Updated by Loïc Dachary over 8 years ago
The ceph-disk activate that was run via systemctl is interrupted before it can finish, which leaves the file system mounted and, in some cases, incompletely set up. The following is a mix of the output of udevadm monitor and /var/log/messages. Two consecutive udev add events for vdb1 are received and the second one kills the first one.
UDEV [57419.754877] add /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
UDEV [57419.776352] remove /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:30 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0x1b776e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
....
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/local/sbin/init exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-conf exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph-detect-init exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-detect-init exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-detect-init --default sysvinit
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Marking with init system systemd
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Authorizing OSD key...
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /var/lib/ceph/tmp/mnt.GJfmAN/keyring osd allow * mon allow profile osd
Sep 18 22:46:36 target230089 systemd: Stopping Ceph disk activation: /dev/vdb1...
Sep 18 22:46:36 target230089 systemd: Starting Ceph disk activation: /dev/vdb1...
UDEV [57425.650501] add /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:36 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0xa946e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
Sep 18 22:46:36 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?
Updated by Loïc Dachary over 8 years ago
diff --git a/systemd/ceph-disk@.service b/systemd/ceph-disk@.service
index 88e4aef..cff7e9f 100644
--- a/systemd/ceph-disk@.service
+++ b/systemd/ceph-disk@.service
@@ -3,6 +3,6 @@ Description=Ceph disk activation: %f
 
 [Service]
 Type=oneshot
-RemainAfterExit=yes
-ExecStart=/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f
+KillMode=none
+ExecStart=/bin/flock /var/lock/ceph-disk -c '/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
 TimeoutSec=0
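A small shell sketch of what the blocking flock in the new ExecStart line does (the echoed commands are placeholders, only the lock path comes from the patch): a second trigger started while the first still holds the lock simply waits its turn instead of running concurrently.

# Hedged illustration of blocking flock: the second command waits for the lock.
flock /var/lock/ceph-disk -c 'echo first: start; sleep 5; echo first: done' &
flock /var/lock/ceph-disk -c 'echo second: ran only after the lock was released'
wait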
Updated by Joe Julian over 8 years ago
I think it would be better to move flock outside of the systemd service, into the udev action, with the -n option. This would prevent the systemd script from being interrupted, prevent multiple simultaneous starts, and prevent the service from ending up in a failed state due to a race. It should still "RemainAfterExit": once it has run, it is complete and should not run again if activated again.
Honestly, for a oneshot service that hasn't finished, multiple starts should not result in the service being started multiple times. This seems like a systemd bug.
Updated by Loïc Dachary over 8 years ago
With the current approach, we at least cover all cases, at the expense of running activate for each and every udev add event. If we flock -n instead, we open the following race condition:
- udev add -> flock -n takes the lock and runs ceph-disk activate
- ceph-disk activate does not have enough information to run because ceph-disk prepare has not finished preparing the device; it starts to shut down but has not stopped yet
- ceph-disk prepare finishes preparing the device and fires a udev add to announce it to the world
- udev add -> flock -n sees the lock is held and gives up
- ceph-disk activate finishes shutting down
There won't be another udev add event after that and the device won't activate. Although it is more expensive, I think running ceph-disk activate on every event addresses all possible concurrency scenarios. This is assuming ceph-disk activate
- is idempotent
- is robust enough to not try to activate a partially prepared device
What do you think?
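For comparison, a small shell sketch of the flock -n behaviour in the race above (lock path and sleep are only illustrative): unlike the blocking flock in the patch, a second -n invocation gives up immediately while the lock is held, which is how the second udev add event would get dropped.

# Hedged sketch of flock -n: the second caller exits non-zero right away.
flock /var/lock/ceph-disk -c 'sleep 10' &
sleep 1
flock -n /var/lock/ceph-disk -c 'echo second: ran' \
    || echo 'second: lock already held, giving up'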
Updated by Loïc Dachary over 8 years ago
- Status changed from In Progress to Fix Under Review
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to Resolved
Updated by Loïc Dachary about 8 years ago
- Has duplicate Bug #14451: ceph-disk: duplicate osd mount points and umount fails to unmount temporary mount added
Updated by Loïc Dachary about 8 years ago
- Status changed from Resolved to Pending Backport
- Backport set to hammer
Updated by Loïc Dachary about 8 years ago
- Copied to Backport #14491: hammer: ceph-disk: duplicate osd mount points added
Updated by Nathan Cutler over 6 years ago
- Status changed from Pending Backport to Resolved