Bug #13160 (closed): ceph-disk: duplicate osd mount points

Added by Loïc Dachary over 8 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  • ceph-disk prepare /dev/vdb
  • ceph-disk list
$ grep vdb /proc/mounts
/dev/vdb1 /var/lib/ceph/tmp/mnt.y9HL5a xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0
/dev/vdb1 /var/lib/ceph/osd/ceph-0 xfs rw,seclabel,noatime,attr2,inode64,noquota 0 0

This happens when ceph-disk activate mounts /var/lib/ceph/osd/ceph-0 while ceph-disk has the same partition mounted on /var/lib/ceph/tmp/mnt.y9HL5a to read information from the data partition. If ceph-disk is interrupted or fails for any reason, it does not unmount the temporary mount point, and both mounts stay in place.
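
A minimal sketch of the cleanup pattern that avoids leaking the temporary mount (the helper name and the whoami read are illustrative, not the actual ceph-disk code): mount the partition for inspection inside a try/finally so the temporary mount point is released even if the inspection fails or raises.

import os
import subprocess
import tempfile

def inspect_data_partition(dev):
    # Hypothetical sketch, not the actual ceph-disk implementation: mount
    # the partition on a temporary directory, read what is needed, and
    # always unmount it again, even if the inspection raises.
    path = tempfile.mkdtemp(prefix='mnt.', dir='/var/lib/ceph/tmp')
    subprocess.check_call(['mount', '--', dev, path])
    try:
        # example: read the osd id recorded on the data partition
        with open(os.path.join(path, 'whoami')) as f:
            return f.read().strip()
    finally:
        subprocess.check_call(['umount', '--', path])
        os.rmdir(path)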


Related issues 2 (0 open, 2 closed)

Has duplicate: Ceph - Bug #14451: ceph-disk: duplicate osd mount points and umount fails to unmount temporary mount (Duplicate, 01/21/2016)

Copied to: Ceph - Backport #14491: hammer: ceph-disk: duplicate osd mount points (Rejected, Loïc Dachary)
Actions #1

Updated by Loïc Dachary over 8 years ago

It is probably a leftover of this umount call:

    command_check_call(
        [
            '/bin/umount',
            '-l',   # lazy, in case someone else is peeking at the
                    # wrong moment
            '--',
            path,
            ],
        )


It does not unmount because ceph-disk list happens to be looking at it at that moment; since the device is then in use, the umount is not executed.
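
An alternative sketch, not what ceph-disk does (the retry count and delay are arbitrary): retry a plain umount a few times instead of detaching lazily, so a short-lived reader such as ceph-disk list holding the mount point for a moment does not leave it mounted forever.

import subprocess
import time

def unmount_with_retry(path, attempts=5, delay=1.0):
    # Alternative sketch, not the ceph-disk code: retry a plain umount so a
    # transient reader (for example a concurrent 'ceph-disk list') that
    # briefly holds the mount point does not leave it mounted forever.
    for _ in range(attempts):
        if subprocess.call(['/bin/umount', '--', path]) == 0:
            return
        time.sleep(delay)
    raise RuntimeError('could not unmount %s after %d attempts' % (path, attempts))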

Actions #2

Updated by Loïc Dachary over 8 years ago

  • Status changed from In Progress to Won't Fix

The scenario that creates duplicate mount points is when the osd is destroyed while it is activating. This is a corner case that only happens during testing and is probably not worth fixing. Instead, the test case must wait for the osd activation to complete.
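
A hedged sketch of what that wait could look like on the test side (the helper name, paths and timeout are assumptions): poll /proc/mounts until the osd directory is mounted and no ceph temporary mount is left.

import time

def wait_for_activation(osd_dir='/var/lib/ceph/osd/ceph-0', timeout=300):
    # Hypothetical test helper: poll /proc/mounts until the osd directory is
    # mounted and no /var/lib/ceph/tmp/mnt.* temporary mount remains.
    deadline = time.time() + timeout
    while time.time() < deadline:
        with open('/proc/mounts') as f:
            mounts = [line.split()[1] for line in f]
        if osd_dir in mounts and not any(
                m.startswith('/var/lib/ceph/tmp/mnt.') for m in mounts):
            return
        time.sleep(1)
    raise RuntimeError('osd activation did not complete within %s seconds' % timeout)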

Actions #3

Updated by Loïc Dachary over 8 years ago

  • Status changed from Won't Fix to In Progress
Actions #4

Updated by Loïc Dachary over 8 years ago

The ceph-disk activation that was run via systemctl is interrupted before it can finish, which leaves the file system mounted and, in some cases, incompletely set up. The following is a mix of the output of udevadm monitor and /var/log/messages. Two consecutive udev add events for vdb1 are received, and the second one kills the first one.

UDEV  [57419.754877] add      /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
UDEV  [57419.776352] remove   /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:30 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0x1b776e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
....
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?
Sep 18 22:46:30 target230089 ceph-disk: DEBUG: does /usr/local/sbin/init exists ?

Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-conf exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup init
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph-detect-init exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph-detect-init exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph-detect-init --default sysvinit
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Marking with init system systemd
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:Authorizing OSD key...
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:does /usr/bin/ceph exists ?
Sep 18 22:46:35 target230089 ceph-disk: DEBUG:ceph-disk:yes, /usr/bin/ceph exists
Sep 18 22:46:35 target230089 ceph-disk: INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /var/lib/ceph/tmp/mnt.GJfmAN/keyring osd allow * mon allow profile osd
Sep 18 22:46:36 target230089 systemd: Stopping Ceph disk activation: /dev/vdb1...
Sep 18 22:46:36 target230089 systemd: Starting Ceph disk activation: /dev/vdb1...
UDEV  [57425.650501] add      /devices/pci0000:00/0000:00:06.0/virtio3/block/vdb/vdb1 (block)
Sep 18 22:46:36 target230089 ceph-disk: INFO: main_trigger: Namespace(dev='/dev/vdb1', func=<function main_trigger at 0xa946e0>, log_stdout=True, prepend_to_path='/usr/bin', prog='ceph-disk', statedir='/var/lib/ceph', sync=True, sysconfdir='/etc/ceph', verbose=True)
Sep 18 22:46:36 target230089 ceph-disk: DEBUG: does /usr/bin/init exists ?
Actions #5

Updated by Loïc Dachary over 8 years ago

  • Priority changed from Normal to Urgent
Actions #6

Updated by Loïc Dachary over 8 years ago

diff --git a/systemd/ceph-disk@.service b/systemd/ceph-disk@.service
index 88e4aef..cff7e9f 100644
--- a/systemd/ceph-disk@.service
+++ b/systemd/ceph-disk@.service
@@ -3,6 +3,6 @@ Description=Ceph disk activation: %f

 [Service]
 Type=oneshot
-RemainAfterExit=yes
-ExecStart=/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f
+KillMode=none
+ExecStart=/bin/flock /var/lock/ceph-disk -c '/usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f'
 TimeoutSec=0
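
The change above does two things: KillMode=none stops systemd from killing a still-running activation when the unit is restarted, and the flock wrapper serializes ceph-disk trigger invocations through a single lock file. For illustration only, this is roughly what the /bin/flock wrapper does, expressed with Python's fcntl (a sketch, not part of the fix):

import fcntl
import subprocess

def trigger_serialized(dev):
    # Take an exclusive, blocking lock on /var/lock/ceph-disk (like /bin/flock
    # without -n) so concurrent udev-triggered activations run one after
    # another instead of killing each other.
    with open('/var/lock/ceph-disk', 'w') as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is free
        subprocess.check_call(
            ['ceph-disk', '--verbose', '--log-stdout', 'trigger', '--sync', dev])
    # the lock is released when the lock file is closed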
Actions #7

Updated by Joe Julian over 8 years ago

I think it would be better to move flock outside of the systemd service, into the udev action, with the -n option. This would prevent the systemd unit from being interrupted, prevent multiple simultaneous starts, and prevent the service from being left in a failed state due to a race. It should still use "RemainAfterExit": once it has run, it is complete and should not run again if it is activated again.

Honestly, for a oneshot service that hasn't finished, multiple starts should not result in the service being started multiple times. This seems like a systemd bug.

Actions #8

Updated by Loïc Dachary over 8 years ago

With the current (latest) approach, we cover all cases, at the expense of running activate for each and every udev add event. If we use flock -n instead, we open the following race condition:

  • udev add -> flock -n takes the lock and runs ceph-disk activate
  • ceph-disk activate does not have enough information to run because ceph-disk prepare has not finished preparing the device, so it starts to shut down but has not stopped yet
  • ceph-disk prepare finishes preparing the device and fires a udev add event to notify the world
  • udev add -> flock -n sees the lock is held and gives up
  • ceph-disk activate finishes shutting down

There won't be another udev add event after that, and the device will never activate. Although it is more expensive, I think running ceph-disk activate on every event addresses all possible concurrency scenarios. This assumes that ceph-disk activate

  • is idempotent
  • is robust enough not to try to activate a partially prepared device (see the sketch below)

What do you think ?
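
For the two assumptions above, a hypothetical guard could look like the following sketch (the marker file name and osd path are examples, not the actual markers ceph-disk uses):

import os

def should_activate(data_dir, osd_dir='/var/lib/ceph/osd/ceph-0'):
    # Hypothetical guard illustrating the two assumptions; file and directory
    # names are examples only.
    # Robustness: skip a partially prepared device, assuming prepare drops a
    # marker file (here 'ready') as its last step.
    if not os.path.exists(os.path.join(data_dir, 'ready')):
        return False
    # Idempotency: if the osd data directory is already mounted, there is
    # nothing left to do.
    with open('/proc/mounts') as f:
        mounted = {line.split()[1] for line in f}
    return osd_dir not in mounted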

Actions #9

Updated by Loïc Dachary over 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #10

Updated by Loïc Dachary over 8 years ago

  • Status changed from Fix Under Review to Resolved
Actions #11

Updated by Loïc Dachary about 8 years ago

  • Has duplicate Bug #14451: ceph-disk: duplicate osd mount points and umount fails to unmount temporary mount added
Actions #12

Updated by Loïc Dachary about 8 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to hammer
Actions #14

Updated by Loïc Dachary about 8 years ago

  • Copied to Backport #14491: hammer: ceph-disk: duplicate osd mount points added
Actions #15

Updated by Nathan Cutler over 6 years ago

  • Status changed from Pending Backport to Resolved