Bug #16451

Updated by Loïc Dachary almost 8 years ago

Description of problem:
When using ceph-deploy with both the --zap-disk and --dmcrypt options, ceph-deploy appears to call the zap function in ceph-disk without first unmounting the osd-lockbox partition on the disk. sgdisk then times out (5 timeouts of 60 seconds each) and the OSD creation fails.
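
To make the suspected condition concrete, here is a minimal Python sketch that checks whether any partition of a device is still mounted before a zap is attempted. It is not ceph-disk code: the function name `mounted_partitions` and the `SAMPLE_MOUNTS` text are hypothetical, and the sample merely imitates the /proc/mounts format using the lockbox mount seen in the log.

```python
import re


def mounted_partitions(device, mounts_text):
    """Return (partition, mountpoint) pairs for mounted partitions of `device`.

    `mounts_text` is text in /proc/mounts format: one mount per line,
    with the source device first and the mount point second.
    """
    # Match /dev/sda3 or /dev/nvme0n1p3 for the given whole-disk device,
    # but not a different disk that merely shares a prefix (e.g. /dev/sdab1).
    pattern = re.compile(r"^" + re.escape(device) + r"p?\d+$")
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and pattern.match(fields[0]):
            found.append((fields[0], fields[1]))
    return found


# Hypothetical sample, modelled on the lockbox mount shown in the log.
SAMPLE_MOUNTS = """\
/dev/sda3 /var/lib/ceph/osd-lockbox/redacted ext4 rw,relatime 0 0
/dev/sdb1 /boot ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    for part, mountpoint in mounted_partitions("/dev/sda", SAMPLE_MOUNTS):
        print(f"{part} is still mounted on {mountpoint}; unmount before zapping")
```

On a real host the second argument would be the contents of /proc/mounts. A non-empty result for the target disk would correspond to the situation reported here.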

Version-Release number of selected component (if applicable):
tested:
ceph-deploy: 1.5.24, 1.5.30, 1.5.34
ceph-disk: v10.2.0, v10.2.1, v10.2.2

How reproducible:
100%

Steps to Reproduce:
1. ceph-deploy osd create --zap-disk --dmcrypt host:sd{a..b}

Actual results:
The OSD creation times out while waiting on udevadm. Note that the osd-lockbox does not get unmounted, which may or may not be by design. In addition, sgdisk zap is run against the drive while the partition is still mounted, which fails.

Expected results:
The OSD creation should succeed.
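
One ordering that would avoid zapping a disk with a mounted partition can be sketched as follows. This is only an illustration under assumptions, not the ceph-disk implementation or its eventual fix: `plan_zap` is a hypothetical helper, the `/bin/umount` path is assumed, and the sgdisk invocations are copied from the log below. The function only builds the command list and does not execute anything.

```python
import shlex


def plan_zap(device, mounted):
    """Build the command sequence: unmount mounted partitions, then zap.

    `mounted` is a list of partition device paths that are currently
    mounted (for example the lockbox partition /dev/sda3).
    """
    # Unmount first, so the kernel can re-read the partition table later.
    commands = [["/bin/umount", part] for part in mounted]
    # Same sgdisk calls that appear in the log of this report.
    commands.append(["/sbin/sgdisk", "--zap-all", "--", device])
    commands.append(["/sbin/sgdisk", "--clear", "--mbrtogpt", "--", device])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in plan_zap("/dev/sda", ["/dev/sda3"]):
        print(shlex.join(cmd))
```

Whether unmounting the lockbox at this point is safe depends on how ceph-disk uses it afterwards, which this report leaves open.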

Additional info:
<pre>
[redacted-host][WARNIN] populate: Mounting lockbox mount -t ext4 /dev/sda3 /var/lib/ceph/osd-lockbox/redacted
[redacted-host][WARNIN] command_check_call: Running command: /bin/mount -t ext4 /dev/sda3 /var/lib/ceph/osd-lockbox/redacted
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/osd-uuid.3089.tmp
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[redacted-host][WARNIN] command_check_call: Running command: /usr/bin/ceph config-key put dm-crypt/osd/redacted/luks iE0N25NKgkvOlZxfnN9IEJBfBwO6HCcM0oZeIuowpgFuxsn/yLxz8hDmXZzesQY3MKI1wPWkyzETpV+dw0yBECX/TbAldHqTxYj/W+d6zbKkVe61TABZfIYxjdS+KFu80QaFGlHqBnY5Gj3rXalHE/qquS81XUvsXfafAFTqY8E=
[redacted-host][WARNIN] value stored
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph auth get-or-create client.osd-lockbox.redacted mon allow command "config-key get" with key="dm-crypt/osd/redacted/luks"
[redacted-host][WARNIN] create_key: stderr
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/key-management-mode.3089.tmp
[redacted-host][WARNIN] adjust_symlink: Creating symlink /var/lib/ceph/osd-lockbox/8dc95d04-65a7-4dee-97d4-6b5ff1117f0d -> /var/lib/ceph/osd-lockbox/redacted
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/journal-uuid.3089.tmp
[redacted-host][WARNIN] command: Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd-lockbox/redacted/magic.3089.tmp
[redacted-host][WARNIN] command_check_call: Running command: /sbin/sgdisk --typecode=3:fb3aabf9-d25f-47cc-bf5e-721d1816496b -- /dev/sda
[redacted-host][DEBUG ] Warning: The kernel is still using the old partition table.
[redacted-host][DEBUG ] The new table will be used at the next reboot.
[redacted-host][DEBUG ] The operation has completed successfully.
[redacted-host][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[redacted-host][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[redacted-host][WARNIN] get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
[redacted-host][WARNIN] zap: Zapping partition table on /dev/sda
[redacted-host][WARNIN] command_check_call: Running command: /sbin/sgdisk --zap-all -- /dev/sda
[redacted-host][WARNIN] Caution: invalid backup GPT header, but valid main header; regenerating
[redacted-host][WARNIN] backup header from main header.
[redacted-host][WARNIN]
[redacted-host][WARNIN] Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
[redacted-host][WARNIN] on the recovery & transformation menu to examine the two tables.
[redacted-host][WARNIN]
[redacted-host][WARNIN] Warning! One or more CRCs don't match. You should repair the disk!
[redacted-host][WARNIN]
[redacted-host][DEBUG ] ****************************************************************************
[redacted-host][DEBUG ] Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
[redacted-host][DEBUG ] verification and recovery are STRONGLY recommended.
[redacted-host][DEBUG ] ****************************************************************************
[redacted-host][DEBUG ] Warning: The kernel is still using the old partition table.
[redacted-host][DEBUG ] The new table will be used at the next reboot.
[redacted-host][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[redacted-host][DEBUG ] other utilities.
[redacted-host][WARNIN] command_check_call: Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sda
[redacted-host][DEBUG ] Creating new GPT entries.
[redacted-host][DEBUG ] Warning: The kernel is still using the old partition table.
[redacted-host][DEBUG ] The new table will be used at the next reboot.
[redacted-host][DEBUG ] The operation has completed successfully.
[redacted-host][WARNIN] update_partition: Calling partprobe on zapped device /dev/sda
[redacted-host][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[redacted-host][WARNIN] command: Running command: /sbin/partprobe /dev/sda
[redacted-host][WARNIN] update_partition: partprobe /dev/sda failed : Error: Partition(s) 3 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[redacted-host][WARNIN] (ignored, waiting 60s)
[redacted-host][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[redacted-host][WARNIN] command: Running command: /sbin/partprobe /dev/sda
[redacted-host][WARNIN] update_partition: partprobe /dev/sda failed : Error: Partition(s) 3 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[redacted-host][WARNIN] (ignored, waiting 60s)
[redacted-host][WARNIN] command_check_call: Running command: /sbin/udevadm settle --timeout=600
[redacted-host][WARNIN] command: Running command: /sbin/partprobe /dev/sda
[redacted-host][WARNIN] update_partition: partprobe /dev/sda failed : Error: Partition(s) 3 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[redacted-host][WARNIN] (ignored, waiting 60s)

Eventually times out.
</pre>
