Bug #14080

ceph-disk: use blkid instead of sgdisk -i

Added by Loic Dachary over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 12/14/2015
Due date:
% Done: 0%
Source: other
Tags:
Backport: infernalis,hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

On CentOS 7 running on OpenStack, udevadm settle may take 45 seconds or more and then fail. This blocks testing.

DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
INFO:ceph-disk:Running command: /usr/bin/udevadm settle
Traceback (most recent call last):
  File "/sbin/ceph-disk", line 4036, in <module>
    main(sys.argv[1:])
  File "/sbin/ceph-disk", line 3990, in main
    args.func(args)
  File "/sbin/ceph-disk", line 2849, in main_destroy
    zap(base_dev)
  File "/sbin/ceph-disk", line 1289, in zap
    raise Error(e)
__main__.Error: Error: Command '['/usr/bin/udevadm', 'settle']' returned non-zero exit status 1

kvsrv01.log - zap+prepare loop log. (88 KB) Matthew Taylor, 12/17/2015 12:24 AM


Related issues

Related to Ceph - Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' Duplicate 12/16/2015
Blocked by Ceph - Bug #13988: new OSD re-using old OSD id fails to boot Resolved 12/05/2015
Copied to Ceph - Backport #14145: infernalis: ceph-disk: use blkid instead of sgdisk -i Resolved
Copied to Ceph - Backport #14602: hammer: ceph-disk: use blkid instead of sgdisk -i Resolved

Associated revisions

Revision 730b5d62 (diff)
Added by Loic Dachary about 3 years ago

ceph-disk: increase partprobe / udevadm settle timeouts

The default of 120 seconds may be exceeded when the disk is very slow, which can happen in cloud environments. Increase it to 600 seconds instead.

The partprobe command may fail for the same reason but it does not have
a timeout parameter. Instead, try a few times before failing.

The udevadm settle calls guarding partprobe are not strictly necessary because partprobe already does the same internally. However, partprobe does not provide a way to control the timeout. Running one udevadm settle right after another is a noop most of the time and adds no delay. It matters when the udevadm settle run by partprobe fails with a timeout, because partprobe silently ignores that failure.

http://tracker.ceph.com/issues/14080 Fixes: #14080

Signed-off-by: Loic Dachary <>
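
For readers following along, a minimal sketch of the retry scheme this commit describes (the helper name, retry count, and delay are illustrative assumptions, not the upstream ceph-disk code):

import subprocess
import time

def update_partition(dev, tries=3, delay=5):
    """Ask the kernel to reload the partition table of dev, retrying
    because partprobe offers no timeout option of its own."""
    # Wait out any in-flight udev events, allowing 600s instead of the
    # 120s default.
    subprocess.check_call(['udevadm', 'settle', '--timeout=600'])
    for attempt in range(tries):
        if subprocess.call(['partprobe', dev]) == 0:
            break
        if attempt == tries - 1:
            raise RuntimeError('partprobe %s failed' % dev)
        time.sleep(delay)  # give the kernel time before retrying
    # Settle again: the settle partprobe runs internally may time out
    # and be silently ignored, as noted above.
    subprocess.check_call(['udevadm', 'settle', '--timeout=600'])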

Revision 676a4b42 (diff)
Added by Loic Dachary about 3 years ago

ceph-disk: use blkid instead of sgdisk -i

sgdisk -i 1 /dev/vdb opens /dev/vdb in write mode, which indirectly triggers a BLKRRPART ioctl from udev (starting with version 214) when the device is closed (see below for the udev release note). The kernel's implementation of this ioctl (even in old kernels) removes all partitions and adds them again (similar to what partprobe does explicitly).

The side effects of partitions disappearing while ceph-disk is running
are devastating.

sgdisk is replaced by blkid, which only opens the device in read mode and will not trigger this unexpected behavior.

The problem does not show on Ubuntu 14.04, which runs udev < 214, but does show on CentOS 7, which runs udev > 214.

git clone git://anonscm.debian.org/pkg-systemd/systemd.git
systemd/NEWS:
CHANGES WITH 214:

  • As an experimental feature, udev now tries to lock the
    disk device node (flock(LOCK_SH|LOCK_NB)) while it
    executes events for the disk or any of its partitions.
    Applications like partitioning programs can lock the
    disk device node (flock(LOCK_EX)) and claim temporary
    device ownership that way; udev will entirely skip all event
    handling for this disk and its partitions. If the disk
    was opened for writing, the close will trigger a partition
    table rescan in udev's "watch" facility, and if needed
    synthesize "change" events for the disk and all its partitions.
    This is now unconditionally enabled, and if it turns out to
    cause major problems, we might turn it on only for specific
    devices, or might need to disable it entirely. Device Mapper
    devices are excluded from this logic.

http://tracker.ceph.com/issues/14080 Fixes: #14080

Signed-off-by: Ilya Dryomov <>
Signed-off-by: Loic Dachary <>
(cherry picked from commit 9dce05a8cdfc564c5162885bbb67a04ad7b95c5a)
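
As an illustration of the replacement, blkid can report a partition's GPT metadata without opening the device for writing. A sketch under the assumption that ID_PART_ENTRY_TYPE (a key emitted by blkid -o udev -p) carries the partition type GUID; the function name is hypothetical:

import subprocess

def get_partition_type(partition):
    """Return the GPT type GUID of a partition using blkid, which opens
    the device read-only and so does not trigger udev's partition-table
    rescan the way sgdisk -i does."""
    out = subprocess.check_output(
        ['blkid', '-o', 'udev', '-p', partition]).decode('utf-8')
    for line in out.splitlines():
        key, _, value = line.partition('=')
        if key == 'ID_PART_ENTRY_TYPE':
            return value.lower()
    return None

# Example: get_partition_type('/dev/vdb1')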

History

#1 Updated by Loic Dachary over 3 years ago

Nothing conclusive in dmesg. After a while (a few minutes at most), it comes back to normal.

#2 Updated by Loic Dachary over 3 years ago

Could be that it takes more than 120 seconds.

   udevadm settle [options]
       Watches the udev event queue, and exits if all current events are handled.

       --timeout=seconds
           Maximum number of seconds to wait for the event queue to become empty. The default value is 120 seconds. A value of
           0 will check if the queue is empty and always return immediately.
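
If that is the case, the fix is simply to pass a larger timeout, for example (a sketch; 600 seconds is the value adopted later in this ticket):

import subprocess

# Allow up to 600 seconds for the udev event queue to drain instead of
# the default 120 seconds.
subprocess.check_call(['udevadm', 'settle', '--timeout=600'])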

#3 Updated by Loic Dachary over 3 years ago

  • Backport set to infernalis

#4 Updated by Loic Dachary over 3 years ago

  • Status changed from New to Need Review
  • Assignee set to Loic Dachary

#5 Updated by Loic Dachary over 3 years ago

  • Blocks Bug #13970: ceph-disk list fails on /dev/cciss!c0d0 added

#6 Updated by Loic Dachary over 3 years ago

  • Blocks deleted (Bug #13970: ceph-disk list fails on /dev/cciss!c0d0)

#7 Updated by Loic Dachary over 3 years ago

  • Blocked by Bug #13988: new OSD re-using old OSD id fails to boot added

#8 Updated by Loic Dachary over 3 years ago

  • Subject changed from ceph-disk: zap may fail on udevadm settle to ceph-disk: may timeout on udevadm settle / partprobe

#9 Updated by Loic Dachary over 3 years ago

Immediately after a disk was zapped, udevadm settle waited for about 2 minutes 30 seconds on an OpenStack instance: this is more than the default 2-minute timeout.

2015-12-16 12:06:28,567.567 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
2015-12-16 12:06:29,074.074 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Caution: invalid backup GPT header, but valid main header; regenerating
2015-12-16 12:06:29,075.075 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:backup header from main header.
2015-12-16 12:06:29,075.075 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:29,708.708 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
2015-12-16 12:06:29,708.708 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:on the recovery & transformation menu to examine the two tables.
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning! One or more CRCs don't match. You should repair the disk!
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:30,854.854 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-16 12:06:30,855.855 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-12-16 12:06:30,855.855 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:verification and recovery are STRONGLY recommended.
2015-12-16 12:06:30,856.856 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-16 12:06:30,856.856 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:GPT data structures destroyed! You may now partition the disk using fdisk or
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:other utilities.
2015-12-16 12:06:30,858.858 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
2015-12-16 12:06:31,877.877 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Creating new GPT entries.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The operation has completed successfully.
2015-12-16 12:06:31,879.879 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
2015-12-16 12:06:31,879.879 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-16 12:08:56,623.623 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-16 12:08:57,140.140 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600

#10 Updated by Mykola Dvornik over 3 years ago

CentOS Linux 7.1.1503 Core

sdb is a spinning drive (HGST HUS724020ALA640)

Solved by putting time.sleep(5) before command_check_call(['partprobe', dev]) in ceph-disk.

  8251    [004-s-ragnarok][WARNING] DEBUG:ceph-disk:Creating osd partition on /dev/sdb
  8252    [004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:d92ad27f-caa5-47d7-a2e2-a403eda815ca --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdb
  8253    [004-s-ragnarok][DEBUG ] Creating new GPT entries.
  8254    [004-s-ragnarok][DEBUG ] The operation has completed successfully.
  8255    [004-s-ragnarok][WARNING] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdb
  8256    [004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
  8257    [004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdb
  8258    [004-s-ragnarok][WARNING] Error: Error informing the kernel about modifications to partition /dev/sdb1 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
  8259    [004-s-ragnarok][WARNING] Error: Failed to add partition 1 (Device or resource busy)
  8260    [004-s-ragnarok][WARNING] Traceback (most recent call last):
  8261    [004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 3576, in <module>
  8262    [004-s-ragnarok][WARNING]     main(sys.argv[1:])
  8263    [004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 3530, in main
  8264    [004-s-ragnarok][WARNING]     args.func(args)
  8265    [004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 1893, in main_prepare
  8266    [004-s-ragnarok][WARNING]     luks=luks
  8267    [004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 1594, in prepare_dev
  8268    [004-s-ragnarok][WARNING]     raise Error(e)
  8269    [004-s-ragnarok][WARNING] __main__.Error: Error: Command '['/sbin/partprobe', '/dev/sdb']' returned non-zero exit status 1
  8270    [004-s-ragnarok][ERROR ] RuntimeError: command returned non-zero exit status: 1
  8271    [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdb /dev/sda
  8272    [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs

#11 Updated by Matthew Taylor over 3 years ago

As I had mentioned in ceph-users (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006780.html), I worked around this issue by using partx instead of partprobe:

[root@kvsrv02 ~]# diff -u /usr/sbin/ceph-disk_orig /usr/sbin/ceph-disk
--- /usr/sbin/ceph-disk_orig 2015-12-16 05:05:19.636866273 +0000
+++ /usr/sbin/ceph-disk 2015-12-16 05:27:07.905817825 +0000
@@ -35,6 +35,7 @@
 import shlex
 import pwd
 import grp
+from subprocess import call
 
 """
 Prepare:
@@ -1218,7 +1219,8 @@
     """
     LOG.debug('Calling partprobe on %s device %s', description, dev)
     command_check_call(['udevadm', 'settle'])
-    command_check_call(['partprobe', dev])
+    call(['partx', '-a', dev])
+    #command_check_call(['partprobe', dev])
     command_check_call(['udevadm', 'settle'])
 
 def zap(dev):

I also tried what Mykola did on a different Ceph node (sleeping for 5 seconds before invoking partprobe), without any success:

http://pastebin.com/raw/Q26CeUge

More info about the Ceph node that I tested the above on, if required (all nodes in our cluster run identical hardware):

[root@kvsrv03 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@kvsrv03 ~]# uname -r
3.10.0-327.3.1.el7.x86_64
[root@kvsrv03 ~]# ceph --version
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
[root@kvsrv03 ~]# lsscsi
[0:0:0:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sda
[0:0:1:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdb
[0:0:2:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdc
[0:0:3:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdd
[0:0:4:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sde
[0:0:5:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdf
[0:0:6:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdg
[0:0:7:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdh
[0:0:8:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdi
[0:0:9:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdj
[0:0:10:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdk
[0:0:11:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdl
[0:0:12:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdm
[0:0:13:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdn
[0:0:14:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdo
[0:0:15:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdp
[0:0:16:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdq
[0:0:17:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdr
[0:0:18:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sds
[0:0:19:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdt
[0:0:20:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdu
[0:0:21:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdv
[0:0:22:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdw
[0:0:23:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdx
[0:2:0:0] disk DELL PERC H730P Mini 4.25 /dev/sdy
[root@kvsrv03 ~]# smartctl -i /dev/sdx
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.3.1.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BG400G4R
Serial Number: BTHC509204U0400VGN
LU WWN Device Id: 5 5cd2e4 04b79b8f8
Add. Product Id: DELL
Firmware Version: G201DL27
User Capacity: 400,088,457,216 bytes [400 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (unknown minor revision code: 0x006d)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Dec 16 23:08:32 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

#12 Updated by Loic Dachary over 3 years ago

@Matthew: using partx is appealing (and that's what ceph-disk used in previous versions of Ceph). The reason partprobe was preferred is that its behavior is easier to predict and debug.

In the past that perceived stability / robustness was a matter of opinion and debate. Now that we have ceph-disk integration tests, we can actually verify whether it is true by running them over and over. These integration tests are only two months old and it's too early to draw a conclusion. But I'm hopeful that we'll eventually have enough tests involving corner cases to try substituting partx for partprobe and verify it is at least as good.

So, in a nutshell, I'm not saying partx should not be used, nor that partprobe is definitely more stable. But I'm currently working with partprobe to make a more robust ceph-disk that works everywhere.

Regarding the error mentioned in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006780.html, I think the right fix is to increase the udevadm settle timeout and retry partprobe a few times should it give up too soon (because it will eventually succeed).

Would you mind trying https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk to verify it works for you? I'm in the process of testing it, and it would be great to have your feedback (the details of the changes are at https://github.com/ceph/ceph/pull/6926).

#13 Updated by Matthew Taylor over 3 years ago

Loic,

15 second sleep (failed): http://pastebin.com/g5J5G5Wq
25 second sleep (failed): http://pastebin.com/mjMfsAFa
35 second sleep (failed): http://pastebin.com/hsrPLYzi
40 second sleep (failed): http://pastebin.com/ShbvQKZG
45 second sleep (successful): http://pastebin.com/PAyADKiJ
Using partx instead of partprobe, no sleep (successful): http://pastebin.com/dasLxxt8

Note: disks were zapped prior to running 'ceph-deploy disk prepare'

I tried just running sgdisk outside of ceph-deploy, which was successful, including running partprobe: http://pastebin.com/VbQii1gS
Note: I had to adjust the partition name, as bash was splitting on the space in 'ceph journal' and looking for a file named 'journal'.

I will try https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk as requested, please give me a few minutes.

#14 Updated by Matthew Taylor over 3 years ago

Tried using https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk (successful):

http://pastebin.com/VkC2wP46

Tested on another node to clarify (also successful):

http://pastebin.com/NbJuesvp

I'll finish preparing another 10 or 15 disks now.

#15 Updated by Matthew Taylor over 3 years ago

I looped through 9 disks on one of our Ceph servers using the ceph-disk script above; 8 failed and 1 completed.

Log is attached!

#16 Updated by Loic Dachary over 3 years ago

I filed http://tracker.ceph.com/issues/14099 so ceph-deploy stops calling partx / partprobe (I noticed from your log that it does).

#17 Updated by Loic Dachary over 3 years ago

@Matthew the log is very useful, thank you. It would be great if you could repeat the same operation without ceph-deploy so that there are no interfering partx -a launched by ceph-deploy while ceph-disk is running. If it turns out to fail as well, it will narrow down the source of errors.

Note that I think partx -a only looks like it succeeds because it is non-blocking (i.e. it signals the kernel about the partition change but does not wait for confirmation the way partprobe does). In your case (only adding different disks with colocated journals) it works fine because there is no chance of a race.
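
To make the difference concrete: a caller relying on partx -a would have to wait for the new device node itself, since partx returns before the kernel and udev have finished. A hypothetical sketch, not ceph-disk code:

import os
import subprocess
import time

def add_partitions_and_wait(dev, partition, timeout=60):
    """partx -a tells the kernel about new partitions and returns
    immediately; poll until the partition device node actually exists."""
    subprocess.call(['partx', '-a', dev])
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(partition):
            return
        time.sleep(1)
    raise RuntimeError('%s did not appear within %d seconds'
                       % (partition, timeout))

# Example: add_partitions_and_wait('/dev/sdb', '/dev/sdb1')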

#18 Updated by Loic Dachary over 3 years ago

[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:cddb96da-6ab0-49fc-9bdc-16c40fe59e03 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdq
[kvsrv01][DEBUG ] The operation has completed successfully.
[kvsrv01][WARNING] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/cddb96da-6ab0-49fc-9bdc-16c40fe59e03
[kvsrv01][WARNING] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/cddb96da-6ab0-49fc-9bdc-16c40fe59e03
[kvsrv01][WARNING] DEBUG:ceph-disk:get_dm_uuid /dev/sdq uuid path is /sys/dev/block/65:0/dm/uuid
[kvsrv01][WARNING] DEBUG:ceph-disk:get_dm_uuid /dev/sdq uuid path is /sys/dev/block/65:0/dm/uuid
[kvsrv01][WARNING] DEBUG:ceph-disk:Creating osd partition on /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:8261ff6e-545b-42ee-99b1-92ac86bc0af3 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdq
[kvsrv01][DEBUG ] The operation has completed successfully.
[kvsrv01][WARNING] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] Traceback (most recent call last):
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 4055, in <module>
[kvsrv01][WARNING]     main(sys.argv[1:])
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 4009, in main
[kvsrv01][WARNING]     args.func(args)
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1935, in main_prepare
[kvsrv01][WARNING]     luks=luks
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1634, in prepare_dev
[kvsrv01][WARNING]     update_partition(data, 'created')
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1262, in update_partition
[kvsrv01][WARNING]     raise Error('partprobe %s failed : %s' % (dev, error))
[kvsrv01][WARNING] __main__.Error: Error: partprobe /dev/sdq failed : unknown error
[kvsrv01][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdq

The above suggests that trying partprobe in a loop three times is not always enough. The first retry loop succeeds but the second one fails, presumably because the kernel needs more time to complete the update of the partition table. The report that partx -a succeeds also points in that direction.

#19 Updated by Loic Dachary over 3 years ago

@Matthew could you please try with the updated https://raw.githubusercontent.com/dachary/ceph/f4cd0d45be68900c223f779e7d80b85828a27382/src/ceph-disk? A diff will show that it differs from the previous one as follows:

  • fix the error message
  • loop 5 times instead of 3
  • sleep 60 seconds on each loop

Thanks!

#20 Updated by Loic Dachary over 3 years ago

In the following we see that partprobe fails immediately the first time. Since it happens right after a udevadm settle on a machine that is not doing anything except run the test, it is unlikely that a udev event was still in flight. Waiting 60 seconds resolves the error, which suggests an asynchronous event related to kernel partition tables was in progress and finished.

2015-12-17 11:46:12,506.506 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
2015-12-17 11:46:12,507.507 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:46:15,427.427 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:(ignored, waiting 60s)
2015-12-17 11:47:16,925.925 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:47:19,681.681 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:47:20,125.125 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600

#21 Updated by Matthew Taylor over 3 years ago

Loic,

I've emailed you directly! :)

Matt.

#22 Updated by Loic Dachary over 3 years ago

  • Related to Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' added

#23 Updated by Loic Dachary about 3 years ago

  • Status changed from Need Review to Pending Backport

#24 Updated by Loic Dachary about 3 years ago

  • Copied to Backport #14145: infernalis: ceph-disk: use blkid instead of sgdisk -i added

#25 Updated by Loic Dachary about 3 years ago

  • Status changed from Pending Backport to Need Review

#26 Updated by Loic Dachary about 3 years ago

  • Subject changed from ceph-disk: may timeout on udevadm settle / partprobe to ceph-disk: use blkid instead of sgdisk -i

#27 Updated by Loic Dachary about 3 years ago

  • Related to deleted (Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1')

#28 Updated by Loic Dachary about 3 years ago

  • Related to Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' added

#29 Updated by Loic Dachary about 3 years ago

  • Status changed from Need Review to Pending Backport

#30 Updated by Darrell Enns about 3 years ago

I'm still seeing some failures, even with the 5x 60s delay.

System info:
OS: CentOS 7.2.1511
HBA: IBM ServeRAID 8k/8k-l8 (Adaptec AAC-RAID), configured as 1 disk per volume
Ceph: Infernalis (9.2.0-0.el7)

[ceph01][WARNIN] DEBUG:ceph-disk:Calling partprobe on zapped device /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:bc22d537-a8e6-4bfb-8bc8-60d24652763a --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdf
[ceph01][DEBUG ] The operation has completed successfully.
[ceph01][WARNIN] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] Traceback (most recent call last):
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 4031, in <module>
[ceph01][WARNIN]     main(sys.argv[1:])
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 3985, in main
[ceph01][WARNIN]     args.func(args)
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1909, in main_prepare
[ceph01][WARNIN]     luks=luks
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1506, in prepare_journal
[ceph01][WARNIN]     return prepare_journal_dev(data, journal, journal_size, journal_uuid, journal_dm_keypath, cryptsetup_parameters, luks)
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1421, in prepare_journal_dev
[ceph01][WARNIN]     update_partition(journal, 'prepared')
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1261, in update_partition
[ceph01][WARNIN]     raise Error('partprobe %s failed : %s' % (dev, error))
[ceph01][WARNIN] __main__.Error: Error: partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]
[ceph01][ERROR ] RuntimeError: command returned non-zero exit status: 1

#31 Updated by Wes Dillingham about 3 years ago

I am experiencing an identical issue with the updated ceph-disk as provided here:
https://raw.githubusercontent.com/dachary/ceph/f4cd0d45be68900c223f779e7d80b85828a27382/src/ceph-disk

I am running:
CentOS Linux release 7.2.1511 (Core)

Controller:
Dell Perc h700

I am seeing approximately a 30% osd build success rate.

Is the above ceph-disk the latest attempt at resolving this issue or might there be another version?

#32 Updated by Wes Dillingham about 3 years ago

Wes Dillingham wrote (quoting #31 above).

Edit: I am colocating my journal and data partitions:

ceph-deploy osd create --fs-type xfs ceph-osd06:/dev/sdb

#33 Updated by Wes Dillingham about 3 years ago

I adjusted it to try 20 times instead of 5 (it waits 20 minutes in total) and only got a negligible improvement (1 more osd was deployed out of 36), so I am not sure that this is in fact a timing issue.

I also get the following error in /var/log/ceph/ceph-osd.admin.log each time it attempts to probe the disk; not sure if it is related.

2016-02-01 14:46:05.958049 7f0bca77c900 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 27389
2016-02-01 14:46:05.958063 7f0bca77c900 -1 unrecognized arg --get-device-fsid

#34 Updated by Loic Dachary about 3 years ago

There are a number of other problems that were fixed recently and should be backported. I'm not sure how we should do that just yet, though. Ideally most of https://github.com/ceph/ceph/pull/6926 and https://github.com/ceph/ceph/pull/5999 should be in infernalis, and we should go over each modification to figure out what to backport.

#35 Updated by Loic Dachary about 3 years ago

  • Backport changed from infernalis to infernalis,hammer

#36 Updated by Loic Dachary about 3 years ago

  • Copied to Backport #14602: hammer: ceph-disk: use blkid instead of sgdisk -i added

#37 Updated by Loic Dachary about 3 years ago

  • Status changed from Pending Backport to Resolved
