Bug #14080 (closed)

ceph-disk: use blkid instead of sgdisk -i

Added by Loïc Dachary over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: Loïc Dachary
Category: -
Target version: -
% Done: 0%
Source: other
Backport: infernalis, hammer
Regression: No
Severity: 3 - minor

Description

On CentOS 7 on OpenStack, udevadm settle may take 45 seconds or more, which blocks testing.

DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
INFO:ceph-disk:Running command: /usr/bin/udevadm settle
Traceback (most recent call last):
  File "/sbin/ceph-disk", line 4036, in <module>
    main(sys.argv[1:])
  File "/sbin/ceph-disk", line 3990, in main
    args.func(args)
  File "/sbin/ceph-disk", line 2849, in main_destroy
    zap(base_dev)
  File "/sbin/ceph-disk", line 1289, in zap
    raise Error(e)
__main__.Error: Error: Command '['/usr/bin/udevadm', 'settle']' returned non-zero exit status 1

Files

kvsrv01.log (88 KB): zap+prepare loop log. Matthew Taylor, 12/17/2015 12:24 AM

Related issues: 4 (0 open, 4 closed)

Related to Ceph - Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' (Duplicate, Loïc Dachary, 12/16/2015)
Blocked by Ceph - Bug #13988: new OSD re-using old OSD id fails to boot (Resolved, Loïc Dachary, 12/05/2015)
Copied to Ceph - Backport #14145: infernalis: ceph-disk: use blkid instead of sgdisk -i (Resolved, Loïc Dachary)
Copied to Ceph - Backport #14602: hammer: ceph-disk: use blkid instead of sgdisk -i (Resolved, Loïc Dachary)
#1

Updated by Loïc Dachary over 8 years ago

Nothing conclusive in dmesg. After a while (a few minutes at most), things return to normal.

#2

Updated by Loïc Dachary over 8 years ago

It could be that settling takes more than 120 seconds, which is the default timeout. From the udevadm man page:

   udevadm settle [options]
       Watches the udev event queue, and exits if all current events are handled.

       --timeout=seconds
           Maximum number of seconds to wait for the event queue to become empty. The default value is 120 seconds. A value of
           0 will check if the queue is empty and always return immediately.
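
For reference, a minimal sketch of how the timeout can be raised when calling udevadm from Python, as ceph-disk does (plain subprocess shown here rather than ceph-disk's own wrapper; the 600-second value matches the logs later in this ticket):

    import subprocess

    def udevadm_settle(timeout=600):
        # Wait for the udev event queue to drain. The udevadm default is
        # 120 seconds, which slow OpenStack instances can exceed, so pass
        # a larger explicit --timeout to avoid a spurious non-zero exit.
        subprocess.check_call(
            ['/usr/bin/udevadm', 'settle', '--timeout=%d' % timeout])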

#3

Updated by Loïc Dachary over 8 years ago

  • Backport set to infernalis
#4

Updated by Loïc Dachary over 8 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Loïc Dachary
#5

Updated by Loïc Dachary over 8 years ago

  • Blocks Bug #13970: ceph-disk list fails on /dev/cciss!c0d0 added
#6

Updated by Loïc Dachary over 8 years ago

  • Blocks deleted (Bug #13970: ceph-disk list fails on /dev/cciss!c0d0)
#7

Updated by Loïc Dachary over 8 years ago

  • Blocked by Bug #13988: new OSD re-using old OSD id fails to boot added
#8

Updated by Loïc Dachary over 8 years ago

  • Subject changed from ceph-disk: zap mail fail on udevadm settle to ceph-disk: may timeout on udevadm settle / partprobe
#9

Updated by Loïc Dachary over 8 years ago

Immediately after a disk was zapped, udevadm settle waited ~2min30s on an OpenStack instance, which is more than the default 2-minute (120 s) timeout.

2015-12-16 12:06:28,567.567 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
2015-12-16 12:06:29,074.074 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Caution: invalid backup GPT header, but valid main header; regenerating
2015-12-16 12:06:29,075.075 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:backup header from main header.
2015-12-16 12:06:29,075.075 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:29,708.708 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
2015-12-16 12:06:29,708.708 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:on the recovery & transformation menu to examine the two tables.
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning! One or more CRCs don't match. You should repair the disk!
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:30,854.854 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-16 12:06:30,855.855 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-12-16 12:06:30,855.855 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:verification and recovery are STRONGLY recommended.
2015-12-16 12:06:30,856.856 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-16 12:06:30,856.856 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:GPT data structures destroyed! You may now partition the disk using fdisk or
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:other utilities.
2015-12-16 12:06:30,858.858 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
2015-12-16 12:06:31,877.877 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Creating new GPT entries.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The operation has completed successfully.
2015-12-16 12:06:31,879.879 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
2015-12-16 12:06:31,879.879 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-16 12:08:56,623.623 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-16 12:08:57,140.140 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
#10

Updated by Mykola Dvornik over 8 years ago

CentOS Linux 7.1.1503 (Core)

sdb is a spinning drive (HGST HUS724020ALA640).

Solved by putting time.sleep(5) before command_check_call(['partprobe', dev]) in ceph-disk; a minimal sketch follows the log below.

[004-s-ragnarok][WARNING] DEBUG:ceph-disk:Creating osd partition on /dev/sdb
[004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:d92ad27f-caa5-47d7-a2e2-a403eda815ca --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdb
[004-s-ragnarok][DEBUG ] Creating new GPT entries.
[004-s-ragnarok][DEBUG ] The operation has completed successfully.
[004-s-ragnarok][WARNING] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdb
[004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdb
[004-s-ragnarok][WARNING] Error: Error informing the kernel about modifications to partition /dev/sdb1 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[004-s-ragnarok][WARNING] Error: Failed to add partition 1 (Device or resource busy)
[004-s-ragnarok][WARNING] Traceback (most recent call last):
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 3576, in <module>
[004-s-ragnarok][WARNING]     main(sys.argv[1:])
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 3530, in main
[004-s-ragnarok][WARNING]     args.func(args)
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 1893, in main_prepare
[004-s-ragnarok][WARNING]     luks=luks
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 1594, in prepare_dev
[004-s-ragnarok][WARNING]     raise Error(e)
[004-s-ragnarok][WARNING] __main__.Error: Error: Command '['/sbin/partprobe', '/dev/sdb']' returned non-zero exit status 1
[004-s-ragnarok][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdb /dev/sda
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
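
A minimal sketch of the workaround described above, using plain subprocess instead of ceph-disk's command_check_call wrapper (the function name is illustrative, not the exact source):

    import subprocess
    import time

    def update_partition_with_sleep(dev):
        # Workaround: let in-flight udev events finish, then give the
        # kernel a few extra seconds before partprobe asks it to
        # re-read the partition table.
        subprocess.check_call(['udevadm', 'settle'])
        time.sleep(5)  # empirically enough on this spinning drive
        subprocess.check_call(['partprobe', dev])
        subprocess.check_call(['udevadm', 'settle'])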
#11

Updated by Matthew Taylor over 8 years ago

As I had mentioned in ceph-users (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006780.html), I worked around this issue by using partx instead of partprobe:

[root@kvsrv02 ~]# diff -u /usr/sbin/ceph-disk_orig /usr/sbin/ceph-disk
--- /usr/sbin/ceph-disk_orig 2015-12-16 05:05:19.636866273 +0000
+++ /usr/sbin/ceph-disk 2015-12-16 05:27:07.905817825 +0000
@@ -35,6 +35,7 @@
 import shlex
 import pwd
 import grp
+from subprocess import call
 
 """
 Prepare:
@@ -1218,7 +1219,8 @@
     """
     LOG.debug('Calling partprobe on %s device %s', description, dev)
     command_check_call(['udevadm', 'settle'])
-    command_check_call(['partprobe', dev])
+    call(['partx', '-a', dev])
+    #command_check_call(['partprobe', dev])
     command_check_call(['udevadm', 'settle'])
 
 def zap(dev):

I also tried what Mykola did on a different Ceph node (sleeping for 5 seconds before invoking partprobe) without any success:

http://pastebin.com/raw/Q26CeUge

More info about the Ceph node I tested the above on (all nodes in our cluster run identical hardware):

[root@kvsrv03 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@kvsrv03 ~]# uname -r
3.10.0-327.3.1.el7.x86_64
[root@kvsrv03 ~]# ceph --version
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
[root@kvsrv03 ~]# lsscsi
[0:0:0:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sda
[0:0:1:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdb
[0:0:2:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdc
[0:0:3:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdd
[0:0:4:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sde
[0:0:5:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdf
[0:0:6:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdg
[0:0:7:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdh
[0:0:8:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdi
[0:0:9:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdj
[0:0:10:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdk
[0:0:11:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdl
[0:0:12:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdm
[0:0:13:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdn
[0:0:14:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdo
[0:0:15:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdp
[0:0:16:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdq
[0:0:17:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdr
[0:0:18:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sds
[0:0:19:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdt
[0:0:20:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdu
[0:0:21:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdv
[0:0:22:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdw
[0:0:23:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdx
[0:2:0:0] disk DELL PERC H730P Mini 4.25 /dev/sdy
[root@kvsrv03 ~]# smartctl -i /dev/sdx
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.3.1.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BG400G4R
Serial Number: BTHC509204U0400VGN
LU WWN Device Id: 5 5cd2e4 04b79b8f8
Add. Product Id: DELL
Firmware Version: G201DL27
User Capacity: 400,088,457,216 bytes [400 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (unknown minor revision code: 0x006d)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Dec 16 23:08:32 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

#12

Updated by Loïc Dachary over 8 years ago

@Matthew: using partx is appealing (and that's what ceph-disk used in previous versions of Ceph). The reason partprobe was preferred is that its behavior is easier to predict and debug.

In the past, that perceived stability/robustness was a matter of opinion and debate. Now that we have ceph-disk integration tests, we can actually verify whether it holds by running them over and over. These integration tests are only two months old and it's too early to draw a conclusion. But I'm hopeful that we'll eventually have enough tests covering corner cases to try substituting partx for partprobe and verify it is at least as good.

So, in a nutshell, I'm not saying partx should not be used, nor that partprobe is definitely more stable. But I'm currently working with partprobe to make a more robust ceph-disk that works everywhere.

Regarding the error mentioned in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006780.html, I think the right fix is to increase the udevadm settle timeout and retry partprobe a few times should it give up too soon (because it will eventually succeed).
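
A minimal sketch of that approach (names and values are illustrative, not the exact code in the branch linked below; the thread later converges on 5 attempts with a 60-second pause):

    import subprocess
    import time

    def update_partition(dev, description, attempts=5, delay=60):
        # Ask the kernel to reload the partition table, retrying since
        # partprobe can fail transiently right after a table change.
        print('Calling partprobe on %s device %s' % (description, dev))
        for attempt in range(attempts):
            subprocess.check_call(['udevadm', 'settle', '--timeout=600'])
            try:
                subprocess.check_call(['partprobe', dev])
                break
            except subprocess.CalledProcessError:
                if attempt == attempts - 1:
                    raise
                print('partprobe %s failed (ignored, waiting %ds)'
                      % (dev, delay))
                time.sleep(delay)
        subprocess.check_call(['udevadm', 'settle', '--timeout=600'])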

Would you mind trying https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk to verify it works for you? I'm in the process of testing it, and it would be great to have your feedback (the details of the changes are at https://github.com/ceph/ceph/pull/6926).

#13

Updated by Matthew Taylor over 8 years ago

Loic,

15 second sleep (failed): http://pastebin.com/g5J5G5Wq
25 second sleep (failed): http://pastebin.com/mjMfsAFa
35 second sleep (failed): http://pastebin.com/hsrPLYzi
40 second sleep (failed): http://pastebin.com/ShbvQKZG
45 second sleep (successful): http://pastebin.com/PAyADKiJ
Using partx instead of partprobe, no sleep (successful): http://pastebin.com/dasLxxt8

Note: disks were zapped prior to running 'ceph-deploy disk prepare'

I tried just running sgdisk outside of ceph-deploy, which was successful, including running partprobe: http://pastebin.com/VbQii1gS
Note: I had to quote the partition name, as bash otherwise looked for a file named 'journal' because of the space in 'ceph journal'.
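
For what it's worth, this quoting pitfall only affects the interactive shell: ceph-disk passes sgdisk an argument vector directly, so the space in 'ceph journal' never goes through bash. A minimal illustration (device and values hypothetical):

    import subprocess

    # Each list element becomes exactly one argv entry for sgdisk, so
    # the embedded space in 'ceph journal' needs no shell quoting.
    subprocess.check_call([
        'sgdisk', '--new=2:0:5120M',
        '--change-name=2:ceph journal',
        '--mbrtogpt', '--', '/dev/sdb',  # hypothetical device
    ])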

I will try https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk as requested, please give me a few minutes.

#14

Updated by Matthew Taylor over 8 years ago

Tried using https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk (successful):

http://pastebin.com/VkC2wP46

Tested on another node to confirm (also successful):

http://pastebin.com/NbJuesvp

I'll finish preparing another 10 or 15 disks now.

#15

Updated by Matthew Taylor over 8 years ago

I looped through 9 disks on one of our Ceph servers using the ceph-disk script above; 8 failed and 1 completed.

Log is attached!

#16

Updated by Loïc Dachary over 8 years ago

I filed http://tracker.ceph.com/issues/14099 so ceph-deploy stops calling partx / partprobe (I noticed from your log that it does).

#17

Updated by Loïc Dachary over 8 years ago

@Matthew the log is very useful, thank you. It would be great if you could repeat the same operation without ceph-deploy, so that there are no interfering partx -a calls launched by ceph-deploy while ceph-disk is running. If it fails as well, that will narrow down the source of errors.

Note that I think partx -a only looks like it succeeds because it is non-blocking (i.e. it signals the kernel about the partition change but does not wait for confirmation the way partprobe does). In your case (adding separate disks with colocated journals) it works fine because there is no chance of a race.

#18

Updated by Loïc Dachary over 8 years ago

[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:cddb96da-6ab0-49fc-9bdc-16c40fe59e03 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdq
[kvsrv01][DEBUG ] The operation has completed successfully.
[kvsrv01][WARNING] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/cddb96da-6ab0-49fc-9bdc-16c40fe59e03
[kvsrv01][WARNING] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/cddb96da-6ab0-49fc-9bdc-16c40fe59e03
[kvsrv01][WARNING] DEBUG:ceph-disk:get_dm_uuid /dev/sdq uuid path is /sys/dev/block/65:0/dm/uuid
[kvsrv01][WARNING] DEBUG:ceph-disk:get_dm_uuid /dev/sdq uuid path is /sys/dev/block/65:0/dm/uuid
[kvsrv01][WARNING] DEBUG:ceph-disk:Creating osd partition on /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:8261ff6e-545b-42ee-99b1-92ac86bc0af3 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdq
[kvsrv01][DEBUG ] The operation has completed successfully.
[kvsrv01][WARNING] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] Traceback (most recent call last):
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 4055, in <module>
[kvsrv01][WARNING]     main(sys.argv[1:])
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 4009, in main
[kvsrv01][WARNING]     args.func(args)
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1935, in main_prepare
[kvsrv01][WARNING]     luks=luks
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1634, in prepare_dev
[kvsrv01][WARNING]     update_partition(data, 'created')
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1262, in update_partition
[kvsrv01][WARNING]     raise Error('partprobe %s failed : %s' % (dev, error))
[kvsrv01][WARNING] __main__.Error: Error: partprobe /dev/sdq failed : unknown error
[kvsrv01][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdq

The above suggests that trying partprobe in a loop three times is not always enough. The first update succeeds but the second fails, presumably because the kernel needs more time to complete the update of the partition table. The report that partx -a succeeds supports that.

#19

Updated by Loïc Dachary over 8 years ago

@Matthew could you please try the updated https://raw.githubusercontent.com/dachary/ceph/f4cd0d45be68900c223f779e7d80b85828a27382/src/ceph-disk? A diff will show that it differs from the previous one as follows:

  • fix the error message
  • loop 5 times instead of 3
  • sleep 60 seconds on each loop

Thanks!

#20

Updated by Loïc Dachary over 8 years ago

In the following we see that partprobe fails immediately the first time. Since it happens right after a udevadm settle on a machine that is not doing anything except run the test, it is unlikely that an in-flight udev event is the cause. Waiting 60 seconds resolves the error, which suggests an asynchronous kernel partition-table operation was in progress and eventually finished.

2015-12-17 11:46:12,506.506 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
2015-12-17 11:46:12,507.507 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:46:15,427.427 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:(ignored, waiting 60s)
2015-12-17 11:47:16,925.925 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:47:19,681.681 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:47:20,125.125 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
#21

Updated by Matthew Taylor over 8 years ago

Loic,

I've emailed you directly! :)

Matt.

#22

Updated by Loïc Dachary over 8 years ago

  • Related to Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' added
#23

Updated by Loïc Dachary over 8 years ago

  • Status changed from Fix Under Review to Pending Backport
#24

Updated by Loïc Dachary over 8 years ago

  • Copied to Backport #14145: infernalis: ceph-disk: use blkid instead of sgdisk -i added
#25

Updated by Loïc Dachary over 8 years ago

  • Status changed from Pending Backport to Fix Under Review
#26

Updated by Loïc Dachary over 8 years ago

  • Subject changed from ceph-disk: may timeout on udevadm settle / partprobe to ceph-disk: use blkid instead of sgdisk -i
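
For context on the new subject: the eventual fix queries partition metadata with blkid instead of sgdisk -i. A minimal sketch of that kind of lookup, assuming util-linux blkid's low-level probing and udev-style output (the code actually merged may differ):

    import subprocess

    def get_partition_type(partition):
        # Read the GPT type GUID via blkid's udev-style output instead
        # of parsing 'sgdisk -i'. Sketch only; error handling omitted.
        out = subprocess.check_output(
            ['blkid', '-p', '-o', 'udev', partition]).decode()
        for line in out.splitlines():
            if line.startswith('ID_PART_ENTRY_TYPE='):
                return line.split('=', 1)[1]
        return None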
#27

Updated by Loïc Dachary over 8 years ago

  • Related to deleted (Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1')
#28

Updated by Loïc Dachary over 8 years ago

  • Related to Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' added
#29

Updated by Loïc Dachary over 8 years ago

  • Status changed from Fix Under Review to Pending Backport
#30

Updated by Darrell Enns over 8 years ago

I'm still seeing some failures, even with the 5x 60s delay.

System info:
OS: CentOS 7.2.1511
HBA: IBM ServeRAID 8k/8k-l8 (Adaptec AAC-RAID), configured as 1 disk per volume
Ceph: Infernalis (9.2.0-0.el7)

[ceph01][WARNIN] DEBUG:ceph-disk:Calling partprobe on zapped device /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:bc22d537-a8e6-4bfb-8bc8-60d24652763a --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdf
[ceph01][DEBUG ] The operation has completed successfully.
[ceph01][WARNIN] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]  (ignored, waiting 60s)
[ceph01][WARNIN] Traceback (most recent call last):
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 4031, in <module>
[ceph01][WARNIN]     main(sys.argv[1:])
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 3985, in main
[ceph01][WARNIN]     args.func(args)
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1909, in main_prepare
[ceph01][WARNIN]     luks=luks
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1506, in prepare_journal
[ceph01][WARNIN]     return prepare_journal_dev(data, journal, journal_size, journal_uuid, journal_dm_keypath, cryptsetup_parameters, luks)
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1421, in prepare_journal_dev
[ceph01][WARNIN]     update_partition(journal, 'prepared')
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1261, in update_partition
[ceph01][WARNIN]     raise Error('partprobe %s failed : %s' % (dev, error))
[ceph01][WARNIN] __main__.Error: Error: partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]
[ceph01][ERROR ] RuntimeError: command returned non-zero exit status: 1
#31

Updated by Wes Dillingham about 8 years ago

I am experiencing an identical issue with the updated ceph-disk as provided here:
https://raw.githubusercontent.com/dachary/ceph/f4cd0d45be68900c223f779e7d80b85828a27382/src/ceph-disk

I am running:
CentOS Linux release 7.2.1511 (Core)

Controller:
Dell Perc h700

I am seeing approximately a 30% OSD build success rate.

Is the above ceph-disk the latest attempt at resolving this issue, or might there be another version?

#32

Updated by Wes Dillingham about 8 years ago


Edit: I am colocating my journal and data partitions:

ceph-deploy osd create --fs-type xfs ceph-osd06:/dev/sdb

#33

Updated by Wes Dillingham about 8 years ago

I adjusted it to try 20 times instead of 5 (a 20-minute wait) and got only a negligible improvement (1 more OSD deployed out of 36), so I am not sure this is in fact a timing issue.

I also get the following error in /var/log/ceph/ceph-osd.admin.log on each iteration of the disk probe attempt; not sure if it's related.

2016-02-01 14:46:05.958049 7f0bca77c900 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 27389
2016-02-01 14:46:05.958063 7f0bca77c900 -1 unrecognized arg --get-device-fsid

#34

Updated by Loïc Dachary about 8 years ago

There are a number of other problems that were fixed recently and should be backported. I'm not sure how to do it just yet, though. Ideally most of https://github.com/ceph/ceph/pull/6926 and https://github.com/ceph/ceph/pull/5999 should be in infernalis, and we should go over each modification to figure out what to backport.

#35

Updated by Loïc Dachary about 8 years ago

  • Backport changed from infernalis to infernalis,hammer
#36

Updated by Loïc Dachary about 8 years ago

  • Copied to Backport #14602: hammer: ceph-disk: use blkid instead of sgdisk -i added
#37

Updated by Loïc Dachary about 8 years ago

  • Status changed from Pending Backport to Resolved