Bug #15176
partprobe intermittent issues during ceph-disk prepare
Description
From the mailing list thread, it looks like a few folks are having issues with partprobe in ceph-disk and want to use partx instead. I am raising a tracker issue so that this gets some traction on what the right method should be.
According to Loic's mail thread, ceph-disk used partx before and ran into some corner-case issues:
http://www.spinics.net/lists/ceph-devel/msg26301.html
From: Dan van der Ster <dan@vanderster.com>
Date: Thu, Mar 17, 2016 at 10:47 AM
Subject: Re: [ceph-users] ceph-disk from jewel has issues on redhat 7
To: Vasu Kulkarni <vakulkar@redhat.com>
Cc: Stephen Lord <Steve.Lord@quantum.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>

Hi,

It's true, partprobe works intermittently. I extracted the key commands to show the problem:

[18:44]# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdc
The operation has completed successfully.
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)
[18:44]# partprobe /dev/sdc
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)

But partx works every time:

[18:46]# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdd
The operation has completed successfully.
[18:46]# partx -u /dev/sdd
[18:46]# partx -u /dev/sdd
[18:46]# partx -u /dev/sdd
[18:46]#
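For comparison, the two approaches above can be combined into a fallback. This is a minimal shell sketch, not ceph-disk code: it assumes partprobe exits non-zero when it prints the "Device or resource busy" error, and the function name `reread_partitions` and the `PARTPROBE`/`PARTX` override variables are hypothetical, there only so the commands can be stubbed out. Given the partx corner cases mentioned in Loic's thread, treat it as an illustration of the option under discussion rather than a recommendation.

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel to pick up partition table changes on a
# disk, trying partprobe first and falling back to "partx -u" (util-linux),
# which succeeded in the transcript above where partprobe did not.
# PARTPROBE and PARTX are overridable only so the commands can be stubbed.
PARTPROBE=${PARTPROBE:-partprobe}
PARTX=${PARTX:-partx}

reread_partitions() {
    # $1: whole-disk device, e.g. /dev/sdd
    if "$PARTPROBE" "$1"; then
        return 0
    fi
    echo "partprobe $1 failed, falling back to partx -u" >&2
    # partx -u asks the kernel to update its partition entries for the disk.
    "$PARTX" -u "$1"
}
```

Usage: `reread_partitions /dev/sdd`.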
History
#1 Updated by Dan van der Ster about 8 years ago
Btw, no additional errors logged on the system.
#2 Updated by Dan van der Ster about 8 years ago
I found rhbz#1245144, also rhbz#1283112, which are related. We are already running parted-3.1-23.el7.x86_64, which has the supposed fix, but clearly it's still racy, or we're not using partprobe correctly. There is a comment in the bz:
"Note that if you are calling parted multiple times from a script and not checking for device nodes to appear/disappear/whatever you will end up in the same situation. It is best to combine all the commands into a single parted call, or check for the expected changes between the calls."
Maybe we should check for the partition's existence before (unnecessarily) calling partprobe?
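One way to express that check, as a minimal sketch: ask whether the kernel already exposes the partition under /sys/class/block before deciding to call partprobe. The helper names and the `SYS_BLOCK` override are hypothetical. Existence alone does not prove the kernel's view of the partition's start and size is current, so this only covers the "partition was just added" case.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether the kernel already knows about
# partition PARTNUM of DISK (e.g. "sdc" and "2" -> sdc2).
# SYS_BLOCK is overridable only so the check can be exercised in a test.
SYS_BLOCK=${SYS_BLOCK:-/sys/class/block}

partition_name() {
    # Disks whose name ends in a digit (nvme0n1, loop0) use a "p" separator.
    case $1 in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
        *)      printf '%s%s\n' "$1" "$2" ;;
    esac
}

partition_known_to_kernel() {
    # $1: disk name without /dev (e.g. sdc), $2: partition number
    [ -e "$SYS_BLOCK/$(partition_name "$1" "$2")" ]
}
```

For example, `partition_known_to_kernel sdc 2 || partprobe /dev/sdc` skips partprobe when sdc2 is already visible.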
#3 Updated by Dan van der Ster about 8 years ago
I installed parted-3.2-16.fc22.x86_64 on this machine and found it is now 100% reliable:
- rpm -q parted
parted-3.2-16.fc22.x86_64
- /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sde
The operation has completed successfully.
- partprobe /dev/sde
- partprobe /dev/sde
- partprobe /dev/sde
- partprobe /dev/sde
- for i in `seq 100`; do partprobe /dev/sde; done #
#4 Updated by Ben Meekhof almost 8 years ago
I had the same issue with parted-3.1-23.el7.x86_64 (Scientific Linux 7.2). Runs of ceph-disk prepare would fail randomly with the same 'Device or resource busy' error from partprobe, and the same error can be reproduced just by running 'partprobe' with any Ceph OSD or journal partitions in place. There was no consistency to the error: in repeated runs, partprobe would sometimes return fine and sometimes not. Presumably the issue is triggered by the udev rules that Ceph (Infernalis) installs in /usr/lib/udev/rules.d/60-ceph-partuuid-workaround.rules.
The issue was also resolved for me by installing parted-3.2-16.fc22.x86_64. I've seen no further indication of a problem in running through a batch of 'ceph-disk prepare'.
#5 Updated by Loïc Dachary almost 8 years ago
- Duplicated by Bug #15918: ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16 added
#6 Updated by Loïc Dachary almost 8 years ago
- Status changed from New to In Progress
- Priority changed from Normal to Urgent
- Backport set to jewel
#7 Updated by Loïc Dachary almost 8 years ago
- http://ftp.scientificlinux.org/linux/scientific/7.2/SRPMS/vendor/parted-3.1-23.el7.src.rpm
- https://dl.fedoraproject.org/pub/fedora/linux/updates/22/SRPMS/p/parted-3.2-10.fc22.src.rpm
- https://dl.fedoraproject.org/pub/fedora/linux/updates/23/SRPMS/p/parted-3.2-18.fc23.src.rpm
- Reviewing the patch differences between the 3.1-23 and 3.2-18 source packages shows that both have
  - 0033-libparted-Use-read-only-when-probing-devices-on-linu.patch
  - 0020-tests-Use-wait_for_dev_to_-functions.patch
  but only 3.2-18 has
  - 0026-tests-Add-udevadm-settle-to-wait_for_-loop-1260664.patch
- The biggest related change is the algorithm used to modify, add, and remove partitions, along with a number of follow-ups (as noted by Brian C. Lane).
- Mailing list thread "parted behavior change between 3.1 and 3.2"
[ubuntu@mira061 ~]$ sudo yum list installed | grep parted
parted.x86_64 3.1-23.el7 @anaconda
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=2:0:100M --change-name=2:'ceph journal' --mbrtogpt -- /dev/sdb
***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format.
***************************************************************
Warning: The kernel is still using the old partition table. The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ sudo partprobe /dev/sdb
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:21 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:21 /dev/sdb2
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=3:101M:200M --change-name=3:'ceph journal' --mbrtogpt -- /dev/sdb
Warning: The kernel is still using the old partition table. The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:22 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:22 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:22 /dev/sdb3
[ubuntu@mira061 ~]$ sudo mkfs /dev/sdb2
[ubuntu@mira061 ~]$ sudo mount /dev/sdb2 /mnt
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=4:201M:300M --change-name=4:'ceph journal' --mbrtogpt -- /dev/sdb
Warning: The kernel is still using the old partition table. The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:24 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:24 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:24 /dev/sdb3
[ubuntu@mira061 ~]$ sudo partprobe /dev/sdb
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:24 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:24 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:24 /dev/sdb3
brw-rw----. 1 root disk 8, 20 May 23 08:24 /dev/sdb4
[ubuntu@mira061 ~]$ uname -a
Linux mira061 4.6.0-rc3-ceph-15705-gac8ec84 #1 SMP Fri May 20 04:17:10 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Same result as above with kernel: Linux mira061 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
#8 Updated by Loïc Dachary almost 8 years ago
After running the following script (provided by Daniel H.):
journal_dev="/dev/vdb"
#osd_devs="/dev/vdc /dev/vdd /dev/vde"
osd_devs="/dev/vdc"

date '+== [%H:%M:%S] ======================================================='
for dev in ${osd_devs} ${journal_dev}; do
    echo "RUN: sgdisk --zap-all --clear --mbrtogpt -g -- ${dev}"
    sgdisk --zap-all --clear --mbrtogpt -g -- ${dev} 2>&1
    echo "rcode=$?"
    echo
done
udevadm settle
echo
sleep 2
echo "RUN: partprobe"
partprobe 2>&1
echo "rcode=$?"
udevadm settle
for osd_dev in ${osd_devs}; do
    echo "RUN: ceph-disk prepare --cluster ceph ${osd_dev} ${journal_dev}"
    ceph-disk --verbose prepare --cluster ceph ${osd_dev} ${journal_dev} 2>&1 || exit 1
    echo "rcode=$?"
    echo
done
echo
On a RHEL 7.2 machine:
# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.2 (Maipo)
Release: 7.2
Codename: Maipo
/usr/sbin/partprobe /dev/vdc
always fails with
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
- grep vdc /proc/mounts has nothing
- fuser /dev/vdc /dev/vdc1 has nothing
- lsof -n | grep vdc has nothing
- ls -l /dev/mapper is empty
#9 Updated by Ilya Dryomov almost 8 years ago
It doesn't stay busy, does it? It's clearly a race between partprobe
and udev. I'm guessing something like this:
- partprobe triggers a rescan of the partition table in the kernel
- a change uevent is generated by the kernel
- if partprobe gets to the device first (for whatever reason it seems
to open/close it multiple times during "partprobe <dev>"), udev backs
off and partprobe's BLKPG_ADD_PARTITION succeeds
- if udev gets to the device first, we've got a problem...
I haven't worked through the particulars, but I think the above
captures it. It could also be that if udev manages to grab the device
and issue BLKRRPART, but partprobe for whatever reason gets delayed
long enough for udev to finish, things still work out in the end.
Do you want to get to the bottom of it? If it's fixed in partprobe 3.2,
what exactly are you after here - a workaround for 3.1?
#10 Updated by Loïc Dachary almost 8 years ago
Do you want to get to the bottom of it? If it's fixed in partprobe 3.2,
what exactly are you after here - a workaround for 3.1?
Yes, a workaround is required because upgrading to 3.2 may not be possible on RHEL 7.2 (and other platforms maybe).
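A workaround along the lines of Ilya's analysis (partprobe racing with udev's handling of the change uevent) could wait for udev to settle before each attempt and retry only on the busy error. This is a minimal shell sketch under assumptions, not the change that was merged for this issue: it assumes partprobe exits non-zero when it reports "Device or resource busy", and the helper name `retry_partprobe`, the retry count, and the delay are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical workaround sketch for the partprobe/udev race: let udev finish
# processing pending events, run partprobe, and retry only when it fails with
# "Device or resource busy". Names and defaults are illustrative assumptions.
PARTPROBE=${PARTPROBE:-partprobe}
UDEVADM=${UDEVADM:-udevadm}
MAX_TRIES=${MAX_TRIES:-5}
RETRY_DELAY=${RETRY_DELAY:-2}

retry_partprobe() {
    # $1: whole-disk device, e.g. /dev/vdc
    i=1
    while [ "$i" -le "$MAX_TRIES" ]; do
        # Drain udev's queue so it is less likely to hold the device open
        # while partprobe issues its BLKPG calls.
        "$UDEVADM" settle --timeout=600
        if out=$("$PARTPROBE" "$1" 2>&1); then
            # Settle again so uevents for the new partitions are handled
            # before the caller uses them.
            "$UDEVADM" settle --timeout=600
            return 0
        fi
        case $out in
            *"Device or resource busy"*) ;;              # the race: retry
            *) printf '%s\n' "$out" >&2; return 1 ;;      # other error: give up
        esac
        sleep "$RETRY_DELAY"
        i=$((i + 1))
    done
    printf 'partprobe %s: still busy after %s tries\n' "$1" "$MAX_TRIES" >&2
    return 1
}
```

Usage: `retry_partprobe /dev/vdc`. Retrying only on the busy message keeps genuine errors (a missing device, for example) from being masked by the loop.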
#11 Updated by Loïc Dachary almost 8 years ago
It doesn't stay busy, does it?
It stays busy forever.
#12 Updated by Ilya Dryomov almost 8 years ago
Are you sure? If you repeat the partprobe again after a couple of seconds or try it a few times at random, at least one of the invocations has got to succeed.
Maybe try invoking partprobe under strace to mess with the timing? It's got to be a race, and a 100% hit rate is very unlikely...
#13 Updated by Loïc Dachary almost 8 years ago
Are you sure? If you repeat the partprobe again after a couple of seconds or try it a few times at random, at least one of the invocations has got to succeed.
You are correct, it does not always fail. I was (un)lucky enough in the past hours to always hit the case when it fails. But running a loop clearly shows that it fails intermittently.
# for i in $(seq 1 9) ; do echo partprobe ; /usr/sbin/partprobe /dev/vdc ; sleep 2 ; done
partprobe
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
partprobe
partprobe
partprobe
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
partprobe
partprobe
partprobe
partprobe
partprobe
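Because the failure is intermittent, a tally over many runs says more than any single invocation. A minimal sketch of such a loop; the function name `count_partprobe_failures` and the `PARTPROBE` override are hypothetical, and it counts non-zero exit statuses rather than matching the error text.

```shell
#!/bin/sh
# Hypothetical helper: run partprobe on a device RUNS times, DELAY seconds
# apart, and print how many of those runs exited non-zero.
PARTPROBE=${PARTPROBE:-partprobe}

count_partprobe_failures() {
    # $1: device, $2: number of runs, $3: delay in seconds (default 2)
    dev=$1 runs=$2 delay=${3:-2}
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$PARTPROBE" "$dev" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, `count_partprobe_failures /dev/vdc 9` repeats the experiment above and prints the number of failed runs.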
#14 Updated by Loïc Dachary almost 8 years ago
- Status changed from In Progress to Fix Under Review
#15 Updated by Kefu Chai over 7 years ago
- Status changed from Fix Under Review to Pending Backport
#16 Updated by Nathan Cutler over 7 years ago
- Copied to Backport #16586: jewel: partprobe intermittent issues during ceph-disk prepare added
#17 Updated by Loïc Dachary over 7 years ago
- Duplicated by Bug #13984: ceph-disk prepare activates the osd on 7.1 added
#18 Updated by Loïc Dachary over 7 years ago
- Status changed from Pending Backport to Resolved