Bug #15176
closed
partprobe intermittent issues during ceph-disk prepare
Added by Vasu Kulkarni about 8 years ago.
Updated over 7 years ago.
Description
From the mail thread, it looks like a few folks are having issues with partprobe in ceph-disk and want to use partx instead. I am raising a tracker issue so that the question of which method is right gets some traction.
Per Loic's mail thread, it looks like ceph-disk used partx before and had some corner-case issues:
http://www.spinics.net/lists/ceph-devel/msg26301.html
From: Dan van der Ster <dan@vanderster.com>
Date: Thu, Mar 17, 2016 at 10:47 AM
Subject: Re: [ceph-users] ceph-disk from jewel has issues on redhat 7
To: Vasu Kulkarni <vakulkar@redhat.com>
Cc: Stephen Lord <Steve.Lord@quantum.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Hi,
It's true, partprobe works intermittently. I extracted the key
commands to show the problem:
[18:44]# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdc
The operation has completed successfully.
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)
[18:44]# partprobe /dev/sdc
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)
But partx works every time:
[18:46]# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdd
The operation has completed successfully.
[18:46]# partx -u /dev/sdd
[18:46]# partx -u /dev/sdd
[18:46]# partx -u /dev/sdd
[18:46]#
Btw, no additional errors logged on the system.
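For anyone who wants to experiment with the partx route, here is a minimal sketch of a fallback wrapper. The function name refresh_partitions is made up for illustration and is not part of ceph-disk. It assumes partprobe and partx are both installed, and it does not address the corner-case issues with partx mentioned earlier in this report.

```shell
#!/bin/sh
# Hypothetical helper (not ceph-disk code): ask the kernel to re-read the
# partition table of DISK, preferring partprobe and falling back to partx -u.
refresh_partitions() {
    disk=$1
    # First choice: partprobe. Its error text is discarded because a
    # failure here is expected to be transient.
    if partprobe "$disk" 2>/dev/null; then
        echo "partprobe ok: $disk"
        return 0
    fi
    # Fallback: partx -u updates the kernel's view of the partitions.
    if partx -u "$disk"; then
        echo "partx ok: $disk"
        return 0
    fi
    echo "failed to refresh $disk" >&2
    return 1
}
```

Something like `refresh_partitions /dev/sdc` would then replace the bare partprobe call after sgdisk.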
I found rhbz#1245144 and rhbz#1283112, which are related. We are already running parted-3.1-23.el7.x86_64, which has the supposed fix, but clearly it's still racy, or we're not using partprobe correctly. There is a comment in the bz:
"Note that if you are calling parted multiple times from a script and not checking for device nodes to appear/disappear/whatever you will end up in the same situation. It is best to combine all the commands into a single parted call, or check for the expected changes between the calls."
Maybe we should check for the partition's existence before (unnecessarily) calling partprobe?
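A rough sketch of what such a check could look like, for discussion only. The helper names partition_node and probe_if_missing are hypothetical. The sketch assumes the usual kernel naming, where a disk name ending in a digit gets a "p" before the partition number, and it only covers the case of a newly added partition.

```shell
#!/bin/sh
# Hypothetical helper: print the expected device node for partition NUM of DISK.
# Disks whose name ends in a digit (nvme0n1, loop0) use a "p" separator.
partition_node() {
    disk=$1 num=$2
    case "$disk" in
        *[0-9]) printf '%sp%s\n' "$disk" "$num" ;;
        *)      printf '%s%s\n' "$disk" "$num" ;;
    esac
}

# Hypothetical helper: call partprobe only when the kernel has not already
# created the block device node for the new partition.
probe_if_missing() {
    node=$(partition_node "$1" "$2")
    if [ -b "$node" ]; then
        echo "skip: $node already present"
        return 0
    fi
    partprobe "$1"
}
```

For the case above this would be `probe_if_missing /dev/sdc 2` right after the sgdisk call.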
I installed parted-3.2-16.fc22.x86_64 on this machine and found it is now 100% reliable:
# rpm -q parted
parted-3.2-16.fc22.x86_64
# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sde
The operation has completed successfully.
# partprobe /dev/sde
# partprobe /dev/sde
# partprobe /dev/sde
# partprobe /dev/sde
# for i in `seq 100`; do partprobe /dev/sde; done
#
I had the same issue with parted-3.1-23.el7.x86_64 (Scientific Linux 7.2). Runs of ceph-disk prepare would fail randomly with the same 'Device or resource busy' error from partprobe, and the same error can be reproduced just by running 'partprobe' with any ceph OSD or journal partitions in place. There was no consistency to the error: in repeated runs partprobe would sometimes return fine and sometimes not. Presumably the issue is triggered by the udev rules that Ceph (Infernalis) installs in /usr/lib/udev/rules.d/60-ceph-partuuid-workaround.rules.
The issue was also resolved for me by installing parted-3.2-16.fc22.x86_64. I've seen no further indication of a problem in running through a batch of 'ceph-disk prepare'.
- Has duplicate Bug #15918: ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16 added
- Status changed from New to In Progress
- Priority changed from Normal to Urgent
- Backport set to jewel
- A review of the patch differences between the 3.1-23 and 3.2-18 source packages shows that
they both have
- 0033-libparted-Use-read-only-when-probing-devices-on-linu.patch
- 0020-tests-Use-wait_for_dev_to_-functions.patch
but only 3.2-18 has
- 0026-tests-Add-udevadm-settle-to-wait_for_-loop-1260664.patch
- Mailing list thread "parted behavior change between 3.1 and 3.2"
[ubuntu@mira061 ~]$ sudo yum list installed | grep parted
parted.x86_64 3.1-23.el7 @anaconda
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=2:0:100M --change-name=2:'ceph journal' --mbrtogpt -- /dev/sdb
***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format.
***************************************************************
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ sudo partprobe /dev/sdb
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:21 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:21 /dev/sdb2
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=3:101M:200M --change-name=3:'ceph journal' --mbrtogpt -- /dev/sdb
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:22 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:22 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:22 /dev/sdb3
[ubuntu@mira061 ~]$ sudo mkfs /dev/sdb2
[ubuntu@mira061 ~]$ sudo mount /dev/sdb2 /mnt
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=4:201M:300M --change-name=4:'ceph journal' --mbrtogpt -- /dev/sdb
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:24 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:24 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:24 /dev/sdb3
[ubuntu@mira061 ~]$ sudo partprobe /dev/sdb
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:24 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:24 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:24 /dev/sdb3
brw-rw----. 1 root disk 8, 20 May 23 08:24 /dev/sdb4
[ubuntu@mira061 ~]$ uname -a
Linux mira061 4.6.0-rc3-ceph-15705-gac8ec84 #1 SMP Fri May 20 04:17:10 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Same as above with Linux mira061 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
After running this script (provided by Daniel H.)
journal_dev="/dev/vdb"
#osd_devs="/dev/vdc /dev/vdd /dev/vde"
osd_devs="/dev/vdc"
date '+== [%H:%M:%S] ======================================================='
for dev in ${osd_devs} ${journal_dev}; do
    echo "RUN: sgdisk --zap-all --clear --mbrtogpt -g -- ${dev}"
    sgdisk --zap-all --clear --mbrtogpt -g -- ${dev} 2>&1
    echo "rcode=$?"
    echo
done
udevadm settle
echo
sleep 2
echo "RUN: partprobe"
partprobe 2>&1
echo "rcode=$?"
udevadm settle
for osd_dev in ${osd_devs}; do
    echo "RUN: ceph-disk prepare --cluster ceph ${osd_dev} ${journal_dev}"
    ceph-disk --verbose prepare --cluster ceph ${osd_dev} ${journal_dev} 2>&1 || exit 1
    echo "rcode=$?"
    echo
done
echo
On a
# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.2 (Maipo)
Release: 7.2
Codename: Maipo
/usr/sbin/partprobe /dev/vdc
always fails with
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
- grep vdc /proc/mounts has nothing
- fuser /dev/vdc /dev/vdc1 has nothing
- lsof -n | grep vdc has nothing
- ls -l /dev/mapper is empty
It doesn't stay busy, does it? It's clearly a race between partprobe
and udev. I'm guessing something like this:
- partprobe triggers a rescan of the partition table in the kernel
- a change uevent is generated by the kernel
- if partprobe gets to the device first (for whatever reason it seems
to open/close it multiple times during "partprobe <dev>"), udev backs
off and partprobe's BLKPG_ADD_PARTITION succeeds
- if udev gets to the device first, we've got a problem...
I haven't worked through the particulars, but I think the above
captures it. It could be that even if udev manages to grab the device
and issue BLKRRPART, but partprobe for whatever reason gets delayed
long enough for udev to finish, things work out in the end too.
Do you want to get to the bottom of it? If it's fixed in partprobe 3.2,
what exactly are you after here - a workaround for 3.1?
Yes, a workaround is required because upgrading to 3.2 may not be possible on RHEL 7.2 (and other platforms maybe).
It doesn't stay busy, does it?
It stays busy forever.
Are you sure? If you repeat the partprobe again after a couple of seconds or try it a few times at random, at least one of the invocations has got to succeed.
Maybe try invoking partprobe under strace to perturb the timing? It's got to be a race, and a 100% hit rate is very unlikely...
You are correct, it does not always fail. I was (un)lucky enough in the past hours to always hit the case when it fails. But running a loop clearly shows that it fails intermittently.
# for i in $(seq 1 9) ; do echo partprobe ; /usr/sbin/partprobe /dev/vdc ; sleep 2 ; done
partprobe
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
partprobe
partprobe
partprobe
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
partprobe
partprobe
partprobe
partprobe
partprobe
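Since the failure is intermittent, one possible shape for a 3.1 workaround is to let udev settle and then retry partprobe a few times. The sketch below only illustrates that idea and is not the fix that was merged. The function name retry, the attempt count and the delay are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX DELAY CMD [ARGS...]
# Runs CMD until it succeeds or MAX attempts have been made, sleeping DELAY
# seconds between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
    max=$1 delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max failed: $*" >&2
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Possible use (left commented out so nothing touches a real device):
#   udevadm settle --timeout=600
#   retry 5 1 /usr/sbin/partprobe /dev/vdc
```

Judging by the loop output above, where 2 of 9 runs failed, a handful of attempts should usually be enough, but the right numbers would need testing on an affected machine.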
- Status changed from In Progress to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
- Copied to Backport #16586: jewel: partprobe intermittent issues during ceph-disk prepare added
- Has duplicate Bug #13984: ceph-disk prepare activates the osd on 7.1 added
- Status changed from Pending Backport to Resolved