ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16
The failure occurs roughly 40% of the time on our VMs.
Steps to Reproduce:
1. Create and install a node for a Ceph OSD with at least two spare disks.
2. Run the disk preparation command for a Ceph OSD.
Device /dev/vdb is used for the journal and /dev/vdc for the OSD data. If you have more spare disks, you can repeat this command for each "OSD data" device.
# ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb
3. Before trying again, clean up both the journal and OSD data devices:
# sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdb
# sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdc
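Since the failure is intermittent, it can help to repeat the prepare/clean-up cycle and count how often it fails. The following is only a sketch under stated assumptions: the function names (prepare_once, cleanup_once, count_failures) are hypothetical helpers, the devices are the ones used in this report, and the clean-up uses only the long option --mbrtogpt, of which -g is the short form.

```shell
#!/bin/bash
# Hypothetical helpers for estimating how often "ceph-disk prepare" fails.
# Device names default to the ones used in this report.
DATA_DEV=${DATA_DEV:-/dev/vdc}
JOURNAL_DEV=${JOURNAL_DEV:-/dev/vdb}

# Step 2 of the reproducer: prepare the OSD data device with a separate journal.
prepare_once() {
    ceph-disk prepare --cluster ceph "$DATA_DEV" "$JOURNAL_DEV"
}

# Step 3 of the reproducer: wipe both devices before the next attempt.
cleanup_once() {
    sgdisk --zap-all --clear --mbrtogpt -- "$JOURNAL_DEV" &&
    sgdisk --zap-all --clear --mbrtogpt -- "$DATA_DEV"
}

# count_failures RUNS
# Runs the prepare/clean-up cycle RUNS times and prints "failures/runs".
count_failures() {
    runs=$1
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        prepare_once >/dev/null 2>&1 || failures=$((failures + 1))
        cleanup_once >/dev/null 2>&1
        i=$((i + 1))
    done
    echo "$failures/$runs"
}
```

For example, running `count_failures 20` as root on a test node prints the number of failed attempts out of 20. Both devices are wiped on every iteration, so this should only be run against disposable disks.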
Sometimes the ceph-disk command fails with the following (or a similar) error:
# ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb
prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
The operation has completed successfully.
ceph-disk: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
# echo $?
Expected result: the ceph-disk command should prepare the disk for the Ceph OSD without errors.
#3 Updated by Loic Dachary about 4 years ago
Thanks David, that's immensely helpful :-) In the parted release notes there is
Avoid generating udev add/remove events for all unmodified partitions
when writing a new table.
which is this commit
But I can't figure out how that would be related to the behavior you had. I don't see anything else in the release notes that would be relevant. It could be that partprobe is not at fault but parted is (ceph-disk uses it as well to scan the partition table). But I don't see a change to parted between 3.1 and 3.2 that could explain the problem.
I'll keep looking a little more to figure out what's going on exactly. Any suggestion / ideas would be most welcome :-)
#4 Updated by Loic Dachary about 4 years ago
- Subject changed from ceph-disk prepare: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy. to ceph-disk prepare: partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16
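To see which side of the version threshold in the new subject a node is on, the installed parted version can be compared against it. This is a minimal sketch, not a definitive check: version_lt and parted_version are hypothetical helper names, the comparison relies on GNU `sort -V`, and it assumes the version is the last word on the first line of `parted --version`. Distribution packages may carry backported patches, so the reported version alone may not settle the question.

```shell
#!/bin/bash
# version_lt A B
# Succeeds if dotted version A sorts strictly before B (uses GNU sort -V).
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# parted_version
# Prints the last word of the first line of "parted --version",
# which is assumed to be the version number.
parted_version() {
    parted --version | head -n 1 | awk '{print $NF}'
}

# affected_by_threshold
# Succeeds if the installed parted is older than the 3.2.16 threshold
# named in the subject of this issue.
affected_by_threshold() {
    version_lt "$(parted_version)" "3.2.16"
}
```

Usage: `affected_by_threshold && echo "parted is older than 3.2.16"`.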
#8 Updated by Alfredo Deza about 4 years ago
Most of the problems we've seen in ceph-deploy issues regarding ceph-disk calls have been caused by the async nature of the udev rules.
Have we ensured that this odd behavior in partprobe is not being caused by racing udev rules? What happens when the commands ceph-disk is firing are run on a system that doesn't have Ceph installed (or has it but without any udev rules)?
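One way to probe the udev race hypothesis is to wait for the udev event queue to drain before each partprobe call and see whether the failure rate drops. The sketch below assumes that approach; probe_with_settle is a hypothetical helper, and the attempt count, delay and 60-second settle timeout are arbitrary choices. It is not presented as what ceph-disk does.

```shell
#!/bin/bash
# probe_with_settle DEV [ATTEMPTS] [DELAY]
# Waits for pending udev events to be processed, then runs partprobe on DEV.
# Retries up to ATTEMPTS times (default 5), sleeping DELAY seconds
# (default 1) after each failed attempt.
probe_with_settle() {
    dev=$1
    attempts=${2:-5}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Block until the udev event queue is empty (or the timeout expires).
        udevadm settle --timeout=60
        if partprobe "$dev"; then
            echo "partprobe succeeded on attempt $i"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "partprobe still failing after $attempts attempts" >&2
    return 1
}
```

Usage: `probe_with_settle /dev/vdb`. If partprobe usually succeeds on the first attempt once udev has settled, that points towards racing udev rules. If it still fails with "Device or resource busy", the cause is more likely elsewhere.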