Bug #15918
ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16
Description
How reproducible:
40% on our VMs
Steps to Reproduce:
1. Create and install a node for a Ceph OSD with at least two spare disks.
2. Run the disk preparation command for a Ceph OSD.
Device /dev/vdb is used for the journal and /dev/vdc for the OSD data. If you have more spare disks, you can repeat this command for each "OSD data" device.
# ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb
3. Before trying again, clean up both the journal and OSD data devices:
# sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdb
# sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdc
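Since the failure is intermittent, it can help to run the prepare/clean-up cycle several times and count the failures. The following is a minimal sketch, not part of the original report: count_failures and prepare_and_zap are hypothetical helper names, the device paths are the ones used in the steps above, and the sgdisk call is written with `--` to end the option list, which is an assumption about the intended command. It destroys all data on both devices, so it should only be run on a disposable VM.

```shell
# Hypothetical helper: run a command RUNS times and print how many runs failed.
# Usage: count_failures RUNS CMD [ARGS...]
count_failures() {
    runs=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status is counted.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# One reproduction cycle: prepare the OSD, then wipe both devices so the
# next cycle starts clean. Returns the exit status of ceph-disk prepare.
prepare_and_zap() {
    ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb
    rc=$?
    sgdisk --zap-all --clear --mbrtogpt -- /dev/vdb
    sgdisk --zap-all --clear --mbrtogpt -- /dev/vdc
    return "$rc"
}

# Example (destructive):
#   count_failures 10 prepare_and_zap
```

The printed number divided by the number of runs gives a rough failure rate to compare against the figure quoted under "How reproducible".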
Actual results:
Sometimes the ceph-disk command fails with the following (or a similar) error:
# ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb
prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
The operation has completed successfully.
ceph-disk: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
# echo $?
1
Expected results:
The ceph-disk command should properly prepare the disk for a Ceph OSD.
History
#1 Updated by David Orman almost 8 years ago
If you're using CentOS 7/RHEL 7, try upgrading to parted-3.2-16.fc22. I encountered the same problem, and this fixed it for me. (I believe I rebooted, since I had already tried multiple times on that system with the older version.)
#2 Updated by Daniel Horak almost 8 years ago
Hi David,
Big thanks for your suggestion.
Yes, it is on RHEL 7, and I can confirm that parted-3.2-16.fc22 fixes the issue.
Originally, with parted-3.1-23.el7.x86_64, it was failing.
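To check whether a node still has the older parted, the installed package version can be compared against 3.2-16. This is a minimal sketch under stated assumptions: version_ge is a hypothetical helper name, it relies on the version ordering of GNU `sort -V` (which approximates, but is not identical to, RPM's own version comparison), and the threshold 3.2-16 is taken from the comments above.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2, according to GNU sort -V ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: query the installed parted package and compare it.
#   installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' parted)
#   version_ge "$installed" "3.2-16" || echo "parted is older than 3.2-16"
```

With the versions quoted in this thread, 3.1-23.el7 sorts below 3.2-16, while 3.2-16.fc22 does not.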
#3 Updated by Loïc Dachary almost 8 years ago
Thanks David, that's immensely helpful :-) In the parted release notes there is
Avoid generating udev add/remove events for all unmodified partitions
when writing a new table.
which is this commit
But I can't figure out how that would be related to the behavior you had. I don't see anything else in the release notes that would be relevant. It could be that partprobe is not at fault but parted is (ceph-disk uses it as well to scan the partition table). But I don't see a change to parted between 3.1 and 3.2 that could explain the problem.
I'll keep looking a little more to figure out what's going on exactly. Any suggestion / ideas would be most welcome :-)
#4 Updated by Loïc Dachary almost 8 years ago
- Subject changed from ceph-disk prepare: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy. to ceph-disk prepare: partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16
#5 Updated by Loïc Dachary almost 8 years ago
- Status changed from In Progress to Fix Under Review
- Backport set to jewel
#6 Updated by Loïc Dachary almost 8 years ago
- Subject changed from ceph-disk prepare: partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16 to ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16
#7 Updated by Dan van der Ster almost 8 years ago
Dupe of http://tracker.ceph.com/issues/15176 ??
#8 Updated by Alfredo Deza almost 8 years ago
Most of the problems we've seen in ceph-deploy issues regarding ceph-disk calls have been caused by the async nature of the udev rules.
Have we ensured that this odd behavior in partprobe is not being caused by racing udev rules? What happens when the commands ceph-disk fires are run on a system that doesn't have Ceph installed (or has it, but without any udev rules)?
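One way to probe whether pending udev events are involved is to wait for the udev queue to drain before calling partprobe, and to retry a few times if it still fails. The sketch below is an illustration only, not the change under review in this ticket: retry_after_settle, SETTLE_CMD and RETRY_DELAY are hypothetical names introduced here, and the attempt count and timeout are arbitrary choices.

```shell
# Hypothetical helper: try a command up to ATTEMPTS times, waiting for udev
# to finish processing queued events before each attempt.
# Usage: retry_after_settle ATTEMPTS CMD [ARGS...]
# SETTLE_CMD and RETRY_DELAY can be overridden, for example when testing.
retry_after_settle() {
    attempts=$1; shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        # Wait for the udev event queue to empty; its exit status is ignored.
        ${SETTLE_CMD:-udevadm settle --timeout=600}
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}

# Example:
#   retry_after_settle 5 partprobe /dev/vdb
```

If partprobe succeeds reliably once udev has settled, that would point toward racing udev rules rather than partprobe itself.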
#9 Updated by Loïc Dachary almost 8 years ago
- Status changed from Fix Under Review to Duplicate
duplicate of #15176
#10 Updated by Loïc Dachary almost 8 years ago
- Duplicates Bug #15176: partprobe intermittent issues during ceph-disk prepare added
#11 Updated by Nathan Cutler almost 8 years ago
- Backport deleted (jewel)