Bug #15918

ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16

Added by Loic Dachary about 4 years ago. Updated about 4 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

How reproducible:
40% on our VMs

Steps to Reproduce:
1. Create and install a node for a Ceph OSD with at least two spare disks.

2. Run the disk preparation command for a Ceph OSD.
Device /dev/vdb is the journal target and /dev/vdc is the OSD data device. If you have more spare disks, you can repeat this command for each OSD data device.

    # ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb

3. Before trying again, clean up both the journal and OSD data devices:

    # sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdb
    # sgdisk --zap-all --clear --mbrtogpt -g -- /dev/vdc
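Because the failure only shows up on some runs (about 40% on the reporter's VMs), it can help to repeat steps 2 and 3 in a loop and count failures. The sketch below assumes the same device roles as above; the names `run`, `zap` and `reproduce` are hypothetical helpers, not part of ceph-disk. It is destructive: it wipes both devices on every iteration.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# Hypothetical runner type: takes argv, returns (exit code, combined stdout/stderr).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run a command, capturing stdout and stderr together."""
    proc = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def zap(dev: str, runner: Runner = run) -> None:
    """Step 3: wipe the partition tables.

    The -g used in step 3 is only the short form of --mbrtogpt, so it is omitted.
    """
    runner(["sgdisk", "--zap-all", "--clear", "--mbrtogpt", "--", dev])


def reproduce(data_dev: str, journal_dev: str, attempts: int = 10,
              runner: Runner = run) -> int:
    """Repeat step 2 (prepare) and step 3 (cleanup); return the failure count."""
    failures = 0
    for _ in range(attempts):
        code, _output = runner(["ceph-disk", "prepare", "--cluster", "ceph",
                                data_dev, journal_dev])
        if code != 0:
            failures += 1
        # Clean both devices before the next attempt, as in step 3.
        zap(journal_dev, runner)
        zap(data_dev, runner)
    return failures
```

For example, `reproduce("/dev/vdc", "/dev/vdb", attempts=20)` returns how many of the 20 prepare runs exited non-zero. The `runner` parameter exists so the loop can be exercised without touching real disks.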

Actual results:
Sometimes the ceph-disk command fails with the following (or a similar) error:

    # ceph-disk prepare --cluster ceph /dev/vdc /dev/vdb
    prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
    The operation has completed successfully.
    ceph-disk: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/vdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
    Error: Failed to add partition 1 (Device or resource busy)
    # echo $?
    1
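When scripting the reproduction, the intermittent failure can be told apart from other ceph-disk errors by matching the partprobe message quoted above in the captured output of a failed run. This is a minimal sketch; the function name `partprobe_busy_device` is hypothetical, and the pattern assumes the message wording stays as shown in this report.

```python
import re
from typing import Optional

# Matches the partprobe failure quoted in the report, e.g.
# "partprobe /dev/vdb failed : Error: ... -- Device or resource busy."
_PARTPROBE_EBUSY = re.compile(
    r"partprobe (?P<dev>/dev/\S+) failed\b.*?Device or resource busy",
    re.DOTALL)


def partprobe_busy_device(output: str) -> Optional[str]:
    """Return the device partprobe reported as busy, or None if not found."""
    match = _PARTPROBE_EBUSY.search(output)
    return match.group("dev") if match else None
```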

Expected results:
The ceph-disk command should prepare the disk for the Ceph OSD without errors.


Related issues

Duplicates Ceph - Bug #15176: partprobe intermittent issues during ceph-disk prepare Resolved 03/17/2016

History

#1 Updated by David Orman about 4 years ago

If you're using CentOS 7/RHEL 7, try upgrading to parted-3.2-16.fc22. I encountered the same problem, and the upgrade fixed it for me (I believe I also rebooted, since I had already made multiple attempts on the system with the older version).
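To check whether a host still has a parted build older than the one reported to fix the problem, the package string printed by `rpm -q parted` (for example `parted-3.1-23.el7.x86_64`) can be compared against parted-3.2-16. The sketch below is a simplified comparison that assumes purely numeric version and release fields and ignores epochs; it is not rpm's own version-comparison algorithm, and the helper names are hypothetical.

```python
import re
from typing import Tuple


def _num_tuple(s: str) -> Tuple[int, ...]:
    # "3.2" -> (3, 2); only numeric dot-separated components are handled.
    return tuple(int(p) for p in s.split("."))


def parse_parted_nvr(nvr: str) -> Tuple[Tuple[int, ...], int]:
    """Split a package string like 'parted-3.1-23.el7.x86_64' into ((3, 1), 23)."""
    m = re.match(r"^parted-(\d+(?:\.\d+)*)-(\d+)", nvr)
    if m is None:
        raise ValueError("not a parted package string: " + nvr)
    return _num_tuple(m.group(1)), int(m.group(2))


# parted-3.2-16.fc22, the build reported in this thread to fix the issue.
REPORTED_FIX = ((3, 2), 16)


def predates_reported_fix(nvr: str) -> bool:
    """True if the package is older than parted-3.2-16 (version, then release)."""
    return parse_parted_nvr(nvr) < REPORTED_FIX
```

The thread only shows 3.1-23.el7 failing and 3.2-16.fc22 working, so a `True` result means "older than the known-good build", not "confirmed affected".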

#2 Updated by Daniel Horak about 4 years ago

Hi David,
Big thanks for your suggestion.

Yes, it is on RHEL 7, and I can confirm that parted-3.2-16.fc22 fixes the issue.
With the original parted-3.1-23.el7.x86_64 it was failing.

#3 Updated by Loic Dachary about 4 years ago

Thanks David, that's immensely helpful :-) In the parted release notes there is

Avoid generating udev add/remove events for all unmodified partitions
when writing a new table.

which is this commit

But I can't figure out how that would be related to the behavior you had. I don't see anything else in the release notes that would be relevant. It could be that partprobe is not at fault but parted is (ceph-disk also uses parted to scan the partition table). But I don't see a change to parted between 3.1 and 3.2 that could explain the problem.

I'll keep looking a little more to figure out what's going on exactly. Any suggestions or ideas would be most welcome :-)

#4 Updated by Loic Dachary about 4 years ago

  • Subject changed from ceph-disk prepare: Error: partprobe /dev/vdb failed : Error: Error informing the kernel about modifications to partition /dev/vdb1 -- Device or resource busy. to ceph-disk prepare: partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16

#5 Updated by Loic Dachary about 4 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to jewel

#6 Updated by Loic Dachary about 4 years ago

  • Subject changed from ceph-disk prepare: partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16 to ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16

#8 Updated by Alfredo Deza about 4 years ago

Most of the problems we've seen in ceph-deploy issues regarding ceph-disk calls have been caused by the async nature of the udev rules.

Have we ensured that this odd behavior in partprobe is not caused by racing udev rules? What happens when the commands ceph-disk fires are run on a system that doesn't have Ceph installed (or has it installed but without any udev rules)?
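If the busy error comes from udev still processing events for the freshly written partition table, one mitigation consistent with this hypothesis is to wait for the udev queue to drain with `udevadm settle` before calling partprobe, and to retry a few times. This is a minimal sketch under that assumption only; it is not the fix adopted in #15176, and the function name, retry count, delay and 600-second settle timeout are all arbitrary choices.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_ok(cmd: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.call(list(cmd)) == 0


def partprobe_with_settle(dev: str, retries: int = 5, delay: float = 10.0,
                          runner: Callable[[Sequence[str]], bool] = run_ok,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait for pending udev events, then run partprobe; retry on failure."""
    for attempt in range(retries):
        # Block until the udev event queue is empty (or the timeout expires).
        runner(["udevadm", "settle", "--timeout=600"])
        if runner(["partprobe", dev]):
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

Injecting `runner` and `sleep` keeps the retry logic testable without root access; running it once with Ceph's udev rules removed would be one way to check Alfredo's question.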

#9 Updated by Loic Dachary about 4 years ago

  • Status changed from Fix Under Review to Duplicate

duplicate of #15176

#10 Updated by Loic Dachary about 4 years ago

  • Duplicates Bug #15176: partprobe intermittent issues during ceph-disk prepare added

#11 Updated by Nathan Cutler about 4 years ago

  • Backport deleted (jewel)
