Bug #15176

partprobe intermittent issues during ceph-disk prepare

Added by Vasu Kulkarni almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 03/17/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

From the mail thread below, it looks like several people are hitting problems with partprobe in ceph-disk and want to use partx instead. I am opening this tracker issue so that the question of which method is right gets some attention.

According to Loic's mail thread, ceph-disk used partx before and ran into some corner-case issues:
http://www.spinics.net/lists/ceph-devel/msg26301.html

From: Dan van der Ster <dan@vanderster.com>
Date: Thu, Mar 17, 2016 at 10:47 AM
Subject: Re: [ceph-users] ceph-disk from jewel has issues on redhat 7
To: Vasu Kulkarni <vakulkar@redhat.com>
Cc: Stephen Lord <Steve.Lord@quantum.com>, "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>

Hi,

It's true, partprobe works intermittently. I extracted the key
commands to show the problem:

[18:44]# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdc
The operation has completed successfully.
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)
[18:44]# partprobe /dev/sdc
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)
[18:44]# partprobe /dev/sdc
Error: Error informing the kernel about modifications to partition /dev/sdc2 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/sdc2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 2 (Device or resource busy)

But partx works every time:

[18:46]# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdd
The operation has completed successfully.
[18:46]# partx -u /dev/sdd
[18:46]# partx -u /dev/sdd
[18:46]# partx -u /dev/sdd
[18:46]#

Related issues

Duplicated by Ceph - Bug #15918: ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16 Duplicate 05/18/2016
Duplicated by Ceph - Bug #13984: ceph-disk prepare activates the osd on 7.1 Duplicate 12/04/2015
Copied to Ceph - Backport #16586: jewel: partprobe intermittent issues during ceph-disk prepare Resolved

History

#1 Updated by Dan van der Ster almost 3 years ago

Btw, no additional errors logged on the system.

#2 Updated by Dan van der Ster almost 3 years ago

I found rhbz#1245144 and rhbz#1283112, which are related. We are already running parted-3.1-23.el7.x86_64, which has the supposed fix, but clearly it's still racy, or we're not using partprobe correctly. There is a comment in the bz:

"Note that if you are calling parted multiple times from a script and not checking for device nodes to appear/disappear/whatever you will end up in the same situation. It is best to combine all the commands into a single parted call, or check for the expected changes between the calls."

Maybe we should check for the partition's existence before (unnecessarily) calling partprobe?
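
The idea of checking for the expected device node between calls, rather than re-running partprobe blindly, could be sketched roughly as below. This is only an illustration: `wait_for_node` is a hypothetical helper that is not part of ceph-disk or parted, and the one-second polling interval and default timeout are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of ceph-disk): poll until a path exists
# or until a timeout, given in seconds, has elapsed.
# A real caller would probably test with -b (block device) instead of -e.
wait_for_node() {
    node=$1
    timeout=${2:-10}   # assumed default, chosen arbitrarily
    elapsed=0
    while [ ! -e "$node" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1   # node never appeared
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A caller might run `wait_for_node /dev/sdc2 30` after sgdisk and invoke partprobe only if the helper returns non-zero.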

#3 Updated by Dan van der Ster almost 3 years ago

I installed parted-3.2-16.fc22.x86_64 on this machine and found it is now 100% reliable:

# rpm -q parted
parted-3.2-16.fc22.x86_64
# /usr/sbin/sgdisk --new=2:0:20480M --change-name=2:'ceph journal' --partition-guid=2:aa23e07d-e6b3-4261-a236-c0565971d88d --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sde
The operation has completed successfully.
# partprobe /dev/sde
# partprobe /dev/sde
# partprobe /dev/sde
# partprobe /dev/sde
# for i in `seq 100`; do partprobe /dev/sde; done
#

#4 Updated by Ben Meekhof almost 3 years ago

I had the same issue with parted-3.1-23.el7.x86_64 (Scientific Linux 7.2). Runs of ceph-disk prepare would fail randomly with the same error from partprobe 'Device or resource busy', and the same error is repeated just by running 'partprobe' with any ceph OSD or Journal partitions in place. There was no consistency to the error. In repeated runs sometimes partprobe would return fine, sometimes not. Presumably the issue is triggered by the udev rules that Ceph (Infernalis) installs in /usr/lib/udev/rules.d/60-ceph-partuuid-workaround.rules

The issue was also resolved for me by installing parted-3.2-16.fc22.x86_64. I've seen no further indication of a problem in running through a batch of 'ceph-disk prepare'.

#5 Updated by Loic Dachary over 2 years ago

  • Duplicated by Bug #15918: ceph-disk prepare: occasional partprobe failed on CentOS 7/RHEL 7 with parted < 3.2.16 added

#6 Updated by Loic Dachary over 2 years ago

  • Status changed from New to In Progress
  • Priority changed from Normal to Urgent
  • Backport set to jewel

#7 Updated by Loic Dachary over 2 years ago

  • A review of the source package patch differences between 3.1-23 and 3.2-18
    shows that both have
    • 0033-libparted-Use-read-only-when-probing-devices-on-linu.patch
    • 0020-tests-Use-wait_for_dev_to_-functions.patch
      but only 3.2-18 has
    • 0026-tests-Add-udevadm-settle-to-wait_for_-loop-1260664.patch
  • Mailing list thread "parted behavior change between 3.1 and 3.2"
[ubuntu@mira061 ~]$ sudo yum list installed | grep parted
parted.x86_64                      3.1-23.el7                          @anaconda
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=2:0:100M --change-name=2:'ceph journal' --mbrtogpt -- /dev/sdb

***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format.
***************************************************************

Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ sudo partprobe /dev/sdb
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:21 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:21 /dev/sdb2
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=3:101M:200M --change-name=3:'ceph journal' --mbrtogpt -- /dev/sdb
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:22 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:22 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:22 /dev/sdb3
[ubuntu@mira061 ~]$ sudo mkfs /dev/sdb2
[ubuntu@mira061 ~]$ sudo mount /dev/sdb2 /mnt
[ubuntu@mira061 ~]$ /usr/sbin/sgdisk --new=4:201M:300M --change-name=4:'ceph journal' --mbrtogpt -- /dev/sdb
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:24 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:24 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:24 /dev/sdb3
[ubuntu@mira061 ~]$ sudo partprobe /dev/sdb
[ubuntu@mira061 ~]$ ls -l /dev/sdb*
brw-rw----. 1 root disk 8, 16 May 23 08:24 /dev/sdb
brw-rw----. 1 root disk 8, 18 May 23 08:24 /dev/sdb2
brw-rw----. 1 root disk 8, 19 May 23 08:24 /dev/sdb3
brw-rw----. 1 root disk 8, 20 May 23 08:24 /dev/sdb4
[ubuntu@mira061 ~]$ uname -a
Linux mira061 4.6.0-rc3-ceph-15705-gac8ec84 #1 SMP Fri May 20 04:17:10 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux

The result is the same with this kernel: Linux mira061 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

#8 Updated by Loic Dachary over 2 years ago

After running the following script (provided by Daniel H.):

journal_dev="/dev/vdb" 
#osd_devs="/dev/vdc /dev/vdd /dev/vde" 
osd_devs="/dev/vdc" 

date '+== [%H:%M:%S] ======================================================='

for dev in ${osd_devs} ${journal_dev}; do
  echo "RUN: sgdisk --zap-all --clear --mbrtogpt -g -- ${dev}" 
  sgdisk --zap-all --clear --mbrtogpt -g -- ${dev} 2>&1
  echo "rcode=$?" 
  echo
done

udevadm settle
echo
sleep 2

echo "RUN: partprobe" 
partprobe 2>&1
echo "rcode=$?" 
udevadm settle

for osd_dev in ${osd_devs}; do 
  echo "RUN: ceph-disk prepare --cluster ceph ${osd_dev} ${journal_dev}" 
  ceph-disk --verbose prepare --cluster ceph ${osd_dev} ${journal_dev} 2>&1 || exit 1
  echo "rcode=$?" 
  echo
done
echo

On a

# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 7.2 (Maipo)
Release:        7.2
Codename:       Maipo

/usr/sbin/partprobe /dev/vdc

always fails with

Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
  • grep vdc /proc/mounts has nothing
  • fuser /dev/vdc /dev/vdc1 has nothing
  • lsof -n | grep vdc has nothing
  • ls -l /dev/mapper is empty

#9 Updated by Ilya Dryomov over 2 years ago

It doesn't stay busy, does it? It's clearly a race between partprobe
and udev. I'm guessing something like this:

- partprobe triggers a rescan of the partition table in the kernel
- a change uevent is generated by the kernel
- if partprobe gets to the device first (for whatever reason it seems
to open/close it multiple times during "partprobe <dev>"), udev backs
off and partprobe's BLKPG_ADD_PARTITION succeeds
- if udev gets to the device first, we've got a problem...

I haven't worked through the particulars, but I think the above
captures it. It could be that even if udev manages to grab the device
and issue BLKRRPART, but partprobe for whatever reason gets delayed
long enough for udev to finish, things work out in the end too.

Do you want to get to the bottom of it? If it's fixed in partprobe 3.2,
what exactly are you after here - a workaround for 3.1?

#10 Updated by Loic Dachary over 2 years ago

"Do you want to get to the bottom of it? If it's fixed in partprobe 3.2,
what exactly are you after here - a workaround for 3.1?"

Yes, a workaround is required because upgrading to 3.2 may not be possible on RHEL 7.2 (and other platforms maybe).

#11 Updated by Loic Dachary over 2 years ago

"It doesn't stay busy, does it?"

It stays busy forever.

#12 Updated by Ilya Dryomov over 2 years ago

Are you sure? If you repeat the partprobe again after a couple of seconds or try it a few times at random, at least one of the invocations has got to succeed.
Maybe try invoking partprobe under strace to perturb the timing? It's got to be a race, and a 100% hit rate is very unlikely...

#13 Updated by Loic Dachary over 2 years ago

"Are you sure? If you repeat the partprobe again after a couple of seconds or try it a few times at random, at least one of the invocations has got to succeed."

You are correct, it does not always fail. I was (un)lucky enough in the past hours to always hit the case when it fails. But running a loop clearly shows that it fails intermittently.

# for i in $(seq 1 9) ; do echo partprobe ; /usr/sbin/partprobe /dev/vdc ; sleep 2 ; done
partprobe
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
partprobe
partprobe
partprobe
Error: Error informing the kernel about modifications to partition /dev/vdc1 -- Device or resource busy.  This means Linux won't know about any changes you made to /dev/vdc1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 1 (Device or resource busy)
partprobe
partprobe
partprobe
partprobe
partprobe
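
Since the failure is intermittent, one possible workaround for parted 3.1 is to retry partprobe a bounded number of times and let udev finish processing its event queue before and after. The sketch below shows only the retry part. `retry` is a hypothetical helper, the attempt count and delay are arbitrary assumptions, and this is not necessarily how the eventual ceph-disk fix works.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Assumed usage (needs root and a real device; /dev/vdc is the device
# used in this comment):
#   udevadm settle
#   retry 5 2 partprobe /dev/vdc
#   udevadm settle
```

Bounding the number of attempts keeps a genuinely busy device (mounted, or held open by another process) from hanging the caller, while a transient race with udev gets several chances to clear.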

#14 Updated by Loic Dachary over 2 years ago

  • Status changed from In Progress to Need Review

#15 Updated by Kefu Chai over 2 years ago

  • Status changed from Need Review to Pending Backport

#16 Updated by Nathan Cutler over 2 years ago

  • Copied to Backport #16586: jewel: partprobe intermittent issues during ceph-disk prepare added

#17 Updated by Loic Dachary over 2 years ago

  • Duplicated by Bug #13984: ceph-disk prepare activates the osd on 7.1 added

#18 Updated by Loic Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved
