Bug #9665

ceph-disk zap should call partprobe

Added by Loic Dachary over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: ceph-disk
Target version: -
Start date: 10/06/2014
Due date:
% Done: 90%
Source: other
Tags:
Backport: giant, firefly, dumpling
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

User description

Symptoms:

  • A disk is used by an OSD
  • The OSD is no longer useful and the disk is cleared
  • The disk is prepared for a new OSD
  • The new OSD is prepared but does not activate

Diagnostic:

  • The OSD is not activated because the /dev/disk/by-partuuid symbolic link is not updated by udev.

Workaround:

  • Reboot the machine

Fix:

  • When the disk is cleared via ceph-disk zap (which is called indirectly by ceph-deploy zap), it must notify the kernel via partprobe or partx.
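A minimal sketch of the proposed fix, not the actual ceph-disk code: after wiping the partition table, ask the kernel to re-read it and wait for udev to settle. The function names `zap_commands` and `zap` and the injectable `run` parameter are hypothetical, and the exact sgdisk options are an assumption about what zap runs. The partprobe and udevadm settle steps mirror the calls that ceph-disk prepare logs in this report.

```python
import subprocess


def zap_commands(dev):
    """Return the commands to wipe dev and refresh the kernel's view of it."""
    return [
        # Assumed zap step: destroy GPT/MBR structures and write an empty GPT.
        ["sgdisk", "--zap-all", "--clear", "--mbrtogpt", "--", dev],
        # The fix: tell the kernel that the partition table changed.
        ["partprobe", dev],
        # Wait until udev has processed the resulting events.
        ["udevadm", "settle"],
    ]


def zap(dev, run=subprocess.check_call):
    """Run each command in order; check_call raises if one of them fails."""
    for cmd in zap_commands(dev):
        run(cmd)
```

Passing a different `run` callable (for example `print`) shows the command sequence without touching a real device.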

Description

Not calling partprobe after zap may create situations that confuse udev:

  • ceph-disk prepare /dev/loop2
  • links are created in /dev/disk/by-partuuid
  • ceph-disk zap /dev/loop2
  • links are not removed from /dev/disk/by-partuuid
  • ceph-disk prepare /dev/loop2
  • some links are not created in /dev/disk/by-partuuid

In the following transcript, note that the /dev/loop2p2 link in /dev/disk/by-partuuid does not have the current uuid. Running udevadm monitor -e further shows that /dev/loop2p2 is not removed as it should be, although /dev/loop2p1 is.

# ./ceph-disk $CEPH_DISK_ARGS prepare /dev/loop2
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=fsid
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=osd_journal_size
INFO:ceph-disk:Will colocate journal with data on /dev/loop2
DEBUG:ceph-disk:Creating journal partition num 2 size 100 on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:100M --change-name=2:ceph journal --partition-guid=2:f5189072-ce9d-4522-8336-b4b96c25c023 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/f5189072-ce9d-4522-8336-b4b96c25c023
DEBUG:ceph-disk:Creating osd partition on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:14ae949b-75f1-4842-b48a-746be713b7e0 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Creating xfs fs on /dev/loop2p1
INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/loop2p1
meta-data=/dev/loop2p1           isize=2048   agcount=4, agsize=6335 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=25339, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=1232, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
DEBUG:ceph-disk:Mounting /dev/loop2p1 on test-ceph-disk/tmp/mnt.83v37u with options noatime,inode64
INFO:ceph-disk:Running command: mount -t xfs -o noatime,inode64 -- /dev/loop2p1 test-ceph-disk/tmp/mnt.83v37u
DEBUG:ceph-disk:Preparing osd data dir test-ceph-disk/tmp/mnt.83v37u
DEBUG:ceph-disk:Creating symlink test-ceph-disk/tmp/mnt.83v37u/journal -> /dev/disk/by-partuuid/f5189072-ce9d-4522-8336-b4b96c25c023
DEBUG:ceph-disk:Unmounting test-ceph-disk/tmp/mnt.83v37u
INFO:ceph-disk:Running command: /bin/umount -- test-ceph-disk/tmp/mnt.83v37u
INFO:ceph-disk:Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
# ls -l /dev/disk/by-partuuid
total 0
lrwxrwxrwx 1 root root 13 Oct  6 14:40 14ae949b-75f1-4842-b48a-746be713b7e0 -> ../../loop2p1
lrwxrwxrwx 1 root root 13 Oct  6 14:40 f5189072-ce9d-4522-8336-b4b96c25c023 -> ../../loop2p2
# ceph-disk zap /dev/loop2
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! One or more CRCs don't match. You should repair the disk!

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
# ls -l /dev/disk/by-partuuid
total 0
lrwxrwxrwx 1 root root 13 Oct  6 14:40 14ae949b-75f1-4842-b48a-746be713b7e0 -> ../../loop2p1
lrwxrwxrwx 1 root root 13 Oct  6 14:40 f5189072-ce9d-4522-8336-b4b96c25c023 -> ../../loop2p2
# ./ceph-disk $CEPH_DISK_ARGS prepare /dev/loop2
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=fsid
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=osd_journal_size
INFO:ceph-disk:Will colocate journal with data on /dev/loop2
DEBUG:ceph-disk:Creating journal partition num 2 size 100 on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:100M --change-name=2:ceph journal --partition-guid=2:046fe72b-5ee2-494d-950c-54a4995f6f4e --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/046fe72b-5ee2-494d-950c-54a4995f6f4e
DEBUG:ceph-disk:Creating osd partition on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:3fdcaea2-c113-480e-8bdf-90534c5e2a67 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Creating xfs fs on /dev/loop2p1
INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/loop2p1
meta-data=/dev/loop2p1           isize=2048   agcount=4, agsize=6335 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=25339, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=1232, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
DEBUG:ceph-disk:Mounting /dev/loop2p1 on test-ceph-disk/tmp/mnt.gxIGef with options noatime,inode64
INFO:ceph-disk:Running command: mount -t xfs -o noatime,inode64 -- /dev/loop2p1 test-ceph-disk/tmp/mnt.gxIGef
DEBUG:ceph-disk:Preparing osd data dir test-ceph-disk/tmp/mnt.gxIGef
DEBUG:ceph-disk:Creating symlink test-ceph-disk/tmp/mnt.gxIGef/journal -> /dev/disk/by-partuuid/046fe72b-5ee2-494d-950c-54a4995f6f4e
DEBUG:ceph-disk:Unmounting test-ceph-disk/tmp/mnt.gxIGef
INFO:ceph-disk:Running command: /bin/umount -- test-ceph-disk/tmp/mnt.gxIGef
INFO:ceph-disk:Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
# ls -l /dev/disk/by-partuuid
total 0
lrwxrwxrwx 1 root root 13 Oct  6 14:41 3fdcaea2-c113-480e-8bdf-90534c5e2a67 -> ../../loop2p1
lrwxrwxrwx 1 root root 13 Oct  6 14:40 f5189072-ce9d-4522-8336-b4b96c25c023 -> ../../loop2p2
# 
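The mismatch in the last listing can be checked mechanically. The sketch below is only an illustration under stated assumptions: the helper names `parse_partuuid_listing` and `compare_links` are hypothetical, and the expected uuids are the `--partition-guid` values passed to sgdisk during the second prepare in the transcript above.

```python
def parse_partuuid_listing(listing):
    """Map partition uuid -> device name from `ls -l /dev/disk/by-partuuid` output."""
    links = {}
    for line in listing.splitlines():
        if " -> " not in line:
            continue  # skip "total 0" and any line that is not a symlink
        left, target = line.rsplit(" -> ", 1)
        uuid = left.split()[-1]
        links[uuid] = target.rsplit("/", 1)[-1]
    return links


def compare_links(expected, actual):
    """Return (missing, stale): uuids that should exist but do not, and leftovers."""
    missing = sorted(u for u in expected if u not in actual)
    stale = sorted(u for u in actual if u not in expected)
    return missing, stale


# Final listing from the transcript, after the second prepare.
listing = """total 0
lrwxrwxrwx 1 root root 13 Oct  6 14:41 3fdcaea2-c113-480e-8bdf-90534c5e2a67 -> ../../loop2p1
lrwxrwxrwx 1 root root 13 Oct  6 14:40 f5189072-ce9d-4522-8336-b4b96c25c023 -> ../../loop2p2
"""

# Partition guids that the second prepare passed to sgdisk.
expected = {
    "3fdcaea2-c113-480e-8bdf-90534c5e2a67": "loop2p1",
    "046fe72b-5ee2-494d-950c-54a4995f6f4e": "loop2p2",
}

missing, stale = compare_links(expected, parse_partuuid_listing(listing))
print("missing:", missing)
print("stale:", stale)
```

With the data from this report, the new journal uuid (046fe72b-5ee2-494d-950c-54a4995f6f4e) is reported as missing and the journal uuid from the first prepare (f5189072-ce9d-4522-8336-b4b96c25c023) as stale.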


Related issues

Related to devops - Bug #9721: partx -a should be called after creating the data partition (Rejected, 10/10/2014)

Associated revisions

Revision fed3b06c (diff)
Added by Loic Dachary over 4 years ago

ceph-disk: run partprobe after zap

Not running partprobe after zapping a device can lead to the following:

  • ceph-disk prepare /dev/loop2
  • links are created in /dev/disk/by-partuuid
  • ceph-disk zap /dev/loop2
  • links are not removed from /dev/disk/by-partuuid
  • ceph-disk prepare /dev/loop2
  • some links are not created in /dev/disk/by-partuuid

This is assuming there is a bug in the way udev events are handled by
the operating system.

http://tracker.ceph.com/issues/9665 Fixes: #9665

Signed-off-by: Loic Dachary <>

Revision cb1d6811 (diff)
Added by Loic Dachary over 4 years ago

ceph-disk: run partprobe after zap

Not running partprobe after zapping a device can lead to the following:

  • ceph-disk prepare /dev/loop2
  • links are created in /dev/disk/by-partuuid
  • ceph-disk zap /dev/loop2
  • links are not removed from /dev/disk/by-partuuid
  • ceph-disk prepare /dev/loop2
  • some links are not created in /dev/disk/by-partuuid

This is assuming there is a bug in the way udev events are handled by
the operating system.

http://tracker.ceph.com/issues/9665 Fixes: #9665

Signed-off-by: Loic Dachary <>
(cherry picked from commit fed3b06c47a5ef22cb3514c7647544120086d1e7)

Revision e70a8146 (diff)
Added by Loic Dachary over 4 years ago

ceph-disk: run partprobe after zap

Not running partprobe after zapping a device can lead to the following:

  • ceph-disk prepare /dev/loop2
  • links are created in /dev/disk/by-partuuid
  • ceph-disk zap /dev/loop2
  • links are not removed from /dev/disk/by-partuuid
  • ceph-disk prepare /dev/loop2
  • some links are not created in /dev/disk/by-partuuid

This is assuming there is a bug in the way udev events are handled by
the operating system.

http://tracker.ceph.com/issues/9665 Fixes: #9665

Signed-off-by: Loic Dachary <>
(cherry picked from commit fed3b06c47a5ef22cb3514c7647544120086d1e7)

Revision 6de6f575 (diff)
Added by Loic Dachary over 4 years ago

ceph-disk: run partprobe after zap

Not running partprobe after zapping a device can lead to the following:

  • ceph-disk prepare /dev/loop2
  • links are created in /dev/disk/by-partuuid
  • ceph-disk zap /dev/loop2
  • links are not removed from /dev/disk/by-partuuid
  • ceph-disk prepare /dev/loop2
  • some links are not created in /dev/disk/by-partuuid

This is assuming there is a bug in the way udev events are handled by
the operating system.

http://tracker.ceph.com/issues/9665 Fixes: #9665

Signed-off-by: Loic Dachary <>
(cherry picked from commit fed3b06c47a5ef22cb3514c7647544120086d1e7)

History

#1 Updated by Loic Dachary over 4 years ago

  • Status changed from Verified to Testing
  • % Done changed from 0 to 60

#2 Updated by Loic Dachary over 4 years ago

  • Backport set to giant, firefly, emperor, dumpling

#4 Updated by Loic Dachary over 4 years ago

<dvanders> loicd: I saw that change. you factorized the "update partitions" part, but the issue i observe is that partx/partprobe is not triggering udev correctly on a loaded server
<loicd> oh really ? 
<loicd> dam
<dvanders> i never observed this on our (idle) test cluster
<loicd> dvanders: did you trace this back to a known bug ? 
<dvanders> but now that i tried to reuse a journal partition on our busy prod cluster, the new journal symlink isn't appearing
<dvanders> i didn't find a known bug about this

In the context of https://github.com/ceph/ceph/pull/2955

#5 Updated by Loic Dachary over 4 years ago

  • Status changed from Testing to Resolved
  • % Done changed from 60 to 100

#6 Updated by Loic Dachary over 4 years ago

  • Status changed from Resolved to Pending Backport
  • % Done changed from 100 to 90

let's wait a week or two before backporting

#8 Updated by Loic Dachary over 4 years ago

  • Status changed from Pending Backport to Need Review

#9 Updated by Loic Dachary over 4 years ago

  • Backport changed from giant, firefly, emperor, dumpling to giant, firefly, dumpling

#10 Updated by Loic Dachary over 4 years ago

  • Status changed from Need Review to Resolved

#11 Updated by Loic Dachary over 4 years ago

  • Description updated (diff)

#12 Updated by Loic Dachary over 4 years ago

  • Description updated (diff)

#13 Updated by Loic Dachary over 4 years ago

  • Description updated (diff)
