Bug #9665

closed

ceph-disk zap should call partprobe

Added by Loïc Dachary over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: ceph-disk
Target version: -
% Done: 90%
Source: other
Tags:
Backport: giant, firefly, dumpling
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

User description

Symptoms:

  • A disk is used by an OSD
  • The OSD is no longer useful and the disk is cleared
  • The disk is prepared for a new OSD
  • The new OSD is prepared but does not activate

Diagnostic:

  • The OSD is not activated because the /dev/disk/by-partuuid symbolic link is not updated by udev.

Workaround:

  • Reboot the machine

Fix:

  • When the disk is cleared via ceph-disk zap (which is called indirectly by ceph-deploy zap), it must notify the kernel via partprobe or partx.
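
The fix described above could look roughly like the following minimal sketch. The function names zap_commands and zap_and_notify are hypothetical, and the two sgdisk invocations are an assumption about what zap runs (the transcript in this report only shows sgdisk output, not its arguments). The partprobe and udevadm settle calls mirror the ones ceph-disk prepare logs.

```python
import subprocess


def zap_commands(dev):
    """Return the commands to clear `dev` and then notify the kernel.

    The sgdisk arguments are an assumption for illustration; partprobe and
    udevadm settle are the calls that appear in the prepare log.
    """
    return [
        ["sgdisk", "--zap-all", "--", dev],              # destroy GPT and MBR data structures
        ["sgdisk", "--clear", "--mbrtogpt", "--", dev],  # write a fresh, empty GPT
        ["partprobe", dev],                              # ask the kernel to re-read the partition table
        ["udevadm", "settle"],                           # wait until udev has handled pending events
    ]


def zap_and_notify(dev, run=subprocess.check_call):
    """Run the zap sequence; `run` is injectable so the sequence can be dry-run."""
    for cmd in zap_commands(dev):
        run(cmd)
```

Passing run=print (or a list's append method) shows the command sequence without touching a device, which may be useful when checking that the notification step comes after both sgdisk calls.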

Description

Not calling partprobe after zap may create situations that confuse udev:

  • ceph-disk prepare /dev/loop2
  • links are created in /dev/disk/by-partuuid
  • ceph-disk zap /dev/loop2
  • links are not removed from /dev/disk/by-partuuid
  • ceph-disk prepare /dev/loop2
  • some links are not created in /dev/disk/by-partuuid

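In the following transcript, note that the link to /dev/loop2p2 in /dev/disk/by-partuuid does not carry the current UUID: after the second prepare it is still f5189072-ce9d-4522-8336-b4b96c25c023 instead of the new journal UUID 046fe72b-5ee2-494d-950c-54a4995f6f4e. Running udevadm monitor -e further shows that /dev/loop2p2 is not removed as it should be, although /dev/loop2p1 is.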

# ./ceph-disk $CEPH_DISK_ARGS prepare /dev/loop2
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=fsid
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=osd_journal_size
INFO:ceph-disk:Will colocate journal with data on /dev/loop2
DEBUG:ceph-disk:Creating journal partition num 2 size 100 on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:100M --change-name=2:ceph journal --partition-guid=2:f5189072-ce9d-4522-8336-b4b96c25c023 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/f5189072-ce9d-4522-8336-b4b96c25c023
DEBUG:ceph-disk:Creating osd partition on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:14ae949b-75f1-4842-b48a-746be713b7e0 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Creating xfs fs on /dev/loop2p1
INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/loop2p1
meta-data=/dev/loop2p1           isize=2048   agcount=4, agsize=6335 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=25339, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=1232, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
DEBUG:ceph-disk:Mounting /dev/loop2p1 on test-ceph-disk/tmp/mnt.83v37u with options noatime,inode64
INFO:ceph-disk:Running command: mount -t xfs -o noatime,inode64 -- /dev/loop2p1 test-ceph-disk/tmp/mnt.83v37u
DEBUG:ceph-disk:Preparing osd data dir test-ceph-disk/tmp/mnt.83v37u
DEBUG:ceph-disk:Creating symlink test-ceph-disk/tmp/mnt.83v37u/journal -> /dev/disk/by-partuuid/f5189072-ce9d-4522-8336-b4b96c25c023
DEBUG:ceph-disk:Unmounting test-ceph-disk/tmp/mnt.83v37u
INFO:ceph-disk:Running command: /bin/umount -- test-ceph-disk/tmp/mnt.83v37u
INFO:ceph-disk:Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
# ls -l /dev/disk/by-partuuid
total 0
lrwxrwxrwx 1 root root 13 Oct  6 14:40 14ae949b-75f1-4842-b48a-746be713b7e0 -> ../../loop2p1
lrwxrwxrwx 1 root root 13 Oct  6 14:40 f5189072-ce9d-4522-8336-b4b96c25c023 -> ../../loop2p2
# ceph-disk zap /dev/loop2
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! One or more CRCs don't match. You should repair the disk!

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
# ls -l /dev/disk/by-partuuid
total 0
lrwxrwxrwx 1 root root 13 Oct  6 14:40 14ae949b-75f1-4842-b48a-746be713b7e0 -> ../../loop2p1
lrwxrwxrwx 1 root root 13 Oct  6 14:40 f5189072-ce9d-4522-8336-b4b96c25c023 -> ../../loop2p2
# ./ceph-disk $CEPH_DISK_ARGS prepare /dev/loop2
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=fsid
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_type
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
INFO:ceph-disk:Running command: ceph-osd --cluster=ceph --show-config-value=osd_journal_size
INFO:ceph-disk:Will colocate journal with data on /dev/loop2
DEBUG:ceph-disk:Creating journal partition num 2 size 100 on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:100M --change-name=2:ceph journal --partition-guid=2:046fe72b-5ee2-494d-950c-54a4995f6f4e --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/046fe72b-5ee2-494d-950c-54a4995f6f4e
DEBUG:ceph-disk:Creating osd partition on /dev/loop2
INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:3fdcaea2-c113-480e-8bdf-90534c5e2a67 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
INFO:ceph-disk:Running command: /bin/udevadm settle
DEBUG:ceph-disk:Creating xfs fs on /dev/loop2p1
INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/loop2p1
meta-data=/dev/loop2p1           isize=2048   agcount=4, agsize=6335 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=25339, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=1232, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
DEBUG:ceph-disk:Mounting /dev/loop2p1 on test-ceph-disk/tmp/mnt.gxIGef with options noatime,inode64
INFO:ceph-disk:Running command: mount -t xfs -o noatime,inode64 -- /dev/loop2p1 test-ceph-disk/tmp/mnt.gxIGef
DEBUG:ceph-disk:Preparing osd data dir test-ceph-disk/tmp/mnt.gxIGef
DEBUG:ceph-disk:Creating symlink test-ceph-disk/tmp/mnt.gxIGef/journal -> /dev/disk/by-partuuid/046fe72b-5ee2-494d-950c-54a4995f6f4e
DEBUG:ceph-disk:Unmounting test-ceph-disk/tmp/mnt.gxIGef
INFO:ceph-disk:Running command: /bin/umount -- test-ceph-disk/tmp/mnt.gxIGef
INFO:ceph-disk:Running command: /sbin/sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d -- /dev/loop2
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
DEBUG:ceph-disk:Calling partprobe on prepared device /dev/loop2
INFO:ceph-disk:Running command: /sbin/partprobe /dev/loop2
# ls -l /dev/disk/by-partuuid
total 0
lrwxrwxrwx 1 root root 13 Oct  6 14:41 3fdcaea2-c113-480e-8bdf-90534c5e2a67 -> ../../loop2p1
lrwxrwxrwx 1 root root 13 Oct  6 14:40 f5189072-ce9d-4522-8336-b4b96c25c023 -> ../../loop2p2
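
The mismatch visible in the last listing can also be checked mechanically. The sketch below is only an illustration: stale_partuuid_links is a hypothetical helper, not part of ceph-disk, and it assumes the caller already knows which partition GUID each device should have (for instance from the --partition-guid arguments in the log above).

```python
import os


def stale_partuuid_links(expected, by_partuuid="/dev/disk/by-partuuid"):
    """Compare expected partition GUIDs with the symlinks udev maintains.

    `expected` maps a partition GUID to a device basename, e.g.
    {"046fe72b-...": "loop2p2"}.

    Returns (missing, stale):
      missing: expected GUIDs that have no symlink,
      stale:   symlinks that point at one of the expected devices but
               carry a GUID that is not expected.
    """
    devices = set(expected.values())
    present = {}
    for name in os.listdir(by_partuuid):
        path = os.path.join(by_partuuid, name)
        if os.path.islink(path):
            # Links look like "../../loop2p2"; keep only the device basename.
            present[name] = os.path.basename(os.readlink(path))
    missing = sorted(guid for guid in expected if guid not in present)
    stale = sorted(
        guid for guid, dev in present.items()
        if dev in devices and guid not in expected
    )
    return missing, stale
```

With the GUIDs from the second prepare as the expected mapping, a directory matching the final listing would report 046fe72b-5ee2-494d-950c-54a4995f6f4e as missing and f5189072-ce9d-4522-8336-b4b96c25c023 as stale.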


Related issues 1 (0 open, 1 closed)

Related to devops - Bug #9721: partx -a should be called after creating the data partition (Rejected, Loïc Dachary, 10/10/2014)

Actions #1

Updated by Loïc Dachary over 9 years ago

  • Status changed from 12 to 7
  • % Done changed from 0 to 60
Actions #2

Updated by Loïc Dachary over 9 years ago

  • Backport set to giant, firefly, emperor, dumpling
Actions #4

Updated by Loïc Dachary over 9 years ago

<dvanders> loicd: I saw that change. you factorized the "update partitions" part, but the issue i observe is that partx/partprobe is not triggering udev correctly on a loaded server
<loicd> oh really ? 
<loicd> dam
<dvanders> i never observed this on our (idle) test cluster
<loicd> dvanders: did you trace this back to a known bug ? 
<dvanders> but now that i tried to reuse a journal partition on our busy prod cluster, the new journal symlink isn't appearing
<dvanders> i didn't find a known bug about this

In the context of https://github.com/ceph/ceph/pull/2955
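
To observe the symptom dvanders describes (the new journal symlink not appearing on a loaded server), one option is to poll for the link with a timeout. This is a sketch for diagnosis only, not the change made in the pull request: wait_for_partuuid_link and its parameters are hypothetical names chosen for illustration.

```python
import os
import time


def wait_for_partuuid_link(partuuid, timeout=10.0, interval=0.1,
                           by_partuuid="/dev/disk/by-partuuid"):
    """Poll until the by-partuuid symlink for `partuuid` exists.

    Returns True as soon as the link is seen, or False once `timeout`
    seconds have elapsed without it appearing.
    """
    path = os.path.join(by_partuuid, partuuid)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.islink(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A False result after partprobe and udevadm settle have returned would suggest udev did not process the partition change, which matches the behaviour reported in the conversation above.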
Actions #5

Updated by Loïc Dachary over 9 years ago

  • Status changed from 7 to Resolved
  • % Done changed from 60 to 100
Actions #6

Updated by Loïc Dachary over 9 years ago

  • Status changed from Resolved to Pending Backport
  • % Done changed from 100 to 90

let's wait a week or two before backporting

Actions #8

Updated by Loïc Dachary over 9 years ago

  • Status changed from Pending Backport to Fix Under Review
Actions #9

Updated by Loïc Dachary over 9 years ago

  • Backport changed from giant, firefly, emperor, dumpling to giant, firefly, dumpling
Actions #10

Updated by Loïc Dachary over 9 years ago

  • Status changed from Fix Under Review to Resolved
Actions #11

Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)
Actions #12

Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)
Actions #13

Updated by Loïc Dachary about 9 years ago

  • Description updated (diff)