Project

General

Profile

Bug #19428

ceph-disk: Racing between partition creation & device node creation

Added by Erwan Velu 8 months ago. Updated 4 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
03/30/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-disk
Release:
kraken
Needs Doc:
No

Description

On the ceph-docker CI, I do have a trace that shows up a racing between the sgdisk actions to create partitions and the device node availability in /sys/block.

I've been instrumenting the code in ceph-disk, to be more explicit.
My consist in printing the content of /sys/block/<device> to see what's inside.
At the first iteration, the sdb2 is missing while at the second one (1 sec after) the sdb2 showed up.

I investigated & try to reproduce it locally. I have to admit I failed even by running that code inside a nasted VM.
I suspected 'udevadm settle' returning too soon but didn't got able to proove that.

So that could mean update_partition() isn't blocking enough.

The actual master code of ceph-disk is affected.

My current thought are just about adding a testing loop inside get_partition_dev() which isn't that satisfying.

+ ceph-disk -v prepare --cluster test --journal-uuid ab5a4232-ab11-5ae7-ac16-6e8d08eab818 /dev/sdb
command: Running command: /usr/bin/ceph-osd --cluster=test --show-config-value=fsid
command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --cluster test --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --cluster test --setuser ceph --setgroup ceph
command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --cluster test --setuser ceph --setgroup ceph
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
set_type: Will colocate journal with data on /dev/sdb
command: Running command: /usr/bin/ceph-osd --cluster=test --show-config-value=osd_journal_size
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
command: Running command: /usr/bin/ceph-conf --cluster=test --name=osd. --lookup osd_mkfs_type
command: Running command: /usr/bin/ceph-conf --cluster=test --name=osd. --lookup osd_mkfs_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=test --name=osd. --lookup osd_mount_options_xfs
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
ptype_tobe_for_name: name = journal
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
create_partition: Creating journal partition num 2 size 100 on /dev/sdb
command_check_call: Running command: /usr/sbin/sgdisk --new=2:0:+100M --change-name=2:ceph journal --partition-guid=2:ab5a4232-ab11-5ae7-ac16-6e8d08eab818 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdb
update_partition: Calling partprobe on created device /dev/sdb
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
command: Running command: /usr/bin/flock -s /dev/sdb /usr/sbin/partprobe /dev/sdb
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sdb uuid path is /sys/dev/block/8:16/dm/uuid
get_partition_dev: Listing /sys/block/sdb
get_partition_dev: -> ro
get_partition_dev: -> bdi
get_partition_dev: -> dev
get_partition_dev: -> size
get_partition_dev: -> stat
get_partition_dev: -> power
get_partition_dev: -> range
get_partition_dev: -> queue
get_partition_dev: -> trace
get_partition_dev: -> discard_alignment
get_partition_dev: -> device
get_partition_dev: -> events
get_partition_dev: -> subsystem
get_partition_dev: -> ext_range
get_partition_dev: -> slaves
get_partition_dev: -> uevent
get_partition_dev: -> events_poll_msecs
get_partition_dev: -> alignment_offset
get_partition_dev: -> holders
get_partition_dev: -> badblocks
get_partition_dev: -> inflight
get_partition_dev: -> removable
get_partition_dev: -> capability
get_partition_dev: -> events_async
get_partition_dev: ERREUR PARTITION !
get_partition_dev: Listing no 1 /sys/block/sdb
get_partition_dev: -> ro
get_partition_dev: -> bdi
get_partition_dev: -> dev
get_partition_dev: -> sdb2
get_partition_dev: -> size
get_partition_dev: -> stat
get_partition_dev: -> power
get_partition_dev: -> range
get_partition_dev: -> queue
get_partition_dev: -> trace
get_partition_dev: -> discard_alignment
get_partition_dev: -> device
get_partition_dev: -> events
get_partition_dev: -> subsystem
get_partition_dev: -> ext_range
get_partition_dev: -> slaves
get_partition_dev: -> uevent
get_partition_dev: -> events_poll_msecs
get_partition_dev: -> alignment_offset
get_partition_dev: -> holders
get_partition_dev: -> badblocks
get_partition_dev: -> inflight
get_partition_dev: -> removable
get_partition_dev: -> capability
get_partition_dev: -> events_async

Related issues

Copied to Ceph - Backport #19493: jewel: ceph-disk: Racing between partition creation & device node creation Resolved
Copied to Ceph - Backport #20034: kraken: ceph-disk: Racing between partition creation & device node creation Resolved

Associated revisions

Revision 30a6823c (diff)
Added by Erwan Velu 8 months ago

ceph-disk: Adding retry loop in get_partition_dev()

There is very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.

It appear that waiting a very short is usually enough to get the partition beein populated.

Analysis:
update_partition() is supposed to be enough to avoid any racing between events sent by parted/sgdisk/partprobe and
the actual creation on the /sys/block/<device>/* entrypoint.
On our CI that race occurs pretty often but trying to reproduce it locally never been possible.

This patch is almost a workaround rather than a fix to the real problem.
It offer retrying after a very short to be make a chance the device to appear.
This approach have been succesful on the CI.

Note his patch is not changing the timing when the device is perfectly created on time and just differ by a 1/5th up to 2 seconds when the bug occurs.

A typical output from the build running on a CI with that code.
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid

fixes: #19428

Signed-off-by: Erwan Velu <>

Revision 93e7b95e (diff)
Added by Erwan Velu 8 months ago

ceph-disk: Adding retry loop in get_partition_dev()

There is very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.

It appear that waiting a very short is usually enough to get the partition beein populated.

Analysis:
update_partition() is supposed to be enough to avoid any racing between events sent by parted/sgdisk/partprobe and
the actual creation on the /sys/block/<device>/* entrypoint.
On our CI that race occurs pretty often but trying to reproduce it locally never been possible.

This patch is almost a workaround rather than a fix to the real problem.
It offer retrying after a very short to be make a chance the device to appear.
This approach have been succesful on the CI.

Note his patch is not changing the timing when the device is perfectly created on time and just differ by a 1/5th up to 2 seconds when the bug occurs.

A typical output from the build running on a CI with that code.
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid

fixes: #19428

Signed-off-by: Erwan Velu <>

Revision a20d2b89 (diff)
Added by Erwan Velu 8 months ago

ceph-disk: Adding retry loop in get_partition_dev()

There is very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.

It appear that waiting a very short is usually enough to get the partition beein populated.

Analysis:
update_partition() is supposed to be enough to avoid any racing between events sent by parted/sgdisk/partprobe and
the actual creation on the /sys/block/<device>/* entrypoint.
On our CI that race occurs pretty often but trying to reproduce it locally never been possible.

This patch is almost a workaround rather than a fix to the real problem.
It offer retrying after a very short to be make a chance the device to appear.
This approach have been succesful on the CI.

Note his patch is not changing the timing when the device is perfectly created on time and just differ by a 1/5th up to 2 seconds when the bug occurs.

A typical output from the build running on a CI with that code.
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid

fixes: #19428

Signed-off-by: Erwan Velu <>
(cherry picked from commit 93e7b95ed8b4c78daebf7866bb1f0826d7199075)

Revision ea186807 (diff)
Added by Erwan Velu 5 months ago

ceph-disk: Adding retry loop in get_partition_dev()

There is very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.

It appear that waiting a very short is usually enough to get the partition beein populated.

Analysis:
update_partition() is supposed to be enough to avoid any racing between events sent by parted/sgdisk/partprobe and
the actual creation on the /sys/block/<device>/* entrypoint.
On our CI that race occurs pretty often but trying to reproduce it locally never been possible.

This patch is almost a workaround rather than a fix to the real problem.
It offer retrying after a very short to be make a chance the device to appear.
This approach have been succesful on the CI.

Note his patch is not changing the timing when the device is perfectly created on time and just differ by a 1/5th up to 2 seconds when the bug occurs.

A typical output from the build running on a CI with that code.
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid

fixes: #19428

Signed-off-by: Erwan Velu <>
(cherry picked from commit 93e7b95ed8b4c78daebf7866bb1f0826d7199075)

History

#2 Updated by Loic Dachary 8 months ago

  • Description updated (diff)

#4 Updated by S├ębastien Han 8 months ago

  • Status changed from New to Pending Backport
  • Release jewel added
  • Release deleted (kraken)

#5 Updated by S├ębastien Han 8 months ago

  • Backport set to jewel
  • Release kraken added
  • Release deleted (jewel)

#6 Updated by Nathan Cutler 8 months ago

  • Copied to Backport #19493: jewel: ceph-disk: Racing between partition creation & device node creation added

#7 Updated by Nathan Cutler 7 months ago

  • Status changed from Pending Backport to Resolved

#8 Updated by Nathan Cutler 6 months ago

  • Status changed from Resolved to Pending Backport
  • Backport changed from jewel to jewel, kraken

#9 Updated by Nathan Cutler 6 months ago

  • Copied to Backport #20034: kraken: ceph-disk: Racing between partition creation & device node creation added

#10 Updated by Nathan Cutler 4 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF