Bug #21493

closed

ceph-disk prepare cannot find partition that does exist

Added by John Fulton over 6 years ago. Updated over 6 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
ceph-ansible, ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When using ceph-disk prepare as provided by ceph-osd-10.2.9-0.el7.x86_64 on /dev/vdb, which exists on my system, I get a stack trace [1] with [Errno 2] No such file or directory: '/dev/vdb1'. Unfortunately the failure is intermittent: for example, if I zap the device and re-run prepare, I don't see the error, but it does recur and will fail a ceph-ansible deployment. I am using ceph-docker [2].

[1]

Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5095, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5046, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1855, in main
    Prepare.factory(args).prepare()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1844, in prepare
    self.prepare_locked()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1875, in prepare_locked
    self.data.prepare(self.journal)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2542, in prepare
    self.prepare_device(*to_prepare_list)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2705, in prepare_device
    self.set_data_partition()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2632, in set_data_partition
    self.partition = self.create_data_partition()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2616, in create_data_partition
    return device.get_partition(partition_number)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1622, in get_partition
    path=self.path, dev=dev, args=self.args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1685, in factory
    (dev is not None and is_mpath(dev))):
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 535, in is_mpath
    uuid = get_dm_uuid(dev)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 522, in get_dm_uuid
    uuid_path = os.path.join(block_path(dev), 'dm', 'uuid')
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 516, in block_path
    rdev = os.stat(path).st_rdev
OSError: [Errno 2] No such file or directory: '/dev/vdb1'
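
The final frame is a plain os.stat() on the partition node that ceph-disk has just created. A minimal sketch of the race, assuming udev has not yet produced /dev/vdb1 at the moment ceph-disk stats it; the sgdisk invocation and names below are illustrative (and destructive), not the actual ceph-disk code path:

import errno
import os
import subprocess

DEV = '/dev/vdb'    # whole disk from this report
PART = DEV + '1'    # data partition node ceph-disk expects to stat

# Create a data partition; ceph-disk drives sgdisk in a similar way.
subprocess.check_call(['sgdisk', '--largest-new=1', DEV])

# ceph-disk then resolves the new partition and stats it (block_path()
# in the traceback above). If udev has not created the node yet,
# os.stat() raises OSError with ENOENT and prepare aborts.
try:
    rdev = os.stat(PART).st_rdev
    print('node ready, major/minor %d:%d' % (os.major(rdev), os.minor(rdev)))
except OSError as e:
    if e.errno == errno.ENOENT:
        print('race hit: %s does not exist yet' % PART)
    else:
        raise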

[2] Workaround: zap and re-run:

docker run -ti --privileged=true -v /dev/:/dev/ -e OSD_DEVICE=/dev/vdb docker.io/ceph/daemon:tag-build-master-jewel-centos-7 zap_device
docker start ceph-osd-prepare-overcloud-cephstorage-1-devdevvdb; docker logs -f ceph-osd-prepare-overcloud-cephstorage-1-devdevvdb

[root@overcloud-cephstorage-1 ~]# docker run -ti --entrypoint=bash docker.io/ceph/daemon:tag-build-master-jewel-centos-7
[root@e902236a00de /]#
[root@e902236a00de /]# rpm -qa | grep ceph
libcephfs1-10.2.9-0.el7.x86_64
python-cephfs-10.2.9-0.el7.x86_64
ceph-base-10.2.9-0.el7.x86_64
ceph-osd-10.2.9-0.el7.x86_64
ceph-radosgw-10.2.9-0.el7.x86_64
ceph-release-1-1.el7.noarch
ceph-common-10.2.9-0.el7.x86_64
ceph-selinux-10.2.9-0.el7.x86_64
ceph-mds-10.2.9-0.el7.x86_64
ceph-mon-10.2.9-0.el7.x86_64
[root@e902236a00de /]#


Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #21728: ceph-disk: retry on OSError (Resolved, Kefu Chai, 10/09/2017)

Actions #1

Updated by John Fulton over 6 years ago

Docker container info at:

[root@overcloud-cephstorage-1 ~]# docker inspect fb43a44da2f4 | curl -F 'f:1=<-' ix.io
http://ix.io/A41
[root@overcloud-cephstorage-1 ~]#

Actions #2

Updated by John Fulton over 6 years ago

Update:

- This only happens for ceph-disk in containers.
- I no longer hit this race condition when using a container build (https://hub.docker.com/r/ceph/daemon/builds/bvqtdrttvmbajeepyjbnvm3/) that includes the fix from https://github.com/ceph/ceph/pull/18162 (see the sketch below).
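
That pull request is titled "ceph-disk: retry on OSError" (tracked in the related Bug #21728 above). A hedged sketch of the general pattern, retrying the stat until udev creates the node, rather than the actual patch; the helper name, attempt count, and delay are assumptions:

import errno
import os
import time

def stat_with_retry(path, attempts=10, delay=0.5):
    # Illustrative helper, not ceph-disk code: keep retrying os.stat()
    # while the node is still missing (ENOENT), on the theory that
    # udev simply has not created it yet.
    for i in range(attempts):
        try:
            return os.stat(path)
        except OSError as e:
            if e.errno != errno.ENOENT or i == attempts - 1:
                raise
            time.sleep(delay)

# Example: wait for the freshly created data partition to appear.
# st = stat_with_retry('/dev/vdb1')

With a retry of this sort around the stat shown in the traceback, the transient ENOENT becomes a short wait instead of a failed prepare.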

Actions #4

Updated by Ken Dreyer over 6 years ago

  • Target version deleted (v10.2.10)

Thanks for confirming. In that case I will go ahead and mark this as a dup of the ticket where we tracked the backports to Jewel and Luminous.

Actions #5

Updated by Ken Dreyer over 6 years ago

  • Is duplicate of Bug #21728: ceph-disk: retry on OSError added
Actions #6

Updated by Ken Dreyer over 6 years ago

  • Status changed from New to Duplicate