Bug #16405
Status: Closed
"ceph_disk.main.Error: Device /dev/sdb1 is in use by a device-mapper mapping (dm-crypt?)" / "Failed to create OSDs" during QA suite runs
Description
2016-06-21T13:59:05.834 INFO:teuthology.orchestra.run.mira085.stderr:[mira085][WARNING] ceph_disk.main.Error: Error: Device /dev/sdb1 is in use by a device-mapper mapping (dm-crypt?): dm-1
2016-06-21T13:59:05.846 INFO:teuthology.orchestra.run.mira085.stderr:[mira085][ERROR ] RuntimeError: command returned non-zero exit status: 1
2016-06-21T13:59:05.847 INFO:teuthology.orchestra.run.mira085.stderr:[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph --fs-type xfs -- /dev/sdb /dev/sdc
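The error means a leftover device-mapper (dm-crypt) target from a previous run is still mapped on top of /dev/sdb1, so ceph-disk refuses to prepare it. As a rough illustration of how one might identify the holder, here is a hypothetical Python sketch that parses the output of `dmsetup deps -o devname`; the helper name `dm_holders` is made up for this example, and the exact dmsetup line format assumed below may vary between versions.

```python
import re

def dm_holders(deps_output, partition):
    """Return names of dm devices whose dependency list includes `partition`.

    Assumes lines shaped like:  dm-1: 1 dependencies  : (sdb1)
    """
    holders = []
    for line in deps_output.splitlines():
        m = re.match(r"(\S+?):\s+\d+ dependencies\s*:\s*\((.*)\)", line)
        if not m:
            continue
        name, deps = m.groups()
        # dependencies may be a comma-separated list of device names
        if partition in (d.strip() for d in deps.split(",")):
            holders.append(name)
    return holders

# Sample output (fabricated for illustration) matching the failure above:
sample = (
    "dm-0: 1 dependencies\t: (sda3)\n"
    "dm-1: 1 dependencies\t: (sdb1)\n"
)
print(dm_holders(sample, "sdb1"))  # → ['dm-1']
```

In the log above that holder is dm-1, which would then need to be torn down (e.g. with `dmsetup remove` or `cryptsetup close`) before ceph-disk prepare can succeed on the node.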
Various test runs where this problem appears in a different test each time (fewer concurrent tests seems to lead to fewer failures):
http://pulpito.ceph.com/oprypin-2016-06-16_08:41:28-ceph-deploy-jewel---basic-mira/
http://pulpito.ceph.com/oprypin-2016-06-17_10:12:39-ceph-deploy-jewel---basic-mira/
http://pulpito.ceph.com/oprypin-2016-06-20_15:51:37-ceph-deploy-jewel---basic-mira/
http://pulpito.ceph.com/oprypin-2016-06-21_13:09:06-ceph-deploy-jewel---basic-mira/
I cannot exclude the possibility that the changes I'm introducing here are causing this problem, or perhaps they just make it more likely to be triggered.
Updated by Josh Durgin almost 8 years ago
Jobs that hit this on existing runs:
I suspect Oleh's runs triggered this more often due to running more frequently on an empty queue, so more ceph-deploy jobs got run one after the other on the same nodes. It seems to be a race condition since it's happened intermittently with each combination of python 2, python 3, ubuntu, and centos.