Bug #16405

closed

"ceph_disk.main.Error: Device /dev/sdb1 is in use by a device-mapper mapping (dm-crypt?)" / "Failed to create OSDs" during QA suite runs

Added by Oleh Prypin almost 8 years ago. Updated almost 3 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy, ceph-disk
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2016-06-21T13:59:05.834 INFO:teuthology.orchestra.run.mira085.stderr:[mira085][WARNING] ceph_disk.main.Error: Error: Device /dev/sdb1 is in use by a device-mapper mapping (dm-crypt?): dm-1
2016-06-21T13:59:05.846 INFO:teuthology.orchestra.run.mira085.stderr:[mira085][ERROR ] RuntimeError: command returned non-zero exit status: 1
2016-06-21T13:59:05.847 INFO:teuthology.orchestra.run.mira085.stderr:[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph --fs-type xfs -- /dev/sdb /dev/sdc

Full log: http://qa-proxy.ceph.com/teuthology/oprypin-2016-06-21_13:09:06-ceph-deploy-jewel---basic-mira/269093/teuthology.log
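
To check a downloaded teuthology.log for this failure, something like the following might help. It is only a sketch: the regular expression is derived from the single log line quoted above, and the function name `find_dm_in_use` is made up for illustration, not part of teuthology or ceph-disk.

```python
import re

# Pattern built from the one error line quoted in this report (an assumption:
# other ceph-disk versions may word the message differently).
DM_IN_USE = re.compile(
    r"ceph_disk\.main\.Error: Error: Device (?P<device>\S+) is in use by a "
    r"device-mapper mapping \(dm-crypt\?\): (?P<holders>.+)$"
)


def find_dm_in_use(lines):
    """Return (device, holders) pairs for every matching log line.

    `lines` is any iterable of strings, e.g. an open teuthology.log file.
    `holders` is returned as the raw text after the last colon.
    """
    hits = []
    for line in lines:
        match = DM_IN_USE.search(line.rstrip("\n"))
        if match:
            hits.append((match.group("device"), match.group("holders")))
    return hits
```

Run against the line quoted in the description, this returns `[("/dev/sdb1", "dm-1")]`.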

Test runs where this problem appears, in different tests each time (running fewer tests concurrently seems to lead to fewer failures):

http://pulpito.ceph.com/oprypin-2016-06-16_08:41:28-ceph-deploy-jewel---basic-mira/
http://pulpito.ceph.com/oprypin-2016-06-17_10:12:39-ceph-deploy-jewel---basic-mira/
http://pulpito.ceph.com/oprypin-2016-06-20_15:51:37-ceph-deploy-jewel---basic-mira/
http://pulpito.ceph.com/oprypin-2016-06-21_13:09:06-ceph-deploy-jewel---basic-mira/

I do not exclude the possibility that the changes I'm introducing here cause this problem. Or maybe they just make it more likely to be triggered.

Actions #1

Updated by Josh Durgin almost 8 years ago

Jobs that hit this on existing runs:

http://qa-proxy.ceph.com/teuthology/teuthology-2016-06-08_02:50:02-ceph-deploy-jewel-distro-basic-mira/246135/teuthology.log

http://qa-proxy.ceph.com/teuthology/teuthology-2016-06-15_02:50:02-ceph-deploy-jewel-distro-basic-mira/260769/teuthology.log

I suspect Oleh's runs triggered this more often due to running more frequently on an empty queue, so more ceph-deploy jobs got run one after the other on the same nodes. It seems to be a race condition since it's happened intermittently with each combination of python 2, python 3, ubuntu, and centos.
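
If the cause is a dm-crypt mapping left over from the previous job on the same node, one way to confirm it before `ceph-disk prepare` runs would be to look at the partition's holders in sysfs. A rough sketch, assuming the standard Linux layout `/sys/block/<disk>/<partition>/holders`; the helper name `dm_holders` and its `sysfs_root` parameter are hypothetical, and this is not ceph-disk's own code:

```python
import os


def dm_holders(disk, partition, sysfs_root="/sys"):
    """List device-mapper holders (dm-*) of a partition, e.g. ("sdb", "sdb1").

    Returns an empty list if the partition has no holders directory.
    `sysfs_root` is only a parameter so the function can be tried on a
    fake directory tree.
    """
    holders_dir = os.path.join(sysfs_root, "block", disk, partition, "holders")
    if not os.path.isdir(holders_dir):
        return []
    return sorted(
        name for name in os.listdir(holders_dir) if name.startswith("dm-")
    )
```

A non-empty result for `dm_holders("sdb", "sdb1")` at the start of a job would suggest that a mapping from an earlier run was still open, which would fit the race described above.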

Actions #2

Updated by Sage Weil almost 3 years ago

  • Status changed from New to Won't Fix