Bug #13370

closed

ceph-disk: ceph-deploy suite fails with dmcrypt (hammer)

Added by Loïc Dachary over 8 years ago. Updated over 8 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Workaround

Run ceph-disk activate manually.

Rationale for Won't fix

The problem revealed by this suite is a race condition that can easily be fixed manually. It is fixed in infernalis, together with a number of other udev-related fixes. Backporting these to hammer would not be trivial, and it looks like people usually work around this.

Description

curl --silent http://paddles.front.sepia.ceph.com/runs/?suite=ceph-deploy | jq '.[] | .name' | while read run ; do eval run=$run ; curl --silent http://paddles.front.sepia.ceph.com/runs/$run/jobs/ | jq '.[] | select(.os_version == "6.5" and .status == "pass") | select(.description | contains("dmcrypt")) | .name' ; done 

returns nothing and
teuthology-suite --verbose --suite ceph-deploy --filter="ceph-deploy/basic/{ceph-deploy-overrides/ceph_deploy_dmcrypt.yaml config_options/cephdeploy_conf.yaml distros/centos_6.5.yaml tasks/ceph-deploy_hello_world.yaml}" --suite-branch wip-ceph-deploy-test-hammer --ceph hammer-backports --machine-type vps --priority 101

fails consistently (see http://pulpito.ceph.com/loic-2015-10-05_01:40:39-ceph-deploy-hammer-backports---basic-vps/1088457/ for an example).
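The paddles query at the start of this description may be easier to read and reuse when split into small functions. This is only a sketch: the function names `passing_dmcrypt_jobs` and `list_passing_dmcrypt_jobs` are hypothetical, the `PADDLES` variable simply holds the base URL used above, and `jq -r` is swapped in for the `eval run=$run` trick to strip the JSON quotes (so names are printed unquoted). The JSON fields (`name`, `os_version`, `status`, `description`) are the ones the original command already relies on.

```shell
#!/bin/sh
# Base URL taken from the command in the description.
PADDLES=http://paddles.front.sepia.ceph.com

# Hypothetical helper: read a paddles job list (a JSON array) on stdin and
# print the names of passing dmcrypt jobs for one OS version (default 6.5).
passing_dmcrypt_jobs() {
    jq -r --arg os "${1:-6.5}" '
        .[]
        | select(.os_version == $os and .status == "pass")
        | select(.description | contains("dmcrypt"))
        | .name'
}

# Hypothetical helper: walk every ceph-deploy run known to paddles and apply
# the filter above to its jobs. Requires network access to the sepia lab.
list_passing_dmcrypt_jobs() {
    curl --silent "$PADDLES/runs/?suite=ceph-deploy" |
        jq -r '.[] | .name' |
        while read -r run; do
            curl --silent "$PADDLES/runs/$run/jobs/" | passing_dmcrypt_jobs "$@"
        done
}
```

Calling `list_passing_dmcrypt_jobs` with no argument should correspond to the original one-liner; passing another version string as the first argument checks a different distribution release.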

I suspect this is a combination of incorrect udev rules and racing udev events generated by ceph-disk.

teuthology-suite --verbose --suite ceph-deploy --filter="ceph-deploy/basic/{ceph-deploy-overrides/ceph_deploy_dmcrypt.yaml config_options/cephdeploy_conf.yaml distros/centos_6.5.yaml tasks/ceph-deploy_hello_world.yaml}" --suite-branch wip-ceph-deploy-test-hammer --ceph hammer-backports --machine-type vps --priority 101

will create a teuthology job that never returns (it waits forever for the cluster to become healthy), which is convenient for investigating.


Related issues 2 (0 open, 2 closed)

Has duplicate Ceph - Bug #13721: ceph-deploy: "unable to get 'HEALTH_OK' after waiting 15 minutes" in ceph-deploy-hammer-distro-basic-vps run (Duplicate, 11/08/2015)
Has duplicate Ceph - Bug #13366: ceph-deploy: "unable to get 'HEALTH_OK' after waiting 15 minutes" in ceph-deploy-hammer-distro-basic-vps run (Duplicate, 10/05/2015)
