Bug #36738
opendrives still have old bluestore signature after cm-ansible run
Added by Vasu Kulkarni over 5 years ago. Updated over 5 years ago.
Description
In the ceph-deploy tests, a couple of tests fail and it was not clear before what the issue was. Running the same tests in OVH passed everything, which confirms an issue with ceph-deploy zap as well as with the cleanup done by ceph-cm-ansible.

Run using mira nodes, where a couple of jobs fail to bring up OSDs: http://pulpito.ceph.com/yuriw-2018-11-06_17:15:26-ceph-deploy-luminous-distro-basic-mira/
Clean run on OVH: http://pulpito.ceph.com/vasu-2018-11-08_21:05:08-ceph-deploy-luminous-distro-basic-ovh/
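For anyone reproducing this by hand, a stale bluestore label can be checked for and cleared along the following lines. This is a minimal sketch, not the exact cleanup ceph-cm-ansible or ceph-deploy zap performs, and /dev/sdb is only a placeholder for the affected disk:

#!/bin/sh -e
DEV=/dev/sdb   # placeholder: the disk that kept its old bluestore signature

# A leftover signature shows up in the wipefs listing.
sudo wipefs "$DEV"

# Erase all signatures wipefs knows about, destroy the partition table,
# and zero the start of the disk where the bluestore label lives.
sudo wipefs --all "$DEV"
sudo sgdisk --zap-all "$DEV"
sudo dd if=/dev/zero of="$DEV" bs=1M count=10 oflag=direct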
Updated by David Galloway over 5 years ago
What job in http://pulpito.ceph.com/yuriw-2018-11-06_17:15:26-ceph-deploy-luminous-distro-basic-mira/ are you referring to?
Also, the issue that Shylesh raised about ceph-cm-ansible not zapping disks turned out to be a bug in your teuthology fork. See https://tracker.ceph.com/issues/36736#change-124355
Updated by Vasu Kulkarni over 5 years ago
This is one of the tests that failed: http://qa-proxy.ceph.com/teuthology/yuriw-2018-11-06_17:15:26-ceph-deploy-luminous-distro-basic-mira/3230597/teuthology.log. Since they all pass in OVH (same build), it's just the cleanup that is causing these issues.
Updated by David Galloway over 5 years ago
2018-11-07T03:47:49.190 INFO:teuthology.orchestra.run.mira026:Running (workunit test ceph-tests/ceph-admin-commands.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=8157642b94a60dbfc3c88529a543a094d45d2b5e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/ceph-tests/ceph-admin-commands.sh'
2018-11-07T03:47:50.014 INFO:tasks.workunit.client.0.mira026.stdout:  cluster:
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:    id:     eb445a9b-6176-4c6f-819f-94468830f3e6
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:    health: HEALTH_OK
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:  services:
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:    mon: 2 daemons, quorum mira010,mira026
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:    mgr: mira010(active)
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:    osd: 2 osds: 0 up, 0 in
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:  data:
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    pools:   1 pools, 128 pgs
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    objects: 0 objects, 0B
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    usage:   0B used, 0B / 0B avail
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    pgs:     100.000% pgs unknown
2018-11-07T03:47:50.018 INFO:tasks.workunit.client.0.mira026.stdout:             128 unknown
2018-11-07T03:47:50.018 INFO:tasks.workunit.client.0.mira026.stdout:
2018-11-07T03:47:50.112 INFO:tasks.workunit.client.0.mira026.stdout:rbd
2018-11-07T06:47:49.324 DEBUG:teuthology.orchestra.run:got remote process result: 124
2018-11-07T06:47:49.340 INFO:tasks.workunit:Stopping ['ceph-tests/ceph-admin-commands.sh'] on client.0...
2018-11-07T06:47:49.341 INFO:teuthology.orchestra.run.mira026:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2018-11-07T06:47:49.822 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 123, in task
    timeout=timeout, cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 409, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 194, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 429, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test ceph-tests/ceph-admin-commands.sh) on mira026 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=8157642b94a60dbfc3c88529a543a094d45d2b5e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/ceph-tests/ceph-admin-commands.sh'
I see there was an error but I'm not able to decipher what it is. What's going on here?
Updated by Nathan Cutler over 5 years ago
Looks like the error code 124 is coming from "rbd ls":
smithfarm@wilbur:~/src/ceph/smithfarm/ceph> cat qa/workunits/ceph-tests/ceph-admin-commands.sh
#!/bin/sh -e
#check ceph health
ceph -s
#list pools
rados lspools
#lisr rbd images
rbd ls
#check that the monitors work
ceph osd set nodown
ceph osd unset nodown
exit 0
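For context, 124 is the exit status "timeout" reports when it kills a command for exceeding its limit - here "rbd ls" sat for the full "timeout 3h" window (03:47 to 06:47 in the log above). A trivial demonstration, unrelated to the failed run:

$ timeout 2 sleep 10; echo "exit status: $?"
exit status: 124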
Updated by Nathan Cutler over 5 years ago
- Related to Bug #37089: tests: ceph-admin-commands.sh workunit does not log what it's doing added
Updated by Vasu Kulkarni over 5 years ago
Nathan,
The test has actually failed before this point - since the OSDs are not up, the workunit is expected to fail:
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout: osd: 2 osds: 0 up, 0 in
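For what it's worth, a pre-flight check would have made the workunit fail fast here instead of letting "rbd ls" block for three hours. A hypothetical sketch - no such check exists in the script today, and the "ceph osd stat" output format is assumed to match Luminous ("2 osds: 0 up, 0 in"):

#!/bin/sh -e
# Abort early if no OSDs are up; otherwise rbd/rados commands block
# waiting for PGs until the surrounding "timeout 3h" fires.
if ceph osd stat | grep -q ' 0 up,'; then
    echo "no OSDs are up; aborting before rbd commands hang" >&2
    exit 1
fi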
Updated by Nathan Cutler over 5 years ago
> The test has actually failed before this point - since the OSDs are not up, the workunit is expected to fail:
> 2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout: osd: 2 osds: 0 up, 0 in
Right - I just noticed that the workunit script was missing "set -x". As you say, the root cause of the test failure is higher up in the log.
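For reference, the logging gap tracked in Bug #37089 amounts to making the script trace what it runs. A minimal sketch of the idea, not necessarily the exact patch that landed:

#!/bin/sh -ex
# -e aborts on the first failing command; -x echoes each command to stderr,
# so the teuthology log shows exactly which command was running when it hung.
ceph -s
rados lspools
rbd ls
ceph osd set nodown
ceph osd unset nodown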