Bug #36738


drives still have old bluestore signature after cm-ansible run

Added by Vasu Kulkarni over 5 years ago. Updated over 5 years ago.

Status: New
Priority: High
% Done: 0%
Regression: No
Severity: 3 - minor

Description

In the ceph-deploy suite, a couple of tests fail, and it was not clear before what the issue was. Running the same tests in OVH (same build) passed everything, which confirms an issue with ceph-deploy zap as well as with the cm-ansible cleanup.

Run on mira nodes, where a couple of jobs fail to bring up an OSD: http://pulpito.ceph.com/yuriw-2018-11-06_17:15:26-ceph-deploy-luminous-distro-basic-mira/

Clean run: http://pulpito.ceph.com/vasu-2018-11-08_21:05:08-ceph-deploy-luminous-distro-basic-ovh/
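
If the drives really do keep a stale bluestore signature, a cleanup along these lines should clear it (a sketch only; /dev/sdb is a placeholder for the affected device, and these commands are destructive):

# list any signatures still on the device; a leftover bluestore entry
# here (if libblkid recognizes it) is the symptom this bug describes
sudo wipefs /dev/sdb
# wipe all signatures and the GPT partition table
sudo wipefs --all /dev/sdb
sudo sgdisk --zap-all /dev/sdb
# bluestore writes its label in the first blocks of the device,
# so zero the start of the disk as well
sudo dd if=/dev/zero of=/dev/sdb bs=1M count=10 oflag=direct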


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #37089: tests: ceph-admin-commands.sh workunit does not log what it's doing (Resolved; Nathan Cutler; 2018-11-13)

Actions #1

Updated by David Galloway over 5 years ago

What job in http://pulpito.ceph.com/yuriw-2018-11-06_17:15:26-ceph-deploy-luminous-distro-basic-mira/ are you referring to?

Also, the issue that Shylesh raised about ceph-cm-ansible not zapping disks turned out to be a bug in your teuthology fork. See https://tracker.ceph.com/issues/36736#change-124355

Actions #2

Updated by Vasu Kulkarni over 5 years ago

This is one of the tests that failed: http://qa-proxy.ceph.com/teuthology/yuriw-2018-11-06_17:15:26-ceph-deploy-luminous-distro-basic-mira/3230597/teuthology.log. Since they all pass in OVH (same build), it's just the cleanup that is causing these issues.

Actions #3

Updated by David Galloway over 5 years ago

2018-11-07T03:47:49.190 INFO:teuthology.orchestra.run.mira026:Running (workunit test ceph-tests/ceph-admin-commands.sh): 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=8157642b94a60dbfc3c88529a543a094d45d2b5e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/ceph-tests/ceph-admin-commands.sh'
2018-11-07T03:47:50.014 INFO:tasks.workunit.client.0.mira026.stdout:  cluster:
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:    id:     eb445a9b-6176-4c6f-819f-94468830f3e6
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:    health: HEALTH_OK
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:
2018-11-07T03:47:50.015 INFO:tasks.workunit.client.0.mira026.stdout:  services:
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:    mon: 2 daemons, quorum mira010,mira026
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:    mgr: mira010(active)
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:    osd: 2 osds: 0 up, 0 in
2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout:
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:  data:
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    pools:   1 pools, 128 pgs
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    objects: 0 objects, 0B
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    usage:   0B used, 0B / 0B avail
2018-11-07T03:47:50.017 INFO:tasks.workunit.client.0.mira026.stdout:    pgs:     100.000% pgs unknown
2018-11-07T03:47:50.018 INFO:tasks.workunit.client.0.mira026.stdout:             128 unknown
2018-11-07T03:47:50.018 INFO:tasks.workunit.client.0.mira026.stdout:
2018-11-07T03:47:50.112 INFO:tasks.workunit.client.0.mira026.stdout:rbd
2018-11-07T06:47:49.324 DEBUG:teuthology.orchestra.run:got remote process result: 124
2018-11-07T06:47:49.340 INFO:tasks.workunit:Stopping ['ceph-tests/ceph-admin-commands.sh'] on client.0...
2018-11-07T06:47:49.341 INFO:teuthology.orchestra.run.mira026:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
2018-11-07T06:47:49.822 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 123, in task
    timeout=timeout, cleanup=cleanup)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 85, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 99, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 22, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph_luminous/qa/tasks/workunit.py", line 409, in _run_tests
    label="workunit test {workunit}".format(workunit=workunit)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 194, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 429, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed (workunit test ceph-tests/ceph-admin-commands.sh) on mira026 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=8157642b94a60dbfc3c88529a543a094d45d2b5e TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/ceph-tests/ceph-admin-commands.sh'

I see there was an error but I'm not able to decipher what it is. What's going on here?

Actions #4

Updated by Nathan Cutler over 5 years ago

Looks like the error code 124 is coming from "rbd ls":

smithfarm@wilbur:~/src/ceph/smithfarm/ceph> cat qa/workunits/ceph-tests/ceph-admin-commands.sh 
#!/bin/sh -e

#check ceph health
ceph -s
#list pools
rados lspools
#list rbd images
rbd ls
#check that the monitors work
ceph osd set nodown
ceph osd unset nodown

exit 0
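
Note: 124 is the exit status GNU timeout returns when the command it wraps exceeds its time limit, and the workunit above ran under "timeout 3h". So "rbd ls" most likely hung (no OSDs were up to serve the request) rather than failing outright. A quick way to confirm the convention on any machine:

$ timeout 2 sleep 10; echo "exit status: $?"
exit status: 124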
Actions #5

Updated by Nathan Cutler over 5 years ago

  • Related to Bug #37089 (tests: ceph-admin-commands.sh workunit does not log what it's doing) added
Actions #6

Updated by Vasu Kulkarni over 5 years ago

Nathan,

The test actually failed before this point; since the OSDs are not up, the workunit is expected to fail:

2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout: osd: 2 osds: 0 up, 0 in
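
If the suspicion is a stale bluestore signature left on the data devices, a minimal check directly on the node looks like this (a sketch; /dev/sdb stands in for the OSD data device, and ceph-bluestore-tool must be installed):

# blkid may recognize a leftover bluestore superblock on the raw device
sudo blkid /dev/sdb
# dump the on-disk bluestore label, if one is present
sudo ceph-bluestore-tool show-label --dev /dev/sdb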

Actions #7

Updated by Nathan Cutler over 5 years ago

The test actually failed before this point; since the OSDs are not up, the workunit is expected to fail:

2018-11-07T03:47:50.016 INFO:tasks.workunit.client.0.mira026.stdout: osd: 2 osds: 0 up, 0 in

Right - I just noticed that the workunit script was missing set -x. As you say, the root cause of the test failure is higher up in the log.
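
A minimal version of that fix (a sketch; the actual change merged for #37089 may differ) is to enable command tracing in the script header so the teuthology log shows which command hung:

#!/bin/sh -ex
# -e: exit on the first failing command
# -x: echo each command before it runs, so the log pinpoints
#     the command (here it would have been "rbd ls") that hung
ceph -s
rados lspools
rbd ls
ceph osd set nodown
ceph osd unset nodown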
