Bug #10270

"[ FAILED ] LibRBD.ListChildren" in upgrade:firefly-x-giant-distro-basic-multi run

Added by Yuri Weinstein over 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Jason Dillaman
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
giant,firefly,dumpling
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-12-07_18:13:01-upgrade:firefly-x-giant-distro-basic-multi/641068/

2014-12-07T18:34:16.978 INFO:tasks.rados.rados.0.plana54.stdout:593:  finishing copy_from to plana5413911-17
2014-12-07T18:34:16.978 INFO:tasks.rados.rados.0.plana54.stdout:update_object_version oid 17 v 247 (ObjNum 123 snap 28 seq_num 123) dirty exists
2014-12-07T18:34:16.984 INFO:tasks.rados.rados.0.plana54.stdout:596:  expect (ObjNum 155 snap 37 seq_num 155)
2014-12-07T18:34:17.161 INFO:tasks.workunit.client.3.plana54.stdout:[       OK ] LibRBD.ZeroLengthRead (3557 ms)
2014-12-07T18:34:17.161 INFO:tasks.workunit.client.3.plana54.stdout:[----------] 26 tests from LibRBD (242947 ms total)
2014-12-07T18:34:17.161 INFO:tasks.workunit.client.3.plana54.stdout:
2014-12-07T18:34:17.161 INFO:tasks.workunit.client.3.plana54.stdout:[----------] Global test environment tear-down
2014-12-07T18:34:17.162 INFO:tasks.workunit.client.3.plana54.stdout:[==========] 26 tests from 1 test case ran. (242947 ms total)
2014-12-07T18:34:17.162 INFO:tasks.workunit.client.3.plana54.stdout:[  PASSED  ] 25 tests.
2014-12-07T18:34:17.162 INFO:tasks.workunit.client.3.plana54.stdout:[  FAILED  ] 1 test, listed below:
2014-12-07T18:34:17.162 INFO:tasks.workunit.client.3.plana54.stdout:[  FAILED  ] LibRBD.ListChildren
2014-12-07T18:34:17.162 INFO:tasks.workunit.client.3.plana54.stdout:
2014-12-07T18:34:17.163 INFO:tasks.workunit.client.3.plana54.stdout: 1 FAILED TEST
2014-12-07T18:34:17.163 INFO:tasks.workunit:Stopping ['rbd/test_librbd.sh'] on client.3...
2014-12-07T18:34:17.164 INFO:teuthology.orchestra.run.plana54:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.3'
2014-12-07T18:34:17.237 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/workunit.py", line 359, in _run_tests
    args=args,
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 368, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 106, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana54 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && cd -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=firefly TESTDIR="/home/ubuntu/cephtest" CEPH_ID="3" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.3/rbd/test_librbd.sh'
2014-12-07T18:34:17.262 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/workunit.py", line 105, in task
    config.get('env'), timeout=timeout)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/workunit.py", line 359, in _run_tests
    args=args,
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/remote.py", line 128, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 368, in run
    r.wait()
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 106, in wait
    exitstatus=status, node=self.hostname)
CommandFailedError: Command failed on plana54 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && cd -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=firefly TESTDIR="/home/ubuntu/cephtest" CEPH_ID="3" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.3/rbd/test_librbd.sh'
2014-12-07T18:34:17.263 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 98, in next
    result = self.results.get()
  File "/usr/lib/python2.7/dist-packages/gevent/queue.py", line 190, in get
    return waiter.get()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 321, in get
    return get_hub().switch()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 164, in switch
    return greenlet.switch(self)
GreenletExit
2014-12-07T18:34:17.264 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 98, in next
    result = self.results.get()
  File "/usr/lib/python2.7/dist-packages/gevent/queue.py", line 190, in get
    return waiter.get()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 321, in get
    return get_hub().switch()
  File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 164, in switch
    return greenlet.switch(self)
GreenletExit
2014-12-07T18:34:17.264 INFO:tasks.workunit:Stopping ['rbd/test_librbd_python.sh'] on client.4...
2014-12-07T18:34:17.265 INFO:teuthology.orchestra.run.plana54:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.4'
2014-12-07T18:34:17.266 INFO:tasks.workunit:Stopping ['rados/load-gen-big.sh'] on client.2...
2014-12-07T18:34:17.266 INFO:teuthology.orchestra.run.plana54:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.2'
2014-12-07T18:34:17.288 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 43, in task
    p.spawn(_run_spawned, ctx, confg, taskname)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 89, in __exit__
    raise
CommandFailedError: Command failed on plana54 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && cd -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=firefly TESTDIR="/home/ubuntu/cephtest" CEPH_ID="3" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.3/rbd/test_librbd.sh'
2014-12-07T18:34:17.289 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 53, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 41, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_master/teuthology/task/parallel.py", line 43, in task
    p.spawn(_run_spawned, ctx, confg, taskname)
  File "/home/teuthworker/src/teuthology_master/teuthology/parallel.py", line 89, in __exit__
    raise
CommandFailedError: Command failed on plana54 with status 1: 'mkdir -p -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && cd -- /home/ubuntu/cephtest/mnt.3/client.3/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=firefly TESTDIR="/home/ubuntu/cephtest" CEPH_ID="3" adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.3/rbd/test_librbd.sh'
2014-12-07T18:34:17.290 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2014-12-07T18:34:17.290 INFO:teuthology.orchestra.run.burnupi31:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph pg dump --format json'
2014-12-07T18:34:17.461 INFO:teuthology.orchestra.run.burnupi31.stderr:dumped all in format json
2014-12-07T18:34:17.476 INFO:tasks.ceph:Waiting for all osds to be active and clean.

Related issues

Related to rbd - Bug #10806: "[ FAILED ] LibRBD.TestClone" in upgrade:dumpling-x-firefly-distro-basic-vps run (Duplicate, 02/09/2015)

Associated revisions

Revision 53929ba1 (diff)
Added by Jason Dillaman over 9 years ago

librbd: gracefully handle deleted/renamed pools

snap_unprotect and list_children both attempt to scan all
pools. If a pool is deleted or renamed during the scan,
the methods would previously return -ENOENT. Both methods
have been modified to more gracefully handle this condition.

Fixes: #10270
Backport: giant, firefly
Signed-off-by: Jason Dillaman <>

Revision 436923c6 (diff)
Added by Jason Dillaman about 9 years ago

librbd: gracefully handle deleted/renamed pools

snap_unprotect and list_children both attempt to scan all
pools. If a pool is deleted or renamed during the scan,
the methods would previously return -ENOENT. Both methods
have been modified to more gracefully handle this condition.

Fixes: #10270
Backport: giant, firefly
Signed-off-by: Jason Dillaman <>

Revision c23e42e7 (diff)
Added by Jason Dillaman about 9 years ago

librbd: gracefully handle deleted/renamed pools

snap_unprotect and list_children both attempt to scan all
pools. If a pool is deleted or renamed during the scan,
the methods would previously return -ENOENT. Both methods
have been modified to more gracefully handle this condition.

Fixes: #10270
Backport: giant, firefly
Signed-off-by: Jason Dillaman <>
(cherry picked from commit 436923c68b77c900b7774fbef918c0d6e1614a36)

Revision e1c38bd5 (diff)
Added by Jason Dillaman about 9 years ago

librbd: gracefully handle deleted/renamed pools

snap_unprotect and list_children both attempt to scan all
pools. If a pool is deleted or renamed during the scan,
the methods would previously return -ENOENT. Both methods
have been modified to more gracefully handle this condition.

Fixes: #10270, #10122
Backport: giant, firefly
Signed-off-by: Jason Dillaman <>
(cherry picked from commit 436923c68b77c900b7774fbef918c0d6e1614a36)

History

#1 Updated by Sage Weil over 9 years ago

  • Project changed from Ceph to rbd
  • Priority changed from Normal to Urgent

#2 Updated by Yuri Weinstein over 9 years ago

Same issue in run http://pulpito.front.sepia.ceph.com/teuthology-2014-12-09_13:52:17-upgrade:firefly-x-giant-distro-basic-vps/
Jobs ['645326', '645329', '645331']

2014-12-09T14:09:25.852 INFO:tasks.workunit.client.3.vpm014.stdout:[  FAILED  ] 1 test, listed below:
2014-12-09T14:09:25.853 INFO:tasks.workunit.client.3.vpm014.stdout:[  FAILED  ] LibRBD.ListChildren
2014-12-09T14:09:25.853 INFO:tasks.workunit.client.3.vpm014.stdout:
2014-12-09T14:09:25.853 INFO:tasks.workunit.client.3.vpm014.stdout: 1 FAILED TEST

#3 Updated by Jason Dillaman over 9 years ago

  • Assignee set to Jason Dillaman

#4 Updated by Jason Dillaman over 9 years ago

The 'LibRBD.ListChildren' test failed because other tests running concurrently in the background (cls_rgw and cls_rbd) deleted the temporary pools they had created while rbd_list_children was iterating over all available pools.
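
For illustration only, the race can be modelled with a small Python sketch. FakeCluster, list_children_strict, list_children_tolerant and the on_pool hook are hypothetical names invented here; this is not the librbd or librados API, and skipping a vanished pool is just one assumed way of "gracefully handling" the condition, not necessarily what the actual fix does.

```python
import errno


class FakeCluster:
    """Hypothetical stand-in for a cluster; NOT the librados API."""

    def __init__(self, pools):
        self.pools = dict(pools)  # pool name -> list of child image names

    def list_pools(self):
        # Returns a snapshot; pools may be deleted after this call returns.
        return list(self.pools)

    def children_in_pool(self, pool):
        if pool not in self.pools:
            raise OSError(errno.ENOENT, "pool no longer exists: " + pool)
        return list(self.pools[pool])


def list_children_strict(cluster, on_pool=None):
    """Models the failing behaviour: any ENOENT aborts the whole scan."""
    children = []
    for pool in cluster.list_pools():
        if on_pool:
            on_pool(pool)  # hook used to simulate a concurrent pool deletion
        children += [(pool, c) for c in cluster.children_in_pool(pool)]
    return children


def list_children_tolerant(cluster, on_pool=None):
    """Assumed graceful variant: a pool that vanished mid-scan is skipped."""
    children = []
    for pool in cluster.list_pools():
        if on_pool:
            on_pool(pool)
        try:
            names = cluster.children_in_pool(pool)
        except OSError as e:
            if e.errno == errno.ENOENT:
                continue  # pool was deleted or renamed during the scan
            raise
        children += [(pool, c) for c in names]
    return children
```

With a hook that removes a temporary pool while the first pool is being visited, the strict variant raises ENOENT and the tolerant variant still returns the children of the remaining pools.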

#5 Updated by Jason Dillaman over 9 years ago

  • Status changed from New to Fix Under Review

#6 Updated by Jason Dillaman over 9 years ago

  • Backport set to giant,firefly

#10 Updated by Josh Durgin about 9 years ago

  • Status changed from Fix Under Review to Pending Backport

commit:53929ba1751fad9c9cd8545c4cd6985982d2eb5f

#12 Updated by Loïc Dachary about 9 years ago

<loicd> jdillaman: regarding http://tracker.ceph.com/issues/10270 do you have a backport somewhere already? I tried to cherry-pick -x commit:53929ba1751fad9c9cd8545c4cd6985982d2eb5f but it's non-trivial.
<loicd> I mean for giant :-)
<jdillaman> loicd: i think the goal was to do something along the lines of revision ec5d8c7a for the backports
<jdillaman> loicd: but instead of skipping pools it has already checked on a retry, it should rescan all pools
<jdillaman> loicd: whoops — meant revision commit:c94f1aae
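
A rough Python sketch of the "rescan all pools on retry" idea mentioned above, under stated assumptions: scan_all_pools, its callback parameters and the max_retries bound are hypothetical and are not taken from the referenced commits. It only shows the control flow of restarting the full scan when a pool disappears.

```python
import errno


def scan_all_pools(list_pools, scan_pool, max_retries=5):
    """Hypothetical helper: restart the full scan if a pool vanishes mid-scan.

    list_pools() returns the current pool names; scan_pool(name) returns the
    results for one pool and raises OSError(ENOENT) if the pool is gone.
    """
    for attempt in range(max_retries + 1):
        results = []
        try:
            # Re-list the pools on every attempt so the whole set is rescanned.
            for pool in list_pools():
                results.extend(scan_pool(pool))
            return results
        except OSError as e:
            if e.errno != errno.ENOENT or attempt == max_retries:
                raise
            # A pool was deleted or renamed: discard partial results and retry.
```

The retry bound is an assumption added so the sketch cannot loop forever if pools keep disappearing.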

#15 Updated by Loïc Dachary about 9 years ago

  • Backport changed from giant,firefly to giant,firefly,dumpling

#18 Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved

#19 Updated by Yuri Weinstein about 9 years ago

Run: http://pulpito.ceph.com/teuthology-2015-02-20_18:13:01-upgrade:firefly-x-giant-distro-basic-multi/
Job: 771823
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-02-20_18:13:01-upgrade:firefly-x-giant-distro-basic-multi/771823/teuthology.log

2015-02-21T01:23:33.614 INFO:tasks.rados.rados.0.plana78.stdout:611: write oid 381 current snap is 12
2015-02-21T01:23:33.614 INFO:tasks.rados.rados.0.plana78.stdout:611:  seq_num 528 ranges {730457=772628,2121222=209235}
2015-02-21T01:23:33.614 INFO:tasks.workunit.client.3.plana78.stdout:[  FAILED  ] LibRBD.ListChildren (10918 ms)

#20 Updated by Yuri Weinstein about 9 years ago

  • Status changed from Resolved to New

#21 Updated by Jason Dillaman about 9 years ago

  • Status changed from New to Pending Backport

Still awaiting backport to Firefly: https://github.com/ceph/ceph/pull/3404

#23 Updated by Josh Durgin about 9 years ago

  • Status changed from Pending Backport to Resolved
