Bug #1744: teuthology: race with daemon shutdown?
Status: Closed
% Done: 0%
Description
INFO:teuthology.task.cfuse:Unmounting ceph-fuse clients...
INFO:teuthology.task.cfuse.cfuse.0.err:ceph-fuse[24778]: fuse finished with error 0
INFO:teuthology.task.ceph:Shutting down mds daemons...
INFO:teuthology.task.ceph.mds.0:mds.0: Stopped
INFO:teuthology.task.ceph:Shutting down osd daemons...
INFO:teuthology.task.ceph:Shutting down mon daemons...
INFO:teuthology.task.ceph.mon.1:mon.1: Stopped
INFO:teuthology.task.ceph.mon.0:mon.0: Stopped
INFO:teuthology.task.ceph.mon.2:mon.2: Stopped
INFO:teuthology.task.ceph:Checking cluster ceph.log for badness...
INFO:teuthology.task.ceph:Cleaning ceph cluster...
INFO:teuthology.orchestra.run.err:rm: cannot remove `/tmp/cephtest/data/osd.0.data/current/meta': Directory not empty
INFO:teuthology.task.ceph:Removing ceph binaries...
INFO:teuthology.task.ceph:Removing shipped files: daemon-helper enable-coredump...
ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2fa96d0>
Traceback (most recent call last):
  File "/home/qa/teuthology/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/home/qa/teuthology/teuthology/task/ceph.py", line 1066, in task
    yield
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/home/qa/teuthology/teuthology/contextutil.py", line 35, in nested
    if exit(*exc):
  File "/usr/lib/python2.6/contextlib.py", line 34, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/qa/teuthology/teuthology/task/ceph.py", line 718, in cluster
    wait=False,
  File "/home/qa/teuthology/teuthology/orchestra/run.py", line 272, in wait
    proc.exitstatus.get()
  File "/home/qa/teuthology/virtualenv/lib/python2.6/site-packages/gevent/event.py", line 223, in get
    raise self._exception
CommandFailedError: Command failed with status 1: 'rm -rf -- /tmp/cephtest/ceph.conf /tmp/cephtest/ceph.keyring /tmp/cephtest/data /tmp/cephtest/monmap /tmp/cephtest/asok.*'
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
WARNING:teuthology.task.internal:Found coredumps on ubuntu@sepia69.ceph.dreamhost.com, flagging run as failed
INFO:teuthology.task.internal:Removing archive directory...
INFO:teuthology.task.internal:Tidying up after the test...
INFO:teuthology.orchestra.run.err:rmdir: failed to remove `/tmp/cephtest': Directory not empty
ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2f08bd0>
Traceback (most recent call last):
  File "/home/qa/teuthology/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/home/qa/teuthology/teuthology/task/internal.py", line 43, in base
    wait=False,
  File "/home/qa/teuthology/teuthology/orchestra/run.py", line 272, in wait
    proc.exitstatus.get()
  File "/home/qa/teuthology/virtualenv/lib/python2.6/site-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed with status 1: 'rmdir -- /tmp/cephtest'
Note that the osds should have been stopped by this point, but the rm -rf failed because a racing process was still creating files while it ran. /tmp/cephtest is left with:
ubuntu@sepia68:/tmp/cephtest$ find . -ls
4194311 4 drwxr-xr-x 3 ubuntu ubuntu 4096 Nov 20 19:13 .
4194522 4 drwxr-xr-x 3 ubuntu ubuntu 4096 Nov 20 19:13 ./data
4194530 4 drwxr-xr-x 3 ubuntu ubuntu 4096 Nov 20 19:13 ./data/osd.0.data
4194542 4 drwxr-xr-x 3 ubuntu ubuntu 4096 Nov 20 19:13 ./data/osd.0.data/current
4194548 8 drwxr-xr-x 2 ubuntu ubuntu 4096 Nov 20 19:13 ./data/osd.0.data/current/meta
4194528 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.2__0_DAE19372
4194527 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.6__0_DAE19632
4194532 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.0__0_DAE19212
4194526 8 -rw-r--r-- 1 ubuntu ubuntu 61668 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.4__0_103B1A2E
4194524 4 -rw-r--r-- 1 ubuntu ubuntu 60696 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.1__0_103B043E
4194531 4 -rw-r--r-- 1 ubuntu ubuntu 68904 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.0__0_103B076E
4194523 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.1__0_DAE193A2
4194536 8 -rw-r--r-- 1 ubuntu ubuntu 74844 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.7__0_103B185E
4194533 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.5__0_DAE19162
4194534 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.3__0_DAE19002
4194537 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.7__0_DAE197C2
4194535 8 -rw-r--r-- 1 ubuntu ubuntu 74628 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.2__0_103B05CE
4194549 4 -rw-r--r-- 1 ubuntu ubuntu 59184 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.6__0_103B188E
4194541 4 -rw-r--r-- 1 ubuntu ubuntu 8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.4__0_DAE191D2
4194551 4 -rw-r--r-- 1 ubuntu ubuntu 61020 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.3__0_103B1A9E
4194553 4 -rw-r--r-- 1 ubuntu ubuntu 65016 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.5__0_103B1BFE
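The "Directory not empty" failures above can be reproduced deterministically without any race: if even one file appears in a directory after the recursive delete has swept it, the final rmdir fails with ENOTEMPTY. A minimal sketch (paths and filenames are illustrative, not teuthology's actual code):

```python
import errno
import os
import tempfile


def cleanup_races_with_writer():
    """Reproduce the failure mode in the log: a file lands in a
    directory after cleanup decided it was empty, so rmdir fails
    with ENOTEMPTY ("Directory not empty")."""
    d = tempfile.mkdtemp()
    meta = os.path.join(d, 'meta')  # stands in for .../current/meta
    os.mkdir(meta)
    # Simulate a not-quite-dead ceph-osd dropping a pginfo file
    # after the sweep already removed everything it saw:
    open(os.path.join(meta, 'pginfo_0.2'), 'w').close()
    try:
        os.rmdir(meta)  # same final step as 'rm -rf' / 'rmdir'
        return None
    except OSError as e:
        return e.errno  # ENOTEMPTY, matching the log output
```

This is why both the `rm -rf` of the data directory and the later `rmdir -- /tmp/cephtest` fail: the surviving `meta` subtree keeps every ancestor directory non-empty.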
Is this because 'killall ceph-osd' was run but the process didn't actually stop immediately, or is there some other reason? (Why don't we see an
INFO:teuthology.task.ceph.osd.0:osd.0: Stopped
line?)