Bug #1744


teuthology: race with daemon shutdown?

Added by Sage Weil over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: teuthology
Target version: -
% Done: 0%


Description


INFO:teuthology.task.cfuse:Unmounting ceph-fuse clients...
INFO:teuthology.task.cfuse.cfuse.0.err:ceph-fuse[24778]: fuse finished with error 0
INFO:teuthology.task.ceph:Shutting down mds daemons...
INFO:teuthology.task.ceph.mds.0:mds.0: Stopped
INFO:teuthology.task.ceph:Shutting down osd daemons...
INFO:teuthology.task.ceph:Shutting down mon daemons...
INFO:teuthology.task.ceph.mon.1:mon.1: Stopped
INFO:teuthology.task.ceph.mon.0:mon.0: Stopped
INFO:teuthology.task.ceph.mon.2:mon.2: Stopped
INFO:teuthology.task.ceph:Checking cluster ceph.log for badness...
INFO:teuthology.task.ceph:Cleaning ceph cluster...
INFO:teuthology.orchestra.run.err:rm: cannot remove `/tmp/cephtest/data/osd.0.data/current/meta': Directory not empty
INFO:teuthology.task.ceph:Removing ceph binaries...
INFO:teuthology.task.ceph:Removing shipped files: daemon-helper enable-coredump...
ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2fa96d0>
Traceback (most recent call last):
  File "/home/qa/teuthology/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/home/qa/teuthology/teuthology/task/ceph.py", line 1066, in task
    yield
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/home/qa/teuthology/teuthology/contextutil.py", line 35, in nested
    if exit(*exc):
  File "/usr/lib/python2.6/contextlib.py", line 34, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/qa/teuthology/teuthology/task/ceph.py", line 718, in cluster
    wait=False,
  File "/home/qa/teuthology/teuthology/orchestra/run.py", line 272, in wait
    proc.exitstatus.get()
  File "/home/qa/teuthology/virtualenv/lib/python2.6/site-packages/gevent/event.py", line 223, in get
    raise self._exception
CommandFailedError: Command failed with status 1: 'rm -rf -- /tmp/cephtest/ceph.conf /tmp/cephtest/ceph.keyring /tmp/cephtest/data /tmp/cephtest/monmap /tmp/cephtest/asok.*'
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
WARNING:teuthology.task.internal:Found coredumps on ubuntu@sepia69.ceph.dreamhost.com, flagging run as failed
INFO:teuthology.task.internal:Removing archive directory...
INFO:teuthology.task.internal:Tidying up after the test...
INFO:teuthology.orchestra.run.err:rmdir: failed to remove `/tmp/cephtest': Directory not empty
ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2f08bd0>
Traceback (most recent call last):
  File "/home/qa/teuthology/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/home/qa/teuthology/teuthology/task/internal.py", line 43, in base
    wait=False,
  File "/home/qa/teuthology/teuthology/orchestra/run.py", line 272, in wait
    proc.exitstatus.get()
  File "/home/qa/teuthology/virtualenv/lib/python2.6/site-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed with status 1: 'rmdir -- /tmp/cephtest'

Note that the osds should already be stopped at this point, but the rm -rf failed because a racing process was still creating files while it ran. /tmp/cephtest is left with:

ubuntu@sepia68:/tmp/cephtest$ find . -ls
4194311    4 drwxr-xr-x   3 ubuntu   ubuntu       4096 Nov 20 19:13 .
4194522    4 drwxr-xr-x   3 ubuntu   ubuntu       4096 Nov 20 19:13 ./data
4194530    4 drwxr-xr-x   3 ubuntu   ubuntu       4096 Nov 20 19:13 ./data/osd.0.data
4194542    4 drwxr-xr-x   3 ubuntu   ubuntu       4096 Nov 20 19:13 ./data/osd.0.data/current
4194548    8 drwxr-xr-x   2 ubuntu   ubuntu       4096 Nov 20 19:13 ./data/osd.0.data/current/meta
4194528    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.2__0_DAE19372
4194527    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.6__0_DAE19632
4194532    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.0__0_DAE19212
4194526    8 -rw-r--r--   1 ubuntu   ubuntu      61668 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.4__0_103B1A2E
4194524    4 -rw-r--r--   1 ubuntu   ubuntu      60696 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.1__0_103B043E
4194531    4 -rw-r--r--   1 ubuntu   ubuntu      68904 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.0__0_103B076E
4194523    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.1__0_DAE193A2
4194536    8 -rw-r--r--   1 ubuntu   ubuntu      74844 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.7__0_103B185E
4194533    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.5__0_DAE19162
4194534    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.3__0_DAE19002
4194537    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.7__0_DAE197C2
4194535    8 -rw-r--r--   1 ubuntu   ubuntu      74628 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.2__0_103B05CE
4194549    4 -rw-r--r--   1 ubuntu   ubuntu      59184 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.6__0_103B188E
4194541    4 -rw-r--r--   1 ubuntu   ubuntu          8 Nov 20 19:13 ./data/osd.0.data/current/meta/pginfo\\u0.4__0_DAE191D2
4194551    4 -rw-r--r--   1 ubuntu   ubuntu      61020 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.3__0_103B1A9E
4194553    4 -rw-r--r--   1 ubuntu   ubuntu      65016 Nov 20 19:13 ./data/osd.0.data/current/meta/pglog\\u0.5__0_103B1BFE
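
This failure mode can be guarded against in the cleanup path. A minimal sketch (not teuthology's actual implementation; the function name and retry policy are made up for illustration) that retries the removal when a concurrent writer causes ENOTEMPTY:

```python
import errno
import shutil
import time


def rmtree_with_retries(path, attempts=5, delay=1.0):
    """Remove a directory tree, retrying when a still-running process
    repopulates it between the scan and the final rmdir (ENOTEMPTY)."""
    for i in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError as e:
            # ENOTEMPTY: something recreated entries under us; back off
            # and try again. Any other error (or final attempt) is fatal.
            if e.errno != errno.ENOTEMPTY or i == attempts - 1:
                raise
            time.sleep(delay)
```

Retrying only papers over the race, of course; the real fix is to wait for the daemon to exit before removing its data directory.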

Is this because 'killall ceph-osd' was run but the process didn't actually stop immediately? Or some other reason? (Why don't we see a

INFO:teuthology.task.ceph.osd.0:osd.0: Stopped

line?)
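
A plausible mechanism: killall(1) returns as soon as the signal has been *sent*, not when the target has exited, so an osd can still be flushing pg logs and pginfo files to disk while the rm runs. A hedged sketch (helper name hypothetical; teuthology's real run/wait machinery differs) of a shutdown that polls until the pid is actually gone:

```python
import errno
import os
import signal
import time


def stop_and_wait(pid, timeout=30.0):
    """Send SIGTERM, then poll until the process has really exited.

    Sending a signal (as killall does) only queues it; the target may
    keep running, and writing files, for a while afterwards. Probing
    with signal 0 detects when the pid no longer exists.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            os.kill(pid, 0)  # probe: does the pid still exist?
        except OSError as e:
            if e.errno == errno.ESRCH:  # no such process: it is gone
                return True
            raise
        time.sleep(0.1)
    return False
```

Note that for a direct child the pid lingers as a zombie until it is reaped with waitpid(), so the probe only works once someone has collected the exit status.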


Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #1997: teuthology: wait for clean osd shutdown before umount (Resolved, Josh Durgin, 01/27/2012)
