Bug #1581

closed

teuthology: restarting osds sometimes allows daemon-helper to fail

Added by Josh Durgin over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: teuthology
Target version:
% Done: 0%


Description

When this happens, the test is marked as failed. From teuthology:~teuthworker/archive/nightly_coverage_2011-09-26/652/teuthology.log:

2011-09-26T16:59:44.262 INFO:teuthology.task.thrashosds.thrasher:Killing osd 12, live_osds are [2, 8, 14, 4, 0, 13, 5, 3, 6, 11, 1, 10, 7, 9, 15, 12]
2011-09-26T16:59:44.265 INFO:teuthology.task.ceph.osd.12.err:*** Caught signal (Terminated) **
2011-09-26T16:59:44.265 INFO:teuthology.task.ceph.osd.12.err: in thread 0x7f00f6be2720. Shutting down.
2011-09-26T16:59:45.435 INFO:teuthology.task.ceph.osd.12.err:daemon-helper: command crashed with signal 15
...
2011-09-26T17:09:03.424 INFO:teuthology.task.workunit.client.0.out:iozone test complete.
2011-09-26T17:09:03.503 DEBUG:teuthology.orchestra.run:Running: 'rm -rf -- /tmp/cephtest/workunits.list /tmp/cephtest/workunit.client.0'
2011-09-26T17:09:03.558 DEBUG:teuthology.parallel:result is None
2011-09-26T17:09:03.559 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d89750>
2011-09-26T17:09:03.559 INFO:teuthology.task.cfuse:Unmounting ceph-fuse clients...
2011-09-26T17:09:03.559 DEBUG:teuthology.orchestra.run:Running: 'fusermount -u /tmp/cephtest/mnt.0'
2011-09-26T17:09:03.597 INFO:teuthology.task.cfuse.cfuse.0.err:ceph-fuse[2364]: fuse finished with error 0
2011-09-26T17:09:04.584 DEBUG:teuthology.orchestra.run:Running: 'rmdir -- /tmp/cephtest/mnt.0'
2011-09-26T17:09:04.601 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d891d0>
2011-09-26T17:09:04.601 INFO:teuthology.task.thrashosds:joining thrashosds
2011-09-26T17:09:04.601 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1d891d0>
Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology/teuthology/run_tasks.py", line 43, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/teuthology/teuthology/task/thrashosds.py", line 79, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/teuthology/teuthology/task/ceph_manager.py", line 78, in do_join
    self.get()
  File "/var/lib/teuthworker/teuthology/virtualenv/lib/python2.6/site-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
CommandFailedError: Command failed with status 1: '/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-osd -f -i 12 -c /tmp/cephtest/ceph.conf'

From the osd log, it looks like osd 12 was just killed by ceph_manager.
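
To illustrate the distinction at stake (a deliberate kill versus a real crash), here is a minimal Python sketch. It is not the teuthology or daemon-helper code; the function names `exited_as_expected` and `stop_and_check` are hypothetical, and the sleeping child is only a stand-in for a daemon. It assumes a POSIX system, where `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def exited_as_expected(returncode, expected_signal=signal.SIGTERM):
    """Return True if the child exited cleanly or died from the signal we sent.

    On POSIX, subprocess reports death-by-signal as -signum.
    """
    return returncode == 0 or returncode == -int(expected_signal)


def stop_and_check(proc, sig=signal.SIGTERM, timeout=10):
    """Send `sig` to a child process and report whether its exit was expected."""
    proc.send_signal(sig)
    returncode = proc.wait(timeout=timeout)
    return exited_as_expected(returncode, sig)


if __name__ == "__main__":
    # Stand-in for a long-running daemon; not a real ceph-osd invocation.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    ok = stop_and_check(child)
    print("expected termination" if ok else "unexpected failure")
```

A harness that kills a daemon on purpose could use a check along these lines to avoid counting its own SIGTERM as a failure, while still flagging other signals (for example SIGSEGV) or non-zero exit codes.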

Actions #1

Updated by Josh Durgin over 12 years ago

This happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-10-03/41

2011-10-03T15:33:48.921 INFO:teuthology.task.thrashosds.thrasher:in_osds:  [0, 1, 2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 15, 5, 11, 6]  out_osds:  [] dead_osds:  [] live_osds:  [10, 13, 12, 15, 14, 1, 0, 3, 2, 4, 7, 9, 8, 5, 11, 6]
2011-10-03T15:33:48.921 INFO:teuthology.task.thrashosds.thrasher:Killing osd 6, live_osds are [10, 13, 12, 15, 14, 1, 0, 3, 2, 4, 7, 9, 8, 5, 11, 6]
2011-10-03T15:33:48.923 INFO:teuthology.task.ceph.osd.6.err:*** Caught signal (Terminated) **
2011-10-03T15:33:48.923 INFO:teuthology.task.ceph.osd.6.err: in thread 0x7fc0da0b8720. Shutting down.
2011-10-03T15:33:49.046 INFO:teuthology.task.ceph.osd.6.err:daemon-helper: command crashed with signal 15
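
For triaging logs like the ones above, a small script can check whether each "command crashed" line follows a thrasher "Killing osd N" line for the same OSD. This is only a sketch: the regular expressions are written against the log lines quoted in this ticket, and the function name `thrasher_induced_crashes` is hypothetical.

```python
import re

# Patterns modelled on the log lines quoted in this ticket.
KILL_RE = re.compile(
    r"^\S+ INFO:teuthology\.task\.thrashosds\.thrasher:Killing osd (\d+),"
)
CRASH_RE = re.compile(
    r"^\S+ INFO:teuthology\.task\.ceph\.osd\.(\d+)\.err:"
    r"daemon-helper: command crashed with signal (\d+)"
)


def thrasher_induced_crashes(lines):
    """Return (osd_id, signal, killed_by_thrasher) for each crash line.

    killed_by_thrasher is True when an earlier "Killing osd N" line
    named the same OSD.
    """
    killed = set()
    results = []
    for line in lines:
        m = KILL_RE.match(line)
        if m:
            killed.add(int(m.group(1)))
            continue
        m = CRASH_RE.match(line)
        if m:
            osd = int(m.group(1))
            results.append((osd, int(m.group(2)), osd in killed))
    return results
```

Fed the teuthology.log lines from this comment, it would pair the signal 15 crash on osd 6 with the preceding kill of osd 6.
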
Actions #2

Updated by Josh Durgin over 12 years ago

  • Assignee set to Josh Durgin

Looking into this since it's happened again today.

Actions #3

Updated by Josh Durgin over 12 years ago

Probably fixed by commit 3d3ba1ebb1c9f145300e972829b73a7eeaf00faa. I'll close the issue if it doesn't recur in the next couple of days.

Actions #4

Updated by Josh Durgin over 12 years ago

  • Status changed from New to Resolved