Bug #1581
teuthology: restarting osds sometimes allows daemon-helper to fail
Status: closed
% Done: 0%
Description
This causes the test to be marked failed. From teuthology:~teuthworker/archive/nightly_coverage_2011-09-26/652/teuthology.log:
2011-09-26T16:59:44.262 INFO:teuthology.task.thrashosds.thrasher:Killing osd 12, live_osds are [2, 8, 14, 4, 0, 13, 5, 3, 6, 11, 1, 10, 7, 9, 15, 12]
2011-09-26T16:59:44.265 INFO:teuthology.task.ceph.osd.12.err:*** Caught signal (Terminated) **
2011-09-26T16:59:44.265 INFO:teuthology.task.ceph.osd.12.err: in thread 0x7f00f6be2720. Shutting down.
2011-09-26T16:59:45.435 INFO:teuthology.task.ceph.osd.12.err:daemon-helper: command crashed with signal 15
...
2011-09-26T17:09:03.424 INFO:teuthology.task.workunit.client.0.out:iozone test complete.
2011-09-26T17:09:03.503 DEBUG:teuthology.orchestra.run:Running: 'rm -rf -- /tmp/cephtest/workunits.list /tmp/cephtest/workunit.client.0'
2011-09-26T17:09:03.558 DEBUG:teuthology.parallel:result is None
2011-09-26T17:09:03.559 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d89750>
2011-09-26T17:09:03.559 INFO:teuthology.task.cfuse:Unmounting ceph-fuse clients...
2011-09-26T17:09:03.559 DEBUG:teuthology.orchestra.run:Running: 'fusermount -u /tmp/cephtest/mnt.0'
2011-09-26T17:09:03.597 INFO:teuthology.task.cfuse.cfuse.0.err:ceph-fuse[2364]: fuse finished with error 0
2011-09-26T17:09:04.584 DEBUG:teuthology.orchestra.run:Running: 'rmdir -- /tmp/cephtest/mnt.0'
2011-09-26T17:09:04.601 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x1d891d0>
2011-09-26T17:09:04.601 INFO:teuthology.task.thrashosds:joining thrashosds
2011-09-26T17:09:04.601 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1d891d0>
Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology/teuthology/run_tasks.py", line 43, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/teuthology/teuthology/task/thrashosds.py", line 79, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/teuthology/teuthology/task/ceph_manager.py", line 78, in do_join
    self.get()
  File "/var/lib/teuthworker/teuthology/virtualenv/lib/python2.6/site-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
CommandFailedError: Command failed with status 1: '/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-osd -f -i 12 -c /tmp/cephtest/ceph.conf'
From the osd log, it looks like osd 12 was just killed by ceph_manager.
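To make the failure mode concrete, here is a minimal sketch of how a wrapper can tell a deliberate kill apart from a real crash. This is not the daemon-helper source and not the eventual fix; `classify_exit` and `run_and_terminate` are hypothetical names used only for illustration. It relies on the documented POSIX behaviour of Python's `subprocess` module, where a child killed by signal N reports a return code of -N.

```python
import signal
import subprocess


def classify_exit(returncode, expected_signal=None):
    """Map a child's return code to (ok, message).

    On POSIX, subprocess reports death by signal N as returncode -N.
    If the signal is the one the caller sent on purpose, treat it as
    a clean stop rather than a crash.
    """
    if returncode == 0:
        return True, "command exited cleanly"
    if returncode < 0:
        signum = -returncode
        if expected_signal is not None and signum == expected_signal:
            return True, "command stopped by expected signal %d" % signum
        return False, "command crashed with signal %d" % signum
    return False, "command failed with status %d" % returncode


def run_and_terminate(argv, stop_signal=signal.SIGTERM):
    """Start argv, deliver stop_signal, and classify how the child ended."""
    proc = subprocess.Popen(argv)
    proc.send_signal(stop_signal)
    returncode = proc.wait()
    return classify_exit(returncode, expected_signal=stop_signal)
```

In this sketch, a child that dies from the SIGTERM the test harness sent is reported as a success, while the same signal arriving unexpectedly is still reported as "command crashed with signal 15", which matches the message seen in the log above.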
Updated by Josh Durgin over 12 years ago
This happened again in teuthology:~teuthworker/archive/nightly_coverage_2011-10-03/41
2011-10-03T15:33:48.921 INFO:teuthology.task.thrashosds.thrasher:in_osds: [0, 1, 2, 3, 4, 7, 8, 9, 10, 12, 13, 14, 15, 5, 11, 6] out_osds: [] dead_osds: [] live_osds: [10, 13, 12, 15, 14, 1, 0, 3, 2, 4, 7, 9, 8, 5, 11, 6]
2011-10-03T15:33:48.921 INFO:teuthology.task.thrashosds.thrasher:Killing osd 6, live_osds are [10, 13, 12, 15, 14, 1, 0, 3, 2, 4, 7, 9, 8, 5, 11, 6]
2011-10-03T15:33:48.923 INFO:teuthology.task.ceph.osd.6.err:*** Caught signal (Terminated) **
2011-10-03T15:33:48.923 INFO:teuthology.task.ceph.osd.6.err: in thread 0x7fc0da0b8720. Shutting down.
2011-10-03T15:33:49.046 INFO:teuthology.task.ceph.osd.6.err:daemon-helper: command crashed with signal 15
Updated by Josh Durgin over 12 years ago
- Assignee set to Josh Durgin
Looking into this since it's happened again today.
Updated by Josh Durgin over 12 years ago
Probably fixed with 3d3ba1ebb1c9f145300e972829b73a7eeaf00faa. I'll close the issue if it doesn't recur in the next couple days.