Bug #10484
closed: unable to revive osd with valgrind
Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
It looks like the process is killed with signal 9 (SIGKILL) while trying to revive the OSD: see
2015-01-07T18:37:25.003 INFO:tasks.ceph.osd.1.plana14.stderr:daemon-helper: command crashed with signal 9
in the log below:
2015-01-07T18:35:06.072 INFO:tasks.thrashosds.thrasher:Reviving osd 1
2015-01-07T18:35:06.072 INFO:tasks.ceph.osd.1:Restarting daemon
2015-01-07T18:35:06.073 INFO:teuthology.orchestra.run.plana14:Running: 'cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term valgrind --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.1.log --time-stamp=yes --tool=memcheck ceph-osd -f -i 1'
2015-01-07T18:35:06.076 INFO:tasks.ceph.osd.1:Started
2015-01-07T18:35:06.076 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:06.191 INFO:teuthology.orchestra.run.plana14.stderr:admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2015-01-07T18:35:06.201 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 1, ['dump_ops_in_flight']
2015-01-07T18:35:08.972 INFO:tasks.ceph.osd.1.plana14.stdout:starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
2015-01-07T18:35:09.815 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- WARNING: unhandled syscall: 306
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- You may be able to write your own handler.
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- Nevertheless we consider this a bug. Please report
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- it at http://valgrind.org/support/bug_reports.html.
2015-01-07T18:35:11.202 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:11.858 INFO:teuthology.orchestra.run.plana14.stderr:no valid command found; 10 closest matches:
2015-01-07T18:35:11.859 INFO:teuthology.orchestra.run.plana14.stderr:config show
2015-01-07T18:35:11.859 INFO:teuthology.orchestra.run.plana14.stderr:help
2015-01-07T18:35:11.859 INFO:teuthology.orchestra.run.plana14.stderr:log dump
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:get_command_descriptions
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:git_version
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:config set <var> <val> [<val>...]
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:version
2015-01-07T18:35:11.861 INFO:teuthology.orchestra.run.plana14.stderr:2
2015-01-07T18:35:11.861 INFO:teuthology.orchestra.run.plana14.stderr:config get <var>
2015-01-07T18:35:11.861 INFO:teuthology.orchestra.run.plana14.stderr:0
2015-01-07T18:35:11.862 INFO:teuthology.orchestra.run.plana14.stderr:admin_socket: invalid command
2015-01-07T18:35:11.866 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 1, ['dump_ops_in_flight']
2015-01-07T18:35:11.977 INFO:tasks.ceph.osd.1.plana14.stderr:2015-01-07 18:35:11.897432 403a840 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2015-01-07T18:35:13.043 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- WARNING: unhandled syscall: 306
2015-01-07T18:35:13.043 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- You may be able to write your own handler.
2015-01-07T18:35:13.044 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-01-07T18:35:13.044 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- Nevertheless we consider this a bug. Please report
2015-01-07T18:35:13.044 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- it at http://valgrind.org/support/bug_reports.html.
2015-01-07T18:35:16.866 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:17.266 INFO:teuthology.orchestra.run.plana14.stderr:no valid command found; 10 closest matches:
2015-01-07T18:35:17.266 INFO:teuthology.orchestra.run.plana14.stderr:config show
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:help
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:log dump
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:get_command_descriptions
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:git_version
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:config set <var> <val> [<val>...]
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:version
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:2
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:config get <var>
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:0
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:admin_socket: invalid command
2015-01-07T18:35:17.273 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 1, ['dump_ops_in_flight']
2015-01-07T18:35:18.641 INFO:tasks.workunit.client.0.plana14.stdout:promoting some heads
2015-01-07T18:35:22.274 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:27.778 INFO:tasks.thrashosds.thrasher:in_osds: [1, 5, 3, 0, 2, 4] out_osds: [] dead_osds: [] live_osds: [2, 5, 4, 3, 0, 1]
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:inject_pause on 5
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:Testing filestore_inject_stall pause injection for duration 3
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:Checking after 0, should_be_down=False
2015-01-07T18:35:27.779 INFO:teuthology.orchestra.run.plana27:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok config set filestore_inject_stall 3'
2015-01-07T18:35:32.923 INFO:tasks.thrashosds.thrasher:in_osds: [1, 5, 3, 0, 2, 4] out_osds: [] dead_osds: [] live_osds: [2, 5, 4, 3, 0, 1]
2015-01-07T18:35:32.923 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2015-01-07T18:35:32.923 INFO:tasks.thrashosds.thrasher:Killing osd 1, live_osds are [2, 5, 4, 3, 0, 1]
2015-01-07T18:35:47.788 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 6
2015-01-07T18:35:49.474 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 5
2015-01-07T18:36:10.981 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 4
2015-01-07T18:36:24.823 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 3
2015-01-07T18:36:40.278 INFO:tasks.workunit.client.0.plana14.stdout:waiting for scrubs...
2015-01-07T18:37:10.278 INFO:tasks.workunit.client.0.plana14.stdout:done waiting
2015-01-07T18:37:14.789 INFO:tasks.workunit.client.0.plana14.stdout:[ OK ] LibRadosTwoPoolsPP.PromoteSnapScrub (286847 ms)
2015-01-07T18:37:14.790 INFO:tasks.workunit.client.0.plana14.stdout:[ RUN ] LibRadosTwoPoolsPP.PromoteSnapTrimRace
2015-01-07T18:37:25.003 INFO:tasks.ceph.osd.1.plana14.stderr:daemon-helper: command crashed with signal 9
2015-01-07T18:37:29.354 INFO:tasks.workunit.client.0.plana14.stdout:[ OK ] LibRadosTwoPoolsPP.PromoteSnapTrimRace (14564 ms)
...
2015-01-07T18:53:06.151 INFO:tasks.workunit.client.0.plana14.stderr:+ exit 0
2015-01-07T18:53:06.151 INFO:teuthology.orchestra.run.plana14:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2015-01-07T18:53:06.177 INFO:tasks.workunit:Stopping rados/test.sh on client.0...
2015-01-07T18:53:06.178 INFO:teuthology.orchestra.run.plana14:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2015-01-07T18:53:06.252 INFO:teuthology.orchestra.run.plana14:Running: 'rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0'
2015-01-07T18:53:06.322 INFO:tasks.workunit:Deleted dir /home/ubuntu/cephtest/mnt.0/client.0
2015-01-07T18:53:06.323 INFO:teuthology.orchestra.run.plana14:Running: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
2015-01-07T18:53:06.421 INFO:tasks.workunit:Deleted dir /home/ubuntu/cephtest/mnt.0
2015-01-07T18:53:06.422 INFO:tasks.thrashosds:joining thrashosds
2015-01-07T18:53:06.422 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 180, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 165, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
CommandFailedError: Command failed on plana14 with status 1: 'cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term valgrind --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.1.log --time-stamp=yes --tool=memcheck ceph-osd -f -i 1'
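The repeated "waiting on admin_socket for 1" lines above come from the thrasher polling the revived OSD's admin socket until the daemon answers `dump_ops_in_flight`. A minimal sketch of that retry loop (a hypothetical helper for illustration, not the actual ceph_manager code; early failures such as "[Errno 2] No such file or directory" or "admin_socket: invalid command" simply mean the daemon is not ready yet):

```python
import subprocess
import time

def wait_for_admin_socket(cmd, timeout=60, interval=5):
    """Retry an admin-socket query, e.g.
    ['ceph', '--admin-daemon', '/var/run/ceph/ceph-osd.1.asok',
     'dump_ops_in_flight'],
    until it succeeds. Raises RuntimeError if the daemon never
    answers within `timeout` seconds."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout  # daemon is up and answered the query
        # Socket missing or command set not registered yet: retry.
        time.sleep(interval)
    raise RuntimeError("daemon never answered: %r" % (cmd,))
```

In this run the loop's queries kept failing because the OSD under valgrind never finished starting before it was killed.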
Updated by Loïc Dachary over 9 years ago
- Status changed from New to Won't Fix
If it's kill -9, it sounds like daemon-helper, which would be triggered by a network blip killing the connections from teuthology to the daemon, i.e. infrastructure noise.
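For context on the "command crashed with signal 9" report: a wrapper process can tell death-by-signal apart from a normal exit via the child's wait status. A small sketch of that check (an illustration of the mechanism, not the actual daemon-helper code; Python's subprocess reports termination by signal N as returncode -N):

```python
import signal
import subprocess

# Start a long-running child, then kill it with SIGKILL, standing in
# for the external kill -9 the wrapper observed.
proc = subprocess.Popen(["sleep", "60"])
proc.send_signal(signal.SIGKILL)
proc.wait()

# A negative returncode means the child was terminated by a signal
# rather than exiting on its own.
if proc.returncode < 0:
    print("command crashed with signal %d" % -proc.returncode)
```

Since SIGKILL cannot come from inside the killed process, a signal-9 death points at an external actor (here, most likely the test infrastructure), which is why the issue was closed as Won't Fix.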