Bug #10484

unable to revive osd with valgrind

Added by Loïc Dachary about 9 years ago. Updated about 9 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description

http://pulpito.ceph.com/sage-2015-01-06_09:44:19-rados-wip-sage-testing-firefly---basic-multi/688210/

It looks like the process is killed with signal 9 (SIGKILL) while trying to revive the OSD: see

2015-01-07T18:37:25.003 INFO:tasks.ceph.osd.1.plana14.stderr:daemon-helper: command crashed with signal 9

in the log below:
2015-01-07T18:35:06.072 INFO:tasks.thrashosds.thrasher:Reviving osd 1
2015-01-07T18:35:06.072 INFO:tasks.ceph.osd.1:Restarting daemon
2015-01-07T18:35:06.073 INFO:teuthology.orchestra.run.plana14:Running: 'cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term valgrind --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.1.log --time-stamp=yes --tool=memcheck ceph-osd -f -i 1'
2015-01-07T18:35:06.076 INFO:tasks.ceph.osd.1:Started
2015-01-07T18:35:06.076 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:06.191 INFO:teuthology.orchestra.run.plana14.stderr:admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
2015-01-07T18:35:06.201 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 1, ['dump_ops_in_flight']
2015-01-07T18:35:08.972 INFO:tasks.ceph.osd.1.plana14.stdout:starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
2015-01-07T18:35:09.815 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- WARNING: unhandled syscall: 306
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- You may be able to write your own handler.
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- Nevertheless we consider this a bug.  Please report
2015-01-07T18:35:09.816 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:03.675 16256-- it at http://valgrind.org/support/bug_reports.html.
2015-01-07T18:35:11.202 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:11.858 INFO:teuthology.orchestra.run.plana14.stderr:no valid command found; 10 closest matches:
2015-01-07T18:35:11.859 INFO:teuthology.orchestra.run.plana14.stderr:config show
2015-01-07T18:35:11.859 INFO:teuthology.orchestra.run.plana14.stderr:help
2015-01-07T18:35:11.859 INFO:teuthology.orchestra.run.plana14.stderr:log dump
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:get_command_descriptions
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:git_version
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:config set <var> <val> [<val>...]
2015-01-07T18:35:11.860 INFO:teuthology.orchestra.run.plana14.stderr:version
2015-01-07T18:35:11.861 INFO:teuthology.orchestra.run.plana14.stderr:2
2015-01-07T18:35:11.861 INFO:teuthology.orchestra.run.plana14.stderr:config get <var>
2015-01-07T18:35:11.861 INFO:teuthology.orchestra.run.plana14.stderr:0
2015-01-07T18:35:11.862 INFO:teuthology.orchestra.run.plana14.stderr:admin_socket: invalid command
2015-01-07T18:35:11.866 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 1, ['dump_ops_in_flight']
2015-01-07T18:35:11.977 INFO:tasks.ceph.osd.1.plana14.stderr:2015-01-07 18:35:11.897432 403a840 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2015-01-07T18:35:13.043 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- WARNING: unhandled syscall: 306
2015-01-07T18:35:13.043 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- You may be able to write your own handler.
2015-01-07T18:35:13.044 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
2015-01-07T18:35:13.044 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- Nevertheless we consider this a bug.  Please report
2015-01-07T18:35:13.044 INFO:tasks.ceph.osd.1.plana14.stderr:--00:00:00:06.902 16256-- it at http://valgrind.org/support/bug_reports.html.
2015-01-07T18:35:16.866 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:17.266 INFO:teuthology.orchestra.run.plana14.stderr:no valid command found; 10 closest matches:
2015-01-07T18:35:17.266 INFO:teuthology.orchestra.run.plana14.stderr:config show
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:help
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:log dump
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:get_command_descriptions
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:git_version
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:config set <var> <val> [<val>...]
2015-01-07T18:35:17.267 INFO:teuthology.orchestra.run.plana14.stderr:version
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:2
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:config get <var>
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:0
2015-01-07T18:35:17.268 INFO:teuthology.orchestra.run.plana14.stderr:admin_socket: invalid command
2015-01-07T18:35:17.273 INFO:tasks.thrashosds.ceph_manager:waiting on admin_socket for 1, ['dump_ops_in_flight']
2015-01-07T18:35:18.641 INFO:tasks.workunit.client.0.plana14.stdout:promoting some heads
2015-01-07T18:35:22.274 INFO:teuthology.orchestra.run.plana14:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2015-01-07T18:35:27.778 INFO:tasks.thrashosds.thrasher:in_osds:  [1, 5, 3, 0, 2, 4]  out_osds:  [] dead_osds:  [] live_osds:  [2, 5, 4, 3, 0, 1]
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:inject_pause on 5
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:Testing filestore_inject_stall pause injection for duration 3
2015-01-07T18:35:27.779 INFO:tasks.thrashosds.thrasher:Checking after 0, should_be_down=False
2015-01-07T18:35:27.779 INFO:teuthology.orchestra.run.plana27:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok config set filestore_inject_stall 3'
2015-01-07T18:35:32.923 INFO:tasks.thrashosds.thrasher:in_osds:  [1, 5, 3, 0, 2, 4]  out_osds:  [] dead_osds:  [] live_osds:  [2, 5, 4, 3, 0, 1]
2015-01-07T18:35:32.923 INFO:tasks.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2015-01-07T18:35:32.923 INFO:tasks.thrashosds.thrasher:Killing osd 1, live_osds are [2, 5, 4, 3, 0, 1]
2015-01-07T18:35:47.788 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 6
2015-01-07T18:35:49.474 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 5
2015-01-07T18:36:10.981 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 4
2015-01-07T18:36:24.823 INFO:tasks.workunit.client.0.plana14.stdout:promoting from clones for snap 3
2015-01-07T18:36:40.278 INFO:tasks.workunit.client.0.plana14.stdout:waiting for scrubs...
2015-01-07T18:37:10.278 INFO:tasks.workunit.client.0.plana14.stdout:done waiting
2015-01-07T18:37:14.789 INFO:tasks.workunit.client.0.plana14.stdout:[       OK ] LibRadosTwoPoolsPP.PromoteSnapScrub (286847 ms)
2015-01-07T18:37:14.790 INFO:tasks.workunit.client.0.plana14.stdout:[ RUN      ] LibRadosTwoPoolsPP.PromoteSnapTrimRace
2015-01-07T18:37:25.003 INFO:tasks.ceph.osd.1.plana14.stderr:daemon-helper: command crashed with signal 9
2015-01-07T18:37:29.354 INFO:tasks.workunit.client.0.plana14.stdout:[       OK ] LibRadosTwoPoolsPP.PromoteSnapTrimRace (14564 ms)
...
2015-01-07T18:53:06.151 INFO:tasks.workunit.client.0.plana14.stderr:+ exit 0
2015-01-07T18:53:06.151 INFO:teuthology.orchestra.run.plana14:Running: 'sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp'
2015-01-07T18:53:06.177 INFO:tasks.workunit:Stopping rados/test.sh on client.0...
2015-01-07T18:53:06.178 INFO:teuthology.orchestra.run.plana14:Running: 'rm -rf -- /home/ubuntu/cephtest/workunits.list /home/ubuntu/cephtest/workunit.client.0'
2015-01-07T18:53:06.252 INFO:teuthology.orchestra.run.plana14:Running: 'rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0'
2015-01-07T18:53:06.322 INFO:tasks.workunit:Deleted dir /home/ubuntu/cephtest/mnt.0/client.0
2015-01-07T18:53:06.323 INFO:teuthology.orchestra.run.plana14:Running: 'rmdir -- /home/ubuntu/cephtest/mnt.0'
2015-01-07T18:53:06.421 INFO:tasks.workunit:Deleted dir /home/ubuntu/cephtest/mnt.0
2015-01-07T18:53:06.422 INFO:tasks.thrashosds:joining thrashosds
2015-01-07T18:53:06.422 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/thrashosds.py", line 180, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_firefly/tasks/ceph_manager.py", line 165, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
CommandFailedError: Command failed on plana14 with status 1: 'cd /home/ubuntu/cephtest && sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper term valgrind --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.1.log --time-stamp=yes --tool=memcheck ceph-osd -f -i 1'
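
For context, the repeated "waiting on admin_socket for 1, ['dump_ops_in_flight']" lines above come from the thrasher polling the freshly restarted OSD's admin socket until it answers. A minimal sketch of that polling pattern follows; the helper name, paths and timeouts are illustrative, not teuthology's actual API:

# Rough sketch of the polling done after restarting the daemon: keep
# retrying the admin socket command until the OSD registers it.
import subprocess
import time

def wait_for_admin_socket(osd_id, command="dump_ops_in_flight",
                          timeout=300, interval=5):
    """Poll the OSD admin socket until `command` succeeds or we give up."""
    sock = "/var/run/ceph/ceph-osd.%d.asok" % osd_id
    deadline = time.time() + timeout
    while time.time() < deadline:
        proc = subprocess.Popen(
            ["ceph", "--admin-daemon", sock, command],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, _ = proc.communicate()
        if proc.returncode == 0:
            return out
        # Early on the socket may not exist yet ("No such file or directory")
        # or the command descriptions may not be registered yet ("no valid
        # command found"), exactly as in the log above, so keep retrying.
        time.sleep(interval)
    raise RuntimeError("osd.%d admin socket never became ready" % osd_id)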

History

#1 Updated by Loïc Dachary about 9 years ago

  • Status changed from New to Won't Fix

If it's kill -9 it sounds like daemon-helper, which would be triggered by a network blip killing the connection from teuthology to the daemon, i.e. infrastructure noise.
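
For illustration only (not the actual daemon-helper source): a wrapper in this style runs the daemon in its own process group, waits for its stdin (the SSH channel from teuthology) to close, and then tears the daemon down, escalating to SIGKILL if the graceful signal does not take effect quickly, which would surface as the "crashed with signal 9" line in the log.

import os
import signal
import subprocess
import sys
import time

def run_wrapped(cmd, soft_sig=signal.SIGTERM, grace=5.0):
    # Start the daemon (e.g. valgrind ... ceph-osd -f -i 1) in a new
    # process group so the whole tree can be signalled at once.
    child = subprocess.Popen(cmd, preexec_fn=os.setsid)
    # Block until stdin reaches EOF, i.e. the connection from the test
    # driver went away (a network blip would do this).
    sys.stdin.read()
    pgid = os.getpgid(child.pid)
    os.killpg(pgid, soft_sig)             # ask nicely first ("term")
    deadline = time.time() + grace
    while child.poll() is None and time.time() < deadline:
        time.sleep(0.1)
    if child.poll() is None:
        os.killpg(pgid, signal.SIGKILL)   # escalate: reported as signal 9
    return child.wait()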
