Bug #8737
thrasher reviving osd racing with kill osd (closed)
% Done: 0%
Source: other
Severity: 3 - minor
Description
It looks like starting and killing the osd happen in the wrong order for some reason:
2014-07-02T20:51:03.189 INFO:teuthology.task.thrashosds.thrasher:in_osds: [0, 1, 2, 5, 3, 4] out_osds: [] dead_osds: [1] live_osds: [3, 5, 4, 0, 2]
2014-07-02T20:51:03.189 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-07-02T20:51:03.190 INFO:teuthology.task.thrashosds.thrasher:Reviving osd 1
2014-07-02T20:51:03.190 INFO:teuthology.task.ceph.osd.1:Restarting daemon
2014-07-02T20:51:03.190 INFO:teuthology.orchestra.run.vpm114:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1'
2014-07-02T20:51:03.193 INFO:teuthology.task.ceph.osd.1:Started
2014-07-02T20:51:03.193 INFO:teuthology.orchestra.run.vpm114:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2014-07-02T20:51:03.367 INFO:teuthology.orchestra.run.vpm114.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
this is from http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/338908/
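One way to check the ordering in a log like the one above is to pull out only the messages the thrasher itself logged and read them in sequence. The following is a minimal Python sketch, not part of teuthology: the function name `thrasher_events` and the `SAMPLE` constant are made up for illustration, and it assumes one record per line in the `<timestamp> <LEVEL>:<logger>:<message>` shape seen in the snippet.

```python
import re

# Assumed record shape, taken from the snippet above:
#   <timestamp> <LEVEL>:<logger>:<message>
LINE_RE = re.compile(r"^(\S+) (\w+):([\w.]+):(.*)$")

# A few records copied from the log in this ticket.
SAMPLE = """\
2014-07-02T20:51:03.189 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-07-02T20:51:03.190 INFO:teuthology.task.thrashosds.thrasher:Reviving osd 1
2014-07-02T20:51:03.190 INFO:teuthology.task.ceph.osd.1:Restarting daemon
2014-07-02T20:51:03.193 INFO:teuthology.task.ceph.osd.1:Started
"""


def thrasher_events(log_text):
    """Return (timestamp, message) pairs logged by the thrasher, in log order."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        # Keep only records whose logger name ends in ".thrasher".
        if m and m.group(3).endswith(".thrasher"):
            events.append((m.group(1), m.group(4)))
    return events


if __name__ == "__main__":
    for timestamp, message in thrasher_events(SAMPLE):
        print(timestamp, message)
```

Run against the full log of a job, the filtered output lists the thrasher's decisions in order, which makes it easier to see whether a kill was actually logged around the time of the revive.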
Updated by Sage Weil over 9 years ago
- Status changed from New to Rejected
i think you misread the log? nothing in that log snippet about killing the osd that i see?