Bug #8737

closed

thrasher reviving osd racing with kill osd

Added by Loïc Dachary almost 10 years ago. Updated over 9 years ago.

Status:
Rejected
Priority:
Normal
Category:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

It looks like the thrasher starts and kills the OSD in the wrong order for some reason:

2014-07-02T20:51:03.189 INFO:teuthology.task.thrashosds.thrasher:in_osds:  [0, 1, 2, 5, 3, 4]  out_osds:  [] dead_osds:  [1] live_osds:  [3, 5, 4, 0, 2]
2014-07-02T20:51:03.189 INFO:teuthology.task.thrashosds.thrasher:choose_action: min_in 3 min_out 0 min_live 2 min_dead 0
2014-07-02T20:51:03.190 INFO:teuthology.task.thrashosds.thrasher:Reviving osd 1
2014-07-02T20:51:03.190 INFO:teuthology.task.ceph.osd.1:Restarting daemon
2014-07-02T20:51:03.190 INFO:teuthology.orchestra.run.vpm114:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1'
2014-07-02T20:51:03.193 INFO:teuthology.task.ceph.osd.1:Started
2014-07-02T20:51:03.193 INFO:teuthology.orchestra.run.vpm114:Running: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight'
2014-07-02T20:51:03.367 INFO:teuthology.orchestra.run.vpm114.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
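To check the order of events in an excerpt like this, it can help to split each line into timestamp, logger, and message and then list only the loggers of interest. The following is a minimal sketch, assuming every line has the `TIMESTAMP LEVEL:logger:message` shape seen above; the helper names `parse_line` and `events_for` are hypothetical and not part of teuthology.

```python
import re
from datetime import datetime

# Assumed line shape, based only on the excerpt above:
#   2014-07-02T20:51:03.190 INFO:some.logger.name:message text
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+):(?P<logger>[\w.]+):(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, logger, message), or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
    return ts, m.group("logger"), m.group("msg")


def events_for(lines, logger_suffix):
    """Collect parsed entries whose logger name ends with logger_suffix, sorted by time."""
    parsed = (parse_line(line) for line in lines)
    hits = [p for p in parsed if p is not None and p[1].endswith(logger_suffix)]
    return sorted(hits, key=lambda entry: entry[0])


# Lines copied from the excerpt above.
SAMPLE = [
    "2014-07-02T20:51:03.190 INFO:teuthology.task.thrashosds.thrasher:Reviving osd 1",
    "2014-07-02T20:51:03.190 INFO:teuthology.task.ceph.osd.1:Restarting daemon",
    "2014-07-02T20:51:03.193 INFO:teuthology.task.ceph.osd.1:Started",
]

if __name__ == "__main__":
    for ts, logger, msg in events_for(SAMPLE, "osd.1"):
        print(ts.time(), logger, msg)
```

Filtering on `thrasher` instead of `osd.1` lists the actions the thrasher itself logged, which is the quickest way to see whether a kill action appears in the window at all.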

This is from http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/338908/

Actions #1

Updated by Ian Colle over 9 years ago

  • Assignee set to Yuri Weinstein

Actions #2

Updated by Sage Weil over 9 years ago

  • Status changed from New to Rejected

I think you misread the log. I don't see anything in that log snippet about killing the OSD.
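One plausible source of the misreading is the word `kill` in the `daemon-helper` command line that follows "Restarting daemon". The thread does not say this, so treat it as an assumption: if `daemon-helper` takes, as its first argument, the name of the signal to use when the wrapped daemon is later stopped, then `kill` is a parameter of the start command and not a kill action. A minimal sketch that tokenizes the logged command to make the structure visible (`split_wrapper` is a hypothetical helper):

```python
import shlex

# Command copied from the "Running:" log line in the description.
COMMAND = (
    "sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage "
    "daemon-helper kill ceph-osd -f -i 1"
)


def split_wrapper(command, wrapper="daemon-helper"):
    """Return (first argument after the wrapper, remaining wrapped command)."""
    tokens = shlex.split(command)
    idx = tokens.index(wrapper)  # raises ValueError if the wrapper is absent
    return tokens[idx + 1], tokens[idx + 2:]


first_arg, wrapped = split_wrapper(COMMAND)

if __name__ == "__main__":
    # Under the assumption above, first_arg names the stop signal,
    # and wrapped is the daemon actually being started.
    print(first_arg)
    print(wrapped)
```

Read this way, the line starts `ceph-osd -f -i 1`, which matches the "Restarting daemon" and "Started" messages around it.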

Actions #3

Updated by Loïc Dachary over 9 years ago

I indeed misread these logs.
