Bug #5431

osd: dump_stuck test fails with ENXIO

Added by Sage Weil over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
Start date: 06/23/2013
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

2013-06-23T01:19:57.365 DEBUG:teuthology.misc:with jobid basedir: 43201
2013-06-23T01:19:57.365 DEBUG:teuthology.orchestra.run:Running [10.214.131.24]: '/home/ubuntu/cephtest/43201/enable-coredump ceph-coverage /home/ubuntu/cephtest/43201/archive/coverage ceph osd out 0'
2013-06-23T01:19:57.698 INFO:teuthology.orchestra.run.err:marked out osd.0.
2013-06-23T01:20:57.707 DEBUG:teuthology.misc:with jobid basedir: 43201
2013-06-23T01:20:57.708 DEBUG:teuthology.orchestra.run:Running [10.214.131.24]: '/home/ubuntu/cephtest/43201/enable-coredump ceph-coverage /home/ubuntu/cephtest/43201/archive/coverage ceph tell osd.1 flush_pg_stats'
2013-06-23T01:21:27.120 INFO:teuthology.task.ceph.mon.a.err:2013-06-23 01:21:29.688964 7f685172f700 -1 mon.a@0(leader).osd e7 no osd or pg stats from osd.0 since 2013-06-23 01:19:56.134305, 93.554543 seconds ago.  marking down
2013-06-23T01:21:32.302 INFO:teuthology.task.ceph.mon.a.err:2013-06-23 01:21:34.871213 7f685172f700 -1 mon.a@0(leader).osd e10 no osd or pg stats from osd.1 since 2013-06-23 01:20:03.885098, 90.986060 seconds ago.  marking down
2013-06-23T01:21:32.840 INFO:teuthology.orchestra.run.err:Error ENXIO: osd down
2013-06-23T01:21:32.851 ERROR:teuthology.run_tasks:Saw exception from tasks
Traceback (most recent call last):
  File "/home/teuthworker/teuthology-master/teuthology/run_tasks.py", line 25, in run_tasks

The job is:
ubuntu@teuthology:/a/teuthology-2013-06-23_01:00:12-rados-master-testing-basic/43201$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 2dd322b42d608a37f3e5beed57a8fbc673da6e32
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: master
  ceph:
    conf:
      global:
        ms inject socket failures: 500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: b89d7420e3501247d6ed282d2253c95c758526b1
  install:
    ceph:
      sha1: b89d7420e3501247d6ed282d2253c95c758526b1
  s3tests:
    branch: master
  workunit:
    sha1: b89d7420e3501247d6ed282d2253c95c758526b1
roles:
- - mon.a
  - mds.0
  - osd.0
  - osd.1
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    conf:
      mon:
        mon_osd_report_timeout: 90
        mon_pg_stuck_threshold: 10
    log-whitelist:
    - wrongly marked me down
- dump_stuck: null
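
One way to read the two monitor lines against this config is to compare how long each OSD had been silent with mon_osd_report_timeout (90 seconds here). The sketch below only redoes that arithmetic from the timestamps quoted in the log above. The variable and function names are illustrative, and treating the timeout as the trigger for "marking down" is an interpretation of the log, not something the ticket states.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
REPORT_TIMEOUT = 90.0  # mon_osd_report_timeout from the job config above

# (osd, monitor timestamp of the message, "since" timestamp inside the message)
# All values are copied from the mon.a log lines in the description.
events = [
    ("osd.0", "2013-06-23 01:21:29.688964", "2013-06-23 01:19:56.134305"),
    ("osd.1", "2013-06-23 01:21:34.871213", "2013-06-23 01:20:03.885098"),
]


def seconds_silent(logged_at, last_report):
    """Seconds between the last stats report and the monitor's message."""
    delta = datetime.strptime(logged_at, FMT) - datetime.strptime(last_report, FMT)
    return delta.total_seconds()


silent = {osd: seconds_silent(at, since) for osd, at, since in events}
over_timeout = {osd: secs > REPORT_TIMEOUT for osd, secs in silent.items()}

for osd, secs in silent.items():
    print(f"{osd}: silent for {secs:.3f}s, over timeout: {over_timeout[osd]}")
```

Both values land just past 90 seconds and agree with the "seconds ago" figures the monitor printed to within a millisecond, so the monitor marked osd.1 down shortly before the flush_pg_stats call returned ENXIO.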

Associated revisions

Revision e6e1df69 (diff)
Added by Sage Weil over 4 years ago

dump_stuck: fix race with osd start

Occasionally we don't wait long enough for the osd to start and
mark itself up. Keep trying until flush succeeds.

Fixes: #5431
Signed-off-by: Sage Weil <>
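
The "keep trying until flush succeeds" approach in the commit message can be sketched as a simple retry loop. This is a minimal illustration, not the code from revision e6e1df69: CommandFailed, flush_until_success, and fake_flush are hypothetical names, and the flush callable stands in for whatever runs "ceph tell osd.N flush_pg_stats" in the test.

```python
import time


class CommandFailed(Exception):
    """Hypothetical error type for a cluster command that exited non-zero."""


def flush_until_success(flush, delay=1.0, max_tries=None, sleep=time.sleep):
    """Call flush() until it stops raising CommandFailed.

    Returns the number of attempts made. If max_tries is set and every
    attempt fails, the last CommandFailed is re-raised.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            flush()
            return attempt
        except CommandFailed:
            if max_tries is not None and attempt >= max_tries:
                raise
            # The OSD may not have marked itself up yet; wait and retry.
            sleep(delay)


# Stand-in for "ceph tell osd.1 flush_pg_stats": fails twice, then succeeds.
calls = {"n": 0}


def fake_flush():
    calls["n"] += 1
    if calls["n"] < 3:
        raise CommandFailed("Error ENXIO: osd down")


attempts = flush_until_success(fake_flush, delay=0.0)
```

The optional max_tries bound is an addition for the sketch. The commit message describes retrying until the flush succeeds, which corresponds to leaving max_tries unset.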

History

#1 Updated by Sage Weil over 4 years ago

  • Status changed from New to Need Review

#2 Updated by Sage Weil over 4 years ago

  • Status changed from Need Review to Resolved
