Bug #4254

closed

osd: failure to recover before timeout on rados bench and thrashing; negative stats

Added by Sage Weil about 11 years ago. Updated about 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Snippet from the teuthology log:

2013-02-23T10:50:34.510 INFO:teuthology.task.thrashosds.ceph_manager:   health HEALTH_WARN 1 pgs backfilling; 1 pgs stuck unclean; recovery -706/10675 degraded (-6.614%);  recovering 0 o/s, 1947KB/s
   monmap e1: 3 mons at {a=10.214.132.28:6789/0,b=10.214.132.27:6789/0,c=10.214.132.28:6790/0}, election epoch 6, quorum 0,1,2 a,b,c
   osdmap e170: 6 osds: 6 up, 5 in
    pgmap v2442: 72 pgs: 71 active+clean, 1 active+remapped+backfilling; 20592 MB data, 43484 MB used, 2746 GB / 2794 GB avail; 0B/s wr, 4op/s; -706/10675 degraded (-6.614%);  recovering 0 o/s, 1947KB/s
   mdsmap e5: 1/1/1 up {0=a=up:active}
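
The negative degraded count in the health line can be checked mechanically. The following is a minimal sketch, not part of teuthology or Ceph: the helper names `parse_degraded` and `check_degraded` are hypothetical, and the regular expression assumes the `recovery X/Y degraded (Z%)` wording shown in the snippet above. It recomputes the percentage and flags a count below zero, which should never occur.

```python
import re

# Assumed format, taken from the health line above:
#   "recovery -706/10675 degraded (-6.614%)"
DEGRADED_RE = re.compile(r"(-?\d+)/(\d+) degraded \((-?\d+(?:\.\d+)?)%\)")


def parse_degraded(line):
    """Return (degraded, total, reported_pct), or None if nothing matches."""
    m = DEGRADED_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def check_degraded(line):
    """Recompute the percentage and flag an impossible negative count."""
    parsed = parse_degraded(line)
    if parsed is None:
        return None
    degraded, total, reported = parsed
    # Guard against a zero denominator; round to the three decimals Ceph prints.
    computed = round(100.0 * degraded / total, 3) if total else 0.0
    return {
        "degraded": degraded,
        "total": total,
        "reported_pct": reported,
        "computed_pct": computed,
        "negative": degraded < 0,
    }


health = (
    "health HEALTH_WARN 1 pgs backfilling; 1 pgs stuck unclean; "
    "recovery -706/10675 degraded (-6.614%);  recovering 0 o/s, 1947KB/s"
)
result = check_degraded(health)
print(result)
```

For the line above, the recomputed percentage agrees with the reported one, so the percentage is simply derived from the bad numerator; the error is in the degraded object count itself.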

The job was:
ubuntu@teuthology:/a/sage-2013-02-23_08:44:35-regression-master-testing-basic/10262$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 92a49fb0f79f3300e6e50ddf56238e70678e4202
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 704db850131643b26bafe6594946cacce483c171
  s3tests:
    branch: master
  workunit:
    sha1: 704db850131643b26bafe6594946cacce483c171
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- radosbench:
    clients:
    - client.0
    time: 1800


Related issues: 2 (0 open, 2 closed)

Related to Ceph - Bug #4109: incorrect degraded count (Duplicate, Samuel Just, 02/12/2013)
Related to Ceph - Bug #3720: Ceph Reporting Negative Number of Degraded objects (Duplicate, Samuel Just, 01/03/2013)
