Bug #6118

failed to recover before timeout expired on radosbench, rados api tests

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/teuthology-2013-08-25_09:23:30-rados-master-testing-basic-plana/4753

History

#1 Updated by Sage Weil over 10 years ago

  • Subject changed from failed to recover before timeout expired on radosbench to failed to recover before timeout expired on radosbench, rados api tests

4 objects degraded, 1 pg stuck in recovery_wait

{u'election_epoch': 6, u'quorum': [0, 1, 2], u'mdsmap': {u'max': 1, u'epoch': 5, u'by_rank': [{u'status': u'up:active', u'name': u'a', u'rank': 0}], u'up': 1, u'in': 1}, u'monmap': {u'epoch': 1, u'mons': [{u'name': u'b', u'rank': 0, u'addr': u'10.214.131.10:6789/0'}, {u'name': u'a', u'rank': 1, u'addr': u'10.214.132.34:6789/0'}, {u'name': u'c', u'rank': 2, u'addr': u'10.214.132.34:6790/0'}], u'modified': u'2013-09-03 02:37:54.713075', u'fsid': u'1934cbfb-2bc2-4a63-a87e-edf7f443e025', u'created': u'2013-09-03 02:37:54.713075'}, u'health': {u'detail': [], u'timechecks': {u'round_status': u'finished', u'epoch': 6, u'round': 16, u'mons': [{u'latency': u'0.000000', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'b'}, {u'latency': u'0.045938', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'a'}, {u'latency': u'0.125255', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'c'}]}, u'health': {u'health_services': [{u'mons': [{u'last_updated': u'2013-09-03 03:14:09.386691', u'name': u'b', u'avail_percent': 91, u'kb_total': 472345880, u'kb_avail': 430895876, u'health': u'HEALTH_OK', u'kb_used': 17433108}, {u'last_updated': u'2013-09-03 03:14:10.489290', u'name': u'a', u'avail_percent': 92, u'kb_total': 472345880, u'kb_avail': 437662924, u'health': u'HEALTH_OK', u'kb_used': 10666060}, {u'last_updated': u'2013-09-03 03:14:09.490316', u'name': u'c', u'avail_percent': 92, u'kb_total': 472345880, u'kb_avail': 437662924, u'health': u'HEALTH_OK', u'kb_used': 10666060}]}]}, u'overall_status': u'HEALTH_WARN', u'summary': [{u'severity': u'HEALTH_WARN', u'summary': u'1 pgs recovery_wait'}]}, u'pgmap': {u'bytes_total': 3000647172096, u'degraded_objects': 4, u'num_pgs': 212, u'data_bytes': 43201, u'degraded_total': 402, u'bytes_used': 684716032, u'version': 755, u'pgs_by_state': [{u'count': 211, u'state_name': u'active+clean'}, {u'count': 1, u'state_name': u'active+recovery_wait'}], u'degrated_ratio': u'0.995', u'bytes_avail': 2993456123904}, u'quorum_names': [u'b', u'a', u'c'], u'osdmap': {u'osdmap': {u'full': u'false', u'nearfull': u'false', u'num_osds': 6, u'num_up_osds': 6, u'epoch': 523, u'num_in_osds': u'6'}}, u'fsid': u'1934cbfb-2bc2-4a63-a87e-edf7f443e025'}
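
For readers skimming the dump above, the relevant part is the pgmap section. The following is a minimal sketch, not the teuthology check itself, of how the non-clean PG states could be pulled out of such a status dict. The helper name `unclean_pgs` is hypothetical, and the dict is a hand-trimmed copy of the pgmap values shown above.

```python
def unclean_pgs(pgmap):
    """Return {state_name: count} for every PG state other than active+clean.

    Hypothetical helper for illustration; it only reads the 'pgs_by_state'
    list as it appears in the status dump.
    """
    return {
        entry["state_name"]: entry["count"]
        for entry in pgmap["pgs_by_state"]
        if entry["state_name"] != "active+clean"
    }


# Hand-trimmed copy of the pgmap section from the dump above.
pgmap = {
    "num_pgs": 212,
    "degraded_objects": 4,
    "pgs_by_state": [
        {"count": 211, "state_name": "active+clean"},
        {"count": 1, "state_name": "active+recovery_wait"},
    ],
}

stuck = unclean_pgs(pgmap)
total = sum(entry["count"] for entry in pgmap["pgs_by_state"])
print(stuck)   # {'active+recovery_wait': 1}
print(total)   # 212, matching num_pgs
```

With these values the single non-clean PG is the one in active+recovery_wait, which matches the "1 pgs recovery_wait" health summary.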

ubuntu@teuthology:/a/teuthology-2013-09-02_20:00:14-rados-dumpling-testing-basic-plana/18001$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 263cbbcaf605e359a46e30889595d82629f82080
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      flavor: notcmalloc
      sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
  s3tests:
    branch: master
  workunit:
    sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- workunit:
    clients:
      client.0:
      - rados/test.sh
teuthology_branch: dumpling

#3 Updated by Sage Weil over 10 years ago

another one with full logs: ubuntu@teuthology:/a/teuthology-2013-09-07_13:39:47-rados-dumpling-testing-basic-plana/25183

#4 Updated by Ian Colle over 10 years ago

  • Assignee set to Samuel Just

#5 Updated by Samuel Just over 10 years ago

http://qa-proxy.ceph.com/teuthology/teuthology-2013-09-04_20:00:07-rados-dumpling-testing-basic-plana/21637/

This actually seems to have been a hung ceph status command. ceph.log seems to indicate that the PGs went clean.
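
A hung status call and a cluster that really fails to recover look the same if the wait loop only has a single overall timeout. The following is a rough sketch, under stated assumptions, of a polling loop that reports the two cases separately. It is not the teuthology implementation: `poll_until_clean`, `fetch_states`, and the return strings are all hypothetical, and `fetch_states` stands in for whatever obtains the PG state counts.

```python
import concurrent.futures
import time


def poll_until_clean(fetch_states, deadline_s, call_timeout_s, interval_s=0.0):
    """Poll until all PGs are active+clean, a status call hangs, or time runs out.

    fetch_states: hypothetical callable returning {state_name: count}.
    Returns 'clean', 'status_hung', or 'not_recovered'.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    try:
        while time.monotonic() - start < deadline_s:
            future = pool.submit(fetch_states)
            try:
                # Bound each individual status call separately from the
                # overall recovery deadline.
                states = future.result(timeout=call_timeout_s)
            except concurrent.futures.TimeoutError:
                return "status_hung"
            if all(name == "active+clean"
                   for name, count in states.items() if count > 0):
                return "clean"
            time.sleep(interval_s)
        return "not_recovered"
    finally:
        # Do not block on a worker that may still be stuck in fetch_states.
        pool.shutdown(wait=False)


def slow_fetch():
    # Simulates a status call that takes far longer than the per-call timeout.
    time.sleep(0.3)
    return {"active+clean": 212}


result_clean = poll_until_clean(lambda: {"active+clean": 212}, 1.0, 0.5)
result_stuck = poll_until_clean(
    lambda: {"active+clean": 211, "active+recovery_wait": 1}, 0.05, 0.5, 0.01)
result_hung = poll_until_clean(slow_fetch, 1.0, 0.05)
```

Keeping the per-call timeout separate from the recovery deadline is the design point here: the first bounds how long one status query may take, the second bounds how long the cluster has to go clean.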

#6 Updated by Samuel Just over 10 years ago

Much of the code has been replaced as part of 5992, so it might be worth closing this for now.

#7 Updated by Samuel Just over 10 years ago

  • Status changed from New to Can't reproduce
