Bug #6118
Closed
failed to recover before timeout expired on radosbench, rados api tests
Added by Sage Weil over 10 years ago. Updated over 10 years ago.
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
ubuntu@teuthology:/a/teuthology-2013-08-25_09:23:30-rados-master-testing-basic-plana/4753
Updated by Sage Weil over 10 years ago
- Subject changed from failed to recover before timeout expired on radosbench to failed to recover before timeout expired on radosbench, rados api tests
4 objects degraded, 1 pg stuck in recovery_wait
{u'election_epoch': 6, u'quorum': [0, 1, 2], u'mdsmap': {u'max': 1, u'epoch': 5, u'by_rank': [{u'status': u'up:active', u'name': u'a', u'rank': 0}], u'up': 1, u'in': 1}, u'monmap': {u'epoch': 1, u'mons': [{u'name': u'b', u'rank': 0, u'addr': u'10.214.131.10:6789/0'}, {u'name': u'a', u'rank': 1, u'addr': u'10.214.132.34:6789/0'}, {u'name': u'c', u'rank': 2, u'addr': u'10.214.132.34:6790/0'}], u'modified': u'2013-09-03 02:37:54.713075', u'fsid': u'1934cbfb-2bc2-4a63-a87e-edf7f443e025', u'created': u'2013-09-03 02:37:54.713075'}, u'health': {u'detail': [], u'timechecks': {u'round_status': u'finished', u'epoch': 6, u'round': 16, u'mons': [{u'latency': u'0.000000', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'b'}, {u'latency': u'0.045938', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'a'}, {u'latency': u'0.125255', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'c'}]}, u'health': {u'health_services': [{u'mons': [{u'last_updated': u'2013-09-03 03:14:09.386691', u'name': u'b', u'avail_percent': 91, u'kb_total': 472345880, u'kb_avail': 430895876, u'health': u'HEALTH_OK', u'kb_used': 17433108}, {u'last_updated': u'2013-09-03 03:14:10.489290', u'name': u'a', u'avail_percent': 92, u'kb_total': 472345880, u'kb_avail': 437662924, u'health': u'HEALTH_OK', u'kb_used': 10666060}, {u'last_updated': u'2013-09-03 03:14:09.490316', u'name': u'c', u'avail_percent': 92, u'kb_total': 472345880, u'kb_avail': 437662924, u'health': u'HEALTH_OK', u'kb_used': 10666060}]}]}, u'overall_status': u'HEALTH_WARN', u'summary': [{u'severity': u'HEALTH_WARN', u'summary': u'1 pgs recovery_wait'}]}, u'pgmap': {u'bytes_total': 3000647172096, u'degraded_objects': 4, u'num_pgs': 212, u'data_bytes': 43201, u'degraded_total': 402, u'bytes_used': 684716032, u'version': 755, u'pgs_by_state': [{u'count': 211, u'state_name': u'active+clean'}, {u'count': 1, u'state_name': u'active+recovery_wait'}], u'degrated_ratio': u'0.995', u'bytes_avail': 2993456123904}, u'quorum_names': [u'b', u'a', u'c'], u'osdmap': {u'osdmap': {u'full': u'false', u'nearfull': u'false', u'num_osds': 6, u'num_up_osds': 6, u'epoch': 523, u'num_in_osds': u'6'}}, u'fsid': u'1934cbfb-2bc2-4a63-a87e-edf7f443e025'}
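The part of this dump that matters for the recovery check is the pgmap section: 211 of 212 PGs are active+clean and one sits in active+recovery_wait, with 4 of 402 objects degraded. The degrated_ratio field (spelled that way in the output) appears to be a percentage: 4 / 402 × 100 ≈ 0.995. A minimal sketch of reading those fields, assuming the dumpling-era field names shown in the dump; the helper names are hypothetical and are not teuthology's own code:

```python
# Hypothetical helpers for inspecting the pgmap section of a
# dumpling-era `ceph status` JSON dump (field names taken from the dump above).

def non_clean_pgs(pgmap):
    """Return {state_name: count} for every PG state other than active+clean."""
    return {
        entry["state_name"]: entry["count"]
        for entry in pgmap["pgs_by_state"]
        if entry["state_name"] != "active+clean"
    }


def degraded_percent(pgmap):
    """Degraded objects as a percentage of degraded_total, rounded to 3 places."""
    total = pgmap["degraded_total"]
    if total == 0:
        return 0.0
    return round(100.0 * pgmap["degraded_objects"] / total, 3)


def is_recovered(pgmap):
    """True once every PG is active+clean and no objects are degraded."""
    return not non_clean_pgs(pgmap) and pgmap["degraded_objects"] == 0


# Values copied from the status dump in this report.
pgmap = {
    "num_pgs": 212,
    "degraded_objects": 4,
    "degraded_total": 402,
    "pgs_by_state": [
        {"count": 211, "state_name": "active+clean"},
        {"count": 1, "state_name": "active+recovery_wait"},
    ],
}

print(non_clean_pgs(pgmap))     # {'active+recovery_wait': 1}
print(degraded_percent(pgmap))  # 0.995
print(is_recovered(pgmap))      # False
```

Recomputing the percentage from degraded_objects and degraded_total reproduces the 0.995 reported by the monitor, which is a quick way to confirm how the field is defined in this version.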
ubuntu@teuthology:/a/teuthology-2013-09-02_20:00:14-rados-dumpling-testing-basic-plana/18001$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 263cbbcaf605e359a46e30889595d82629f82080
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: dumpling
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
    valgrind:
      mds:
      - --tool=memcheck
      mon:
      - --tool=memcheck
      - --leak-check=full
      - --show-reachable=yes
      osd:
      - --tool=memcheck
  ceph-deploy:
    branch:
      dev: dumpling
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      flavor: notcmalloc
      sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
  s3tests:
    branch: master
  workunit:
    sha1: a708c8ab52e5b1476405a1f817c23b8845fbaab3
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- workunit:
    clients:
      client.0:
      - rados/test.sh
teuthology_branch: dumpling
Updated by Sage Weil over 10 years ago
another one with full logs: ubuntu@teuthology:/a/teuthology-2013-09-07_13:39:47-rados-dumpling-testing-basic-plana/25183
Updated by Samuel Just over 10 years ago
This actually seems to have been a hung ceph status command; ceph.log seems to indicate that the PGs went clean.
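If the status query itself hangs, a wait-for-clean loop reports "failed to recover" even though the PGs did recover. A minimal sketch of a polling loop that bounds each status call separately so the two failures are reported differently; all names here are hypothetical, the `ceph status --format=json` invocation is an assumption about the environment, and this is not teuthology's actual wait logic. The default timeout of 1200 s mirrors the thrashosds timeout in the job config above.

```python
import json
import subprocess
import time


class StatusHung(Exception):
    """The status query itself did not return in time."""


class RecoveryTimeout(Exception):
    """Status queries worked, but PGs never all reached active+clean."""


def ceph_status(call_timeout=60):
    """Run `ceph status` and parse its JSON; raise StatusHung if it hangs.

    Assumes the `ceph` CLI is on PATH and accepts --format=json.
    """
    try:
        out = subprocess.run(
            ["ceph", "status", "--format=json"],
            capture_output=True, check=True, timeout=call_timeout,
        ).stdout
    except subprocess.TimeoutExpired as exc:
        raise StatusHung(f"ceph status did not return within {call_timeout}s") from exc
    return json.loads(out)


def all_clean(status):
    """True when the active+clean count equals the total PG count."""
    pgmap = status["pgmap"]
    clean = sum(e["count"] for e in pgmap["pgs_by_state"]
                if e["state_name"] == "active+clean")
    return clean == pgmap["num_pgs"]


def wait_for_clean(get_status=ceph_status, timeout=1200, interval=3,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until every PG is active+clean.

    A StatusHung raised by get_status propagates unchanged, so a hung
    status call is not mistaken for a recovery failure.
    """
    deadline = clock() + timeout
    while True:
        if all_clean(get_status()):
            return
        if clock() >= deadline:
            raise RecoveryTimeout(f"PGs not clean after {timeout}s")
        sleep(interval)
```

The clock and sleep parameters are injectable only so the loop can be exercised without a cluster; with the defaults it polls in real time.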
Updated by Samuel Just over 10 years ago
Much of this code has been replaced as part of 5992; it might be worth closing this for now.
Updated by Samuel Just over 10 years ago
- Status changed from New to Can't reproduce