Bug #7449

closed

thrashosds can preclude HEALTH_OK?

Added by Sage Weil about 10 years ago. Updated about 10 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2014-02-16T12:05:26.385 INFO:teuthology.task.thrashosds.ceph_manager:{u'election_epoch': 6, u'quorum': [0, 1, 2], u'mdsmap': {u'max': 1, u'epoch': 5, u'by_rank': [{u'status': u'up:active', u'name': u'a', u'rank': 0}], u'up': 1, u'in': 1}, u'monmap': {u'epoch': 1, u'mons': [{u'name': u'a', u'rank': 0, u'addr': u'10.214.131.8:6789/0'}, {u'name': u'b', u'rank': 1, u'addr': u'10.214.131.17:6789/0'}, {u'name': u'c', u'rank': 2, u'addr': u'10.214.131.8:6790/0'}], u'modified': u'2014-02-16 11:33:31.197838', u'fsid': u'c1ccad71-2efc-4e28-9c36-d8bd0002eb57', u'created': u'2014-02-16 11:33:31.197838'}, u'health': {u'detail': [], u'timechecks': {u'round_status': u'finished', u'epoch': 6, u'round': 14, u'mons': [{u'latency': u'0.000000', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'a'}, {u'latency': u'0.050112', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'b'}, {u'latency': u'0.043691', u'skew': u'0.000000', u'health': u'HEALTH_OK', u'name': u'c'}]}, u'health': {u'health_services': [{u'mons': [{u'last_updated': u'2014-02-16 12:04:56.723256', u'name': u'a', u'avail_percent': 92, u'kb_total': 472345880, u'kb_avail': 436505984, u'health': u'HEALTH_OK', u'kb_used': 11823000, u'store_stats': {u'bytes_total': 5479950, u'bytes_log': 3080192, u'last_updated': u'0.000000', u'bytes_misc': 65552, u'bytes_sst': 2334206}}, {u'last_updated': u'2014-02-16 12:04:56.870676', u'name': u'b', u'avail_percent': 93, u'kb_total': 472345880, u'kb_avail': 440517428, u'health': u'HEALTH_OK', u'kb_used': 7811556, u'store_stats': {u'bytes_total': 7577227, u'bytes_log': 5177344, u'last_updated': u'0.000000', u'bytes_misc': 65552, u'bytes_sst': 2334331}}, {u'last_updated': u'2014-02-16 12:04:56.729056', u'name': u'c', u'avail_percent': 92, u'kb_total': 472345880, u'kb_avail': 436505976, u'health': u'HEALTH_OK', u'kb_used': 11823008, u'store_stats': {u'bytes_total': 7577183, u'bytes_log': 5177344, u'last_updated': u'0.000000', u'bytes_misc': 65552, u'bytes_sst': 2334287}}]}]}, 
u'overall_status': u'HEALTH_WARN', u'summary': [{u'severity': u'HEALTH_WARN', u'summary': u'1 pgs recovering'}, {u'severity': u'HEALTH_WARN', u'summary': u'1 pgs stuck unclean'}, {u'severity': u'HEALTH_WARN', u'summary': u'recovery 192/24140 objects degraded (0.795%)'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool data pg_num 34 > pgp_num 24'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool rbd pg_num 64 > pgp_num 24'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool base has too few pgs'}, {u'severity': u'HEALTH_WARN', u'summary': u'pool cache has too few pgs'}]}, u'pgmap': {u'bytes_total': 1999454683136, u'degraded_objects': 192, u'num_pgs': 140, u'data_bytes': 24048019299, u'degraded_total': 24140, u'bytes_used': 42654490624, u'version': 563, u'pgs_by_state': [{u'count': 139, u'state_name': u'active+clean'}, {u'count': 1, u'state_name': u'active+recovering'}], u'degraded_ratio': u'0.795', u'bytes_avail': 1956800192512}, u'quorum_names': [u'a', u'b', u'c'], u'osdmap': {u'osdmap': {u'full': u'false', u'nearfull': u'false', u'num_osds': 6, u'num_up_osds': 6, u'epoch': 83, u'num_in_osds': 4}}, u'fsid': u'c1ccad71-2efc-4e28-9c36-d8bd0002eb57'}

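The task logs this status and loops until it times out, since the cluster never reports HEALTH_OK. Two kinds of warning in the summary are pool settings rather than recovery state:

- pgp_num < pg_num (pools data and rbd)
- pools base and cache have too few pgs

It also looks like some pgs have degraded objects, so this may not be entirely teuthology's fault?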
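To make the distinction above concrete, here is a minimal sketch that splits the `summary` entries from the logged status into pool-setting warnings and everything else. It is not teuthology code: the function name, the regular expressions, and the two-way split are assumptions made for illustration, and the patterns are only matched against the message strings that appear in this log.

```python
import re

# Hypothetical patterns, written to match the summary strings in the log above.
POOL_CONFIG_PATTERNS = [
    re.compile(r"pg_num \d+ > pgp_num \d+"),
    re.compile(r"has too few pgs"),
]


def split_health_summary(health):
    """Return (pool_config, other) lists of warning strings.

    `health` is the dict stored under the 'health' key of the logged status.
    """
    pool_config, other = [], []
    for entry in health.get("summary", []):
        text = entry["summary"]
        if any(p.search(text) for p in POOL_CONFIG_PATTERNS):
            pool_config.append(text)
        else:
            other.append(text)
    return pool_config, other


# Summary entries copied from the log in this report.
health = {
    "overall_status": "HEALTH_WARN",
    "summary": [
        {"severity": "HEALTH_WARN", "summary": "1 pgs recovering"},
        {"severity": "HEALTH_WARN", "summary": "1 pgs stuck unclean"},
        {"severity": "HEALTH_WARN", "summary": "recovery 192/24140 objects degraded (0.795%)"},
        {"severity": "HEALTH_WARN", "summary": "pool data pg_num 34 > pgp_num 24"},
        {"severity": "HEALTH_WARN", "summary": "pool rbd pg_num 64 > pgp_num 24"},
        {"severity": "HEALTH_WARN", "summary": "pool base has too few pgs"},
        {"severity": "HEALTH_WARN", "summary": "pool cache has too few pgs"},
    ],
}

pool_config, other = split_health_summary(health)
for text in pool_config:
    print("pool config:", text)
for text in other:
    print("other:", text)
```

A split like this would show whether a run is stuck only on the pool warnings or also on pgs that have not finished recovering.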

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-02-16_08:28:52-rados:thrash-wip-agent-testing-basic-plana/86400

Actions #1

Updated by Greg Farnum about 10 years ago

This might just be a result of the degraded objects -- I think every time the thrash tests make a change they wait for PGs to become active+clean before changing something again?
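The wait described in this comment can be sketched as a poll loop with a deadline. This is an illustration under stated assumptions, not the actual thrasher code: `wait_for_clean`, `all_active_clean`, and the `get_pgmap` callable are hypothetical names, and the `pgmap` layout is taken from the status dump in the description.

```python
import time


def all_active_clean(pgmap):
    """True when every pg counted in pgs_by_state is exactly active+clean."""
    clean = sum(
        s["count"]
        for s in pgmap["pgs_by_state"]
        if s["state_name"] == "active+clean"
    )
    return clean == pgmap["num_pgs"]


def wait_for_clean(get_pgmap, timeout=300.0, interval=3.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_pgmap() until all pgs are active+clean or the timeout expires.

    get_pgmap is a hypothetical callable returning the 'pgmap' dict from a
    status query. Returns True on success and False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if all_active_clean(get_pgmap()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# pgmap values copied from the log in the description (other keys omitted).
logged_pgmap = {
    "num_pgs": 140,
    "pgs_by_state": [
        {"count": 139, "state_name": "active+clean"},
        {"count": 1, "state_name": "active+recovering"},
    ],
}
print(all_active_clean(logged_pgmap))
```

With the logged values the check fails because one pg is still `active+recovering`, so a loop of this shape would keep waiting until that pg finishes or the timeout is hit.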

Actions #2

Updated by Sage Weil about 10 years ago

  • Status changed from New to Can't reproduce