Bug #46179

Health check failed: Reduced data availability: PG_AVAILABILITY

Added by Casey Bodley over 3 years ago. Updated over 3 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Multiple RGW tests are failing on different branches, with:

failure_reason: '"2020-05-19T22:16:08.390058+0000 mon.b (mon.0) 275 : cluster [WRN]
  Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)" 
  in cluster log'

see: http://pulpito.ceph.com/yuvalif-2020-06-23_14:40:15-rgw-wip-yuval-test-35331-35155-distro-basic-smithi/

These are failures in the rgw/website and rgw/multifs suites that weren't whitelisted in https://github.com/ceph/ceph/pull/35302 or https://github.com/ceph/ceph/pull/35351.
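For context on why an otherwise-passing run is marked failed: teuthology scrapes the cluster log and fails the job on any [WRN]/[ERR] line not covered by the suite's whitelist. A minimal sketch of that idea, assuming a simple regex ignorelist (the pattern list and function name here are illustrative, not teuthology's actual API):

```python
import re

# Hypothetical ignorelist; the real entries live in each suite's YAML
# (the whitelist stanzas added by PRs like the ones linked above).
IGNORE_PATTERNS = [
    r"\(PG_DEGRADED\)",
    # PG_AVAILABILITY is deliberately NOT listed, so the
    # "Reduced data availability" warning fails the run.
]

def scan_cluster_log(lines, ignore_patterns=IGNORE_PATTERNS):
    """Return the first [WRN]/[ERR] line not covered by an ignore pattern."""
    for line in lines:
        if "[WRN]" not in line and "[ERR]" not in line:
            continue
        if any(re.search(p, line) for p in ignore_patterns):
            continue
        return line  # this becomes the job's failure_reason
    return None

log = [
    "cluster [WRN] Health check failed: Reduced data availability: "
    "1 pg inactive, 1 pg peering (PG_AVAILABILITY)",
]
print(scan_cluster_log(log))
```

Adding `r"\(PG_AVAILABILITY\)"` to the ignorelist would suppress this failure, which is exactly what comment #3 below argues against doing prematurely.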


Related issues

Related to RADOS - Bug #46180: qa: Scrubbing terminated -- not all pgs were active and clean. Resolved

History

#1 Updated by Casey Bodley over 3 years ago

  • Copied from Bug #45802: Health check failed: Reduced data availability: PG_AVAILABILITY added

#2 Updated by Sridhar Seshasayee over 3 years ago

/a/sseshasa-2020-06-24_17:46:09-rados-wip-sseshasa-testing-2020-06-24-1858-distro-basic-smithi/
job ID: 5176200

Failure Reason:
"2020-06-24T19:37:55.781828+0000 mon.b (mon.0) 109 : cluster [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)" in cluster log

#3 Updated by Neha Ojha over 3 years ago

This failure is different from the one seen earlier in the RGW suite, which was due to upmap. This one is related to https://tracker.ceph.com/issues/46180.

/a/sseshasa-2020-06-24_17:46:09-rados-wip-sseshasa-testing-2020-06-24-1858-distro-basic-smithi/5176200

2020-06-24T19:57:03.597 INFO:tasks.ceph.ceph_manager.ceph:PG 2.4 is not active+clean
2020-06-24T19:57:03.598 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '2.4', 'version': "0'0", 'reported_seq': '2', 'reported_epoch': '18', 'state': 'creating+peering', 'last_fresh': '2020-06-24T19:36:55.841802+0000', 'last_change': '2020-06-24T19:36:54.838308+0000', 'last_active': '2020-06-24T19:36:54.826797+0000', 'last_peered': '2020-06-24T19:36:54.826797+0000', 'last_clean': '2020-06-24T19:36:54.826797+0000', 'last_became_active': '0.000000', 'last_became_peered': '0.000000', 'last_unstale': '2020-06-24T19:36:55.841802+0000', 'last_undegraded': '2020-06-24T19:36:55.841802+0000', 'last_fullsized': '2020-06-24T19:36:55.841802+0000', 'mapping_epoch': 17, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 17, 'last_epoch_clean': 0, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2020-06-24T19:36:54.826797+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2020-06-24T19:36:54.826797+0000', 'last_clean_scrub_stamp': '2020-06-24T19:36:54.826797+0000', 'log_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 
'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [5, 6], 'acting': [5, 6], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [6], 'up_primary': 5, 'acting_primary': 5, 'purged_snaps': []}

2020-06-24T20:10:26.334 INFO:tasks.ceph:Waiting for all PGs to be active+clean and split+merged, waiting on ['2.4'] to go clean and/or [] to split/merge
2020-06-24T20:10:46.335 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sseshasa-testing-2020-06-24-1858/qa/tasks/ceph.py", line 1829, in task
    healthy(ctx=ctx, config=dict(cluster=config['cluster']))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sseshasa-testing-2020-06-24-1858/qa/tasks/ceph.py", line 1419, in healthy
    manager.wait_for_clean()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sseshasa-testing-2020-06-24-1858/qa/tasks/ceph_manager.py", line 2516, in wait_for_clean
    'wait_for_clean: failed before timeout expired'
AssertionError: wait_for_clean: failed before timeout expired

/a/yuvalif-2020-06-23_14:40:15-rgw-wip-yuval-test-35331-35155-distro-basic-smithi/5173512

2020-06-24T06:59:39.550 INFO:tasks.ceph.ceph_manager.ceph:PG 2.3 is not active+clean
2020-06-24T06:59:39.550 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '2.3', 'version': "0'0", 'reported_seq': '2', 'reported_epoch': '17', 'state': 'creating+peering', 'last_fresh': '2020-06-24T06:39:28.733784+0000', 'last_change': '2020-06-24T06:39:27.767047+0000', 'last_active': '2020-06-24T06:39:27.707409+0000', 'last_peered': '2020-06-24T06:39:27.707409+0000', 'last_clean': '2020-06-24T06:39:27.707409+0000', 'last_became_active': '0.000000', 'last_became_peered': '0.000000', 'last_unstale': '2020-06-24T06:39:28.733784+0000', 'last_undegraded': '2020-06-24T06:39:28.733784+0000', 'last_fullsized': '2020-06-24T06:39:28.733784+0000', 'mapping_epoch': 16, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 16, 'last_epoch_clean': 0, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2020-06-24T06:39:27.707409+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2020-06-24T06:39:27.707409+0000', 'last_clean_scrub_stamp': '2020-06-24T06:39:27.707409+0000', 'log_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 
'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [5, 2], 'acting': [5, 2], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [2], 'up_primary': 5, 'acting_primary': 5, 'purged_snaps': []}

2020-06-24T06:59:40.137 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuval-test-35331-35155/qa/tasks/ceph.py", line 1830, in task
    healthy(ctx=ctx, config=dict(cluster=config['cluster']))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuval-test-35331-35155/qa/tasks/ceph.py", line 1419, in healthy
    manager.wait_for_clean()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-yuval-test-35331-35155/qa/tasks/ceph_manager.py", line 2518, in wait_for_clean
    'wait_for_clean: failed before timeout expired'

We should not whitelist PG_AVAILABILITY for the other RGW suites until https://tracker.ceph.com/issues/46180 is fixed.
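The wait_for_clean() failure in the tracebacks above boils down to a timed polling loop over PG states; a rough sketch of the idea, not the actual ceph_manager.py code (the names and the get_pg_stats callable are placeholders):

```python
import time

def wait_for_clean(get_pg_stats, timeout=1200, interval=5):
    """Poll PG stats until every PG reports active+clean, or time out.

    get_pg_stats is a placeholder callable returning a list of dicts
    shaped like the 'pg dump' entries quoted above (with 'pgid',
    'state', and 'blocked_by' keys).
    """
    deadline = time.time() + timeout
    while True:
        stuck = [pg for pg in get_pg_stats()
                 if pg['state'] != 'active+clean']
        if not stuck:
            return
        for pg in stuck:
            # Mirrors the "PG 2.4 is not active+clean" log lines above.
            print("PG %s is %s, blocked_by=%s"
                  % (pg['pgid'], pg['state'], pg.get('blocked_by', [])))
        assert time.time() < deadline, \
            'wait_for_clean: failed before timeout expired'
        time.sleep(interval)
```

In the runs above, PG 2.4 (and 2.3) stayed in creating+peering with a non-empty blocked_by for the whole window, so the loop hit its deadline and raised the AssertionError seen in the tracebacks.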

#4 Updated by Neha Ojha over 3 years ago

  • Related to Bug #46180: qa: Scrubbing terminated -- not all pgs were active and clean. added

#5 Updated by Neha Ojha over 3 years ago

  • Copied from deleted (Bug #45802: Health check failed: Reduced data availability: PG_AVAILABILITY)

#6 Updated by Neha Ojha over 3 years ago

  • Status changed from New to Triaged

#7 Updated by Neha Ojha over 3 years ago

  • Related to deleted (Bug #46180: qa: Scrubbing terminated -- not all pgs were active and clean.)

#8 Updated by Neha Ojha over 3 years ago

  • Related to Bug #46180: qa: Scrubbing terminated -- not all pgs were active and clean. added

#9 Updated by Neha Ojha over 3 years ago

  • Status changed from Triaged to Duplicate
