Bug #20693

monthrash has spurious PG_AVAILABILITY etc warnings

Added by Sage Weil over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419393

There is no OSD thrashing, but the cluster is not fully peered and healthy before the mons start thrashing, which makes peering slow enough to trigger a health warning. Saw this on another run today too.

Can probably fix this by making wait_for_healthy also wait for unknown PGs, and/or by flushing the PG stats as well?
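
A rough sketch of what such a stricter wait might look like, not the actual fix. Everything here is hypothetical: the function names, the injected callables (`get_health`, `get_pg_states`, `flush_pg_stats`), and the assumed shape of the PG summary (a dict mapping a PG state string to a count). It only illustrates the idea of refusing to proceed while any PG is still unknown or otherwise not active+clean, even if the health status already reads HEALTH_OK.

```python
import time
from typing import Callable, Dict, Optional


def pgs_settled(pg_states: Dict[str, int]) -> bool:
    """Return True only if every reported PG is both active and clean.

    pg_states maps a PG state string (e.g. "active+clean", "unknown")
    to the number of PGs in that state. This shape is an assumption
    made for the sketch, not a documented format.
    """
    if sum(pg_states.values()) == 0:
        return False  # no PG stats reported yet, so treat as not settled
    for state, count in pg_states.items():
        flags = state.split("+")
        if count > 0 and not ("active" in flags and "clean" in flags):
            return False  # covers "unknown", "peering", "creating", ...
    return True


def wait_for_healthy_and_peered(
    get_health: Callable[[], str],
    get_pg_states: Callable[[], Dict[str, int]],
    flush_pg_stats: Optional[Callable[[], None]] = None,
    timeout: float = 300.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until health is HEALTH_OK and no PG is unknown or unclean.

    All callables are hypothetical hooks supplied by the caller.
    Raises TimeoutError if the condition is not met within `timeout`.
    """
    if flush_pg_stats is not None:
        flush_pg_stats()  # make sure the stats we poll are current
    deadline = clock() + timeout
    while True:
        if get_health() == "HEALTH_OK" and pgs_settled(get_pg_states()):
            return
        if clock() >= deadline:
            raise TimeoutError("cluster not healthy and fully peered in time")
        sleep(interval)
```

The point of checking the PG states separately is that, per the related bug #20690, the health status alone can read HEALTH_OK while PGs are still in the unknown state.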


Related issues

Related to RADOS - Bug #20690: Cluster status is HEALTH_OK even though PGs are in unknown state (Need More Info, 07/19/2017)

History

#1 Updated by Nathan Cutler over 6 years ago

  • Related to Bug #20690: Cluster status is HEALTH_OK even though PGs are in unknown state added

#2 Updated by Sage Weil over 6 years ago

Ok, I've addressed one source of this, but there is another; see

/a/sage-2017-07-24_03:44:49-rados-wip-sage-testing-distro-basic-smithi/1437231

The problem is that with at-end.yaml we set require-osd-release luminous, which, with the peering deletes, triggers a new peering interval and the PG_AVAILABILITY+PG_DEGRADED warning.

#3 Updated by Sage Weil over 6 years ago

  • Status changed from 12 to Fix Under Review

#4 Updated by Sage Weil over 6 years ago

  • Status changed from Fix Under Review to Resolved
