Bug #20693: monthrash has spurious PG_AVAILABILITY etc warnings - RADOS - Ceph

Bug #20693

closed

monthrash has spurious PG_AVAILABILITY etc warnings

Added by Sage Weil almost 7 years ago. Updated over 6 years ago.

Status: Resolved

Priority: Urgent

Severity: 3 - minor

Description

/a/sage-2017-07-19_15:27:16-rados-wip-sage-testing2-distro-basic-smithi/1419393

No OSD thrashing in this run, but the cluster is not fully peered and healthy before the mons start thrashing, which makes peering slow enough to trigger a health warning. Saw this on another run today too.

Can probably fix this by making wait_for_healthy also wait for unknown PGs, and/or by also flushing the PG stats?
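
A rough sketch of what that stricter wait might look like, not the actual teuthology code. Everything here is hypothetical: `get_status` is assumed to be a callable returning a dict with a `health` string and a `pg_states` map of state name to PG count, and `flush` stands in for whatever would flush the PG stats. The point is only that HEALTH_OK alone is not treated as sufficient while any PG is still reported as unknown.

```python
import time


def is_settled(status):
    """True only if health is HEALTH_OK and no PG is in an 'unknown' state.

    `status` is a hypothetical dict, e.g.
    {"health": "HEALTH_OK", "pg_states": {"active+clean": 8}}.
    """
    if status.get("health") != "HEALTH_OK":
        return False
    # pg_states maps a state string such as "active+clean" to a PG count.
    for state, count in status.get("pg_states", {}).items():
        if count > 0 and "unknown" in state.split("+"):
            return False
    return True


def wait_for_healthy_and_known(get_status, flush=None, timeout=300.0,
                               interval=3.0, sleep=time.sleep,
                               clock=time.monotonic):
    """Poll until the cluster is healthy AND has no unknown PGs.

    `flush`, if given, is called before each poll (assumed to push
    fresh PG stats so the status is not stale).
    Raises TimeoutError if the condition is not met within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        if flush is not None:
            flush()
        if is_settled(get_status()):
            return
        if clock() >= deadline:
            raise TimeoutError("cluster not healthy with all PGs known")
        sleep(interval)
```

With a check like this, a status of HEALTH_OK with some PGs still unknown (the situation in the related Bug #20690) would keep the test waiting instead of letting the mon thrashing start early.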

Related issues 1 (1 open — 0 closed)

Updated by Nathan Cutler over 6 years ago

Related to Bug #20690: Cluster status is HEALTH_OK even though PGs are in unknown state added

Updated by Sage Weil over 6 years ago

OK, I've addressed one source of this, but there is another; see

/a/sage-2017-07-24_03:44:49-rados-wip-sage-testing-distro-basic-smithi/1437231

The problem is that with at-end.yaml we set require-osd-release luminous, which, with the peering deletes, triggers a new peering interval and the PG_AVAILABILITY+PG_DEGRADED warning.
