
Bug #21121

test_health_warnings.sh can fail

Added by Sage Weil about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

- test_mark_all_but_last_osds_down marks all but one osd down
- clears noup
- osd.1 fails its is_healthy check because it is failing to reach its peers on their old addresses
- meanwhile, all osds are back up
- eventually the mon marks osd.1 out
- test fails...
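The race above can be sketched as follows. This is a hypothetical simulation, not Ceph code: the addresses, map dicts, and `ping`/`is_healthy` helpers are invented for illustration. The point is that osd.1 pings peer addresses from its stale osdmap, every ping fails, so it never reports healthy even though the peers are actually up on new addresses.

```python
# Hypothetical sketch (not Ceph source): why osd.1 fails its is_healthy
# check after noup is cleared. The restarted peers have rebound to new
# addresses, but osd.1 still pings the addresses from its stale map.

# osd.1's stale view of its peers vs. the cluster's current view.
old_map = {"osd.0": "10.0.0.10:6800", "osd.2": "10.0.0.12:6800"}
new_map = {"osd.0": "10.0.0.10:6804", "osd.2": "10.0.0.12:6804"}

def ping(peer, addr, live_addrs):
    """A ping succeeds only if we target the address the peer really listens on."""
    return addr == live_addrs[peer]

def is_healthy(my_map, live_addrs):
    """osd.1 considers itself healthy when it can reach all 'up' peers."""
    return all(ping(p, a, live_addrs) for p, a in my_map.items())

# osd.1 pings the old addresses -> every ping fails -> it never reports
# healthy, so the mon eventually marks it out and the test fails.
print(is_healthy(old_map, new_map))  # False: stuck on stale addresses
print(is_healthy(new_map, new_map))  # True: a fresh map would succeed
```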

/a/sage-2017-08-24_17:38:40-rados-wip-sage-testing2-luminous-20170824a-distro-basic-smithi/1560394


Related issues

Copied to RADOS - Backport #21238: luminous: test_health_warnings.sh can fail Resolved
Copied to RADOS - Backport #21239: jewel: test_health_warnings.sh can fail Resolved

History

#1 Updated by Sage Weil about 3 years ago

I believe the fix is to subscribe to osdmaps when in the waiting-for-healthy state. If we are unhealthy because we are failing to ping our "up" peers, we need to be sure that the cluster actually thinks they're up and that we're not just stuck on an old map.
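The proposed fix can be sketched like this. Again a hypothetical simulation, not the actual OSD code: `wait_for_healthy`, `StubMon`, and the epoch/address values are invented stand-ins. The idea it models is that while waiting for healthy, the OSD keeps asking the monitor for newer osdmaps (the osdmap subscription) instead of retrying pings against addresses from a stale map forever.

```python
# Hypothetical sketch (not Ceph source): while in waiting_for_healthy,
# subscribe to osdmap updates so that failed pings against stale peer
# addresses trigger a map refresh rather than permanent unhealthiness.

def wait_for_healthy(my_epoch, my_map, mon):
    """mon is a stub monitor exposing .latest_epoch, .get_map(), .live_addrs."""
    while True:
        healthy = all(addr == mon.live_addrs[p] for p, addr in my_map.items())
        if healthy:
            return my_epoch
        # The fix: don't just keep pinging the old addresses -- check
        # whether the mon has a newer map (the osdmap subscription).
        if mon.latest_epoch > my_epoch:
            my_epoch, my_map = mon.latest_epoch, mon.get_map()
        else:
            # The latest map still says the peers are up where we ping
            # them, so we really are unhealthy.
            raise RuntimeError("peers genuinely unreachable")

class StubMon:
    latest_epoch = 2
    live_addrs = {"osd.0": "10.0.0.10:6804"}
    def get_map(self):
        return dict(self.live_addrs)

# osd.1 starts with a stale epoch-1 map (old peer address); after one
# refresh it sees the new address, passes the health check, and returns.
print(wait_for_healthy(1, {"osd.0": "10.0.0.10:6800"}, StubMon()))  # 2
```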

#2 Updated by Sage Weil about 3 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to luminous,jewel

#3 Updated by Sage Weil about 3 years ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Nathan Cutler about 3 years ago

  • Copied to Backport #21238: luminous: test_health_warnings.sh can fail added

#5 Updated by Nathan Cutler about 3 years ago

#6 Updated by Nathan Cutler over 2 years ago

  • Status changed from Pending Backport to Resolved
