
Bug #21121

test_health_warnings.sh can fail

Added by Sage Weil about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

- test_mark_all_but_last_osds_down marks all but one osd down
- clears noup
- osd.1 fails its is_healthy check because it is failing to reach its peers on their old addresses
- meanwhile, all osds are back up
- eventually the mon marks osd.1 out
- test fails...
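The race above can be sketched as follows. This is a hypothetical simulation, not Ceph code: the addresses, map dicts, and `ping`/`is_healthy` helpers are invented for illustration. The point is that osd.1 pings peer addresses from its stale osdmap, every ping fails, so it never reports healthy even though the peers are actually up on new addresses.

```python
# Hypothetical sketch (not Ceph source): why osd.1 fails its is_healthy
# check after noup is cleared. The restarted peers have rebound to new
# addresses, but osd.1 still pings the addresses from its stale map.

# osd.1's stale view of its peers vs. the cluster's current view.
old_map = {"osd.0": "10.0.0.10:6800", "osd.2": "10.0.0.12:6800"}
new_map = {"osd.0": "10.0.0.10:6804", "osd.2": "10.0.0.12:6804"}

def ping(peer, addr, live_addrs):
    """A ping succeeds only if we target the address the peer really listens on."""
    return addr == live_addrs[peer]

def is_healthy(my_map, live_addrs):
    """osd.1 considers itself healthy when it can reach all 'up' peers."""
    return all(ping(p, a, live_addrs) for p, a in my_map.items())

# osd.1 pings the old addresses -> every ping fails -> it never reports
# healthy, so the mon eventually marks it out and the test fails.
print(is_healthy(old_map, new_map))  # False: stuck on stale addresses
print(is_healthy(new_map, new_map))  # True: a fresh map would succeed
```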

/a/sage-2017-08-24_17:38:40-rados-wip-sage-testing2-luminous-20170824a-distro-basic-smithi/1560394


Related issues

Copied to RADOS - Backport #21238: luminous: test_health_warnings.sh can fail Resolved
Copied to RADOS - Backport #21239: jewel: test_health_warnings.sh can fail Resolved

History

#1 Updated by Sage Weil about 3 years ago

I believe the fix is to subscribe to osdmaps when in the waiting-for-healthy state. If we are unhealthy because we are failing to ping our "up" peers, we need to be sure that the cluster actually thinks they're up and that we're not just stuck on an old map.
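The proposed fix can be sketched like this. Again a hypothetical simulation, not the actual OSD code: `wait_for_healthy`, `StubMon`, and the epoch/address values are invented stand-ins. The idea it models is that while waiting for healthy, the OSD keeps asking the monitor for newer osdmaps (the osdmap subscription) instead of retrying pings against addresses from a stale map forever.

```python
# Hypothetical sketch (not Ceph source): while in waiting_for_healthy,
# subscribe to osdmap updates so that failed pings against stale peer
# addresses trigger a map refresh rather than permanent unhealthiness.

def wait_for_healthy(my_epoch, my_map, mon):
    """mon is a stub monitor exposing .latest_epoch, .get_map(), .live_addrs."""
    while True:
        healthy = all(addr == mon.live_addrs[p] for p, addr in my_map.items())
        if healthy:
            return my_epoch
        # The fix: don't just keep pinging the old addresses -- check
        # whether the mon has a newer map (the osdmap subscription).
        if mon.latest_epoch > my_epoch:
            my_epoch, my_map = mon.latest_epoch, mon.get_map()
        else:
            # The latest map still says the peers are up where we ping
            # them, so we really are unhealthy.
            raise RuntimeError("peers genuinely unreachable")

class StubMon:
    latest_epoch = 2
    live_addrs = {"osd.0": "10.0.0.10:6804"}
    def get_map(self):
        return dict(self.live_addrs)

# osd.1 starts with a stale epoch-1 map (old peer address); after one
# refresh it sees the new address, passes the health check, and returns.
print(wait_for_healthy(1, {"osd.0": "10.0.0.10:6800"}, StubMon()))  # 2
```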

#2 Updated by Sage Weil about 3 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to luminous,jewel

#3 Updated by Sage Weil about 3 years ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Nathan Cutler about 3 years ago

  • Copied to Backport #21238: luminous: test_health_warnings.sh can fail added

#5 Updated by Nathan Cutler about 3 years ago

#6 Updated by Nathan Cutler over 2 years ago

  • Status changed from Pending Backport to Resolved
