Bug #21121

closed

test_health_warnings.sh can fail

Added by Sage Weil over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

- test_mark_all_but_last_osds_down marks all but one osd down
- clears noup
- osd.1 fails the is_healthy check because it is failing to respond on its old address
- meanwhile, all osds are back up.
- eventually mon marks osd.1 out
- test fails...

/a/sage-2017-08-24_17:38:40-rados-wip-sage-testing2-luminous-20170824a-distro-basic-smithi/1560394


Related issues 2 (0 open, 2 closed)

Copied to RADOS - Backport #21238: luminous: test_health_warnings.sh can fail (Resolved, Nathan Cutler)
Copied to RADOS - Backport #21239: jewel: test_health_warnings.sh can fail (Resolved, Nathan Cutler)
Actions #1

Updated by Sage Weil over 6 years ago

I believe the fix is to subscribe to osdmaps when in the waiting-for-healthy state. If we are unhealthy because we are failing to ping our "up" peers, we need to be sure that the cluster actually thinks they're up and that we're not just stuck on an old map.
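
To make the proposed fix concrete, here is a minimal toy model in Python. It is not Ceph code: `OsdMap`, `Monitor`, `WaitingOsd`, `subscribe_while_waiting`, and the addresses are all hypothetical names invented for illustration. It assumes that peers become reachable only at the addresses in the newest map, and shows how an OSD that never refreshes its map stays unhealthy while one that asks for newer maps recovers.

```python
from dataclasses import dataclass


@dataclass
class OsdMap:
    epoch: int
    # osd id -> address where the cluster believes an "up" osd is listening
    up_addrs: dict


@dataclass
class Monitor:
    latest: OsdMap  # hypothetical stand-in for the mon's newest osdmap


@dataclass
class WaitingOsd:
    osdmap: OsdMap                 # the map this osd currently holds
    subscribe_while_waiting: bool  # hypothetical switch modelling the fix

    def is_healthy(self, reachable_addrs):
        # Healthy only if every peer our local map calls "up" answers pings.
        return all(a in reachable_addrs for a in self.osdmap.up_addrs.values())

    def tick(self, mon, reachable_addrs):
        # With the fix, fetch a newer map (if any) before judging health.
        if self.subscribe_while_waiting and mon.latest.epoch > self.osdmap.epoch:
            self.osdmap = mon.latest
        return self.is_healthy(reachable_addrs)


# Assumed scenario: peers came back up on new addresses in a later epoch.
old_map = OsdMap(epoch=10, up_addrs={0: "10.0.0.1:6800", 2: "10.0.0.3:6800"})
new_map = OsdMap(epoch=12, up_addrs={0: "10.0.0.1:6804", 2: "10.0.0.3:6804"})
mon = Monitor(latest=new_map)
reachable = set(new_map.up_addrs.values())

stuck = WaitingOsd(osdmap=old_map, subscribe_while_waiting=False)
fixed = WaitingOsd(osdmap=old_map, subscribe_while_waiting=True)

print(stuck.tick(mon, reachable))  # False: still pinging stale addresses
print(fixed.tick(mon, reachable))  # True: refreshed map matches reality
```

In this sketch the only difference between the two OSDs is whether the map is refreshed before the health check, which is the behaviour the comment above proposes for the waiting-for-healthy state.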

Actions #2

Updated by Sage Weil over 6 years ago

  • Status changed from 12 to Fix Under Review
  • Backport set to luminous,jewel
Actions #3

Updated by Sage Weil over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #4

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21238: luminous: test_health_warnings.sh can fail added
Actions #5

Updated by Nathan Cutler over 6 years ago

Actions #6

Updated by Nathan Cutler about 6 years ago

  • Status changed from Pending Backport to Resolved