Feature #82

mon: osd failure smarts

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version:
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

We shouldn't necessarily believe every OSD failure report we get, especially when many other nodes are monitoring the same OSD and haven't reported a problem.

History

#1 Updated by Sage Weil over 13 years ago

  • Target version set to 12

#2 Updated by Sage Weil over 13 years ago

  • Target version changed from 12 to v0.22

#3 Updated by Sage Weil over 13 years ago

  • Priority changed from Low to Normal

A simple approach would be to index the OSD peer/heartbeat graph in the PGMonitor and only mark an OSD down when a configured fraction of its peers declare it down.
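A minimal sketch of the fraction-of-peers idea, in Python for brevity. This is not Ceph code: the class `PeerFailureTracker`, its methods, and the `min_fraction` parameter are hypothetical names chosen for illustration, and the 0.5 default is an arbitrary assumption.

```python
from collections import defaultdict


class PeerFailureTracker:
    """Hypothetical illustration only; names and defaults are assumptions."""

    def __init__(self, min_fraction=0.5):
        self.min_fraction = min_fraction
        # osd id -> set of peers that exchange heartbeats with it
        self.peers = defaultdict(set)
        # osd id -> set of peers that have reported it as failed
        self.reporters = defaultdict(set)

    def add_peer(self, osd, peer):
        """Record that `peer` monitors `osd` via heartbeats."""
        self.peers[osd].add(peer)

    def report_failure(self, target, reporter):
        """Register a failure report; return True if target should be marked down."""
        # Reports from nodes that are not heartbeat peers are ignored.
        if reporter not in self.peers[target]:
            return False
        self.reporters[target].add(reporter)
        return self.should_mark_down(target)

    def should_mark_down(self, target):
        """True once the fraction of peers reporting failure reaches the threshold."""
        peers = self.peers.get(target)
        if not peers:
            return False
        reporting = self.reporters[target] & peers
        return len(reporting) / len(peers) >= self.min_fraction
```

With four peers and a threshold of 0.5, a single report leaves the OSD up, and a second report from a different peer would mark it down.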

#4 Updated by Sage Weil over 13 years ago

  • Assignee set to Greg Farnum

#5 Updated by Greg Farnum over 13 years ago

  • Status changed from New to Resolved

Pushed in 77ee6dc1cc8e34d0d0be02c90c976058603f78b2.
The OSDMonitor will only mark an OSD down after it gets a minimum number of reports (default 3) from a minimum number of OSDs (default 1). This should prevent a laggy OSD from marking all its peers down.
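The rule described above can be sketched roughly as follows. This is an illustration in Python, not the code from the commit: `FailureReportCounter`, `min_reports`, and `min_reporters` are hypothetical names, and only the default values (3 and 1) come from the comment above.

```python
from collections import Counter


class FailureReportCounter:
    """Hypothetical illustration only; not the OSDMonitor implementation."""

    def __init__(self, min_reports=3, min_reporters=1):
        self.min_reports = min_reports      # total reports required
        self.min_reporters = min_reporters  # distinct reporting OSDs required
        # target osd id -> Counter mapping reporter id -> number of reports
        self.reports = {}

    def report_failure(self, target, reporter):
        """Register one report; return True if target should be marked down."""
        per_reporter = self.reports.setdefault(target, Counter())
        per_reporter[reporter] += 1
        total_reports = sum(per_reporter.values())
        distinct_reporters = len(per_reporter)
        return (total_reports >= self.min_reports
                and distinct_reporters >= self.min_reporters)
```

In this sketch, the defaults mark an OSD down on the third report even if all three come from the same reporter. Setting `min_reporters` to 2 would additionally require a second, independent OSD to confirm the failure.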
