Feature #82
mon: osd failure smarts
% Done: 0%
Description
We shouldn't necessarily believe every osd failure report we get, especially when there are lots of other nodes monitoring the osd that haven't reported the problem.
History
#1 Updated by Sage Weil over 13 years ago
- Target version set to 12
#2 Updated by Sage Weil over 13 years ago
- Target version changed from 12 to v0.22
#3 Updated by Sage Weil over 13 years ago
- Priority changed from Low to Normal
A simple approach would be to index the osd peers/heartbeat graph in the PGMonitor and only mark an osd down when some set fraction of peers declare the osd down.
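Here is a minimal Python sketch of the fraction-of-peers idea. It is an illustration under stated assumptions, not Ceph code. The class `FractionFailureDetector`, its methods, and the `down_fraction` threshold of 0.5 are hypothetical. The sketch assumes the monitor knows which OSDs heartbeat each target and counts each reporting peer only once.

```python
from collections import defaultdict


class FractionFailureDetector:
    """Hypothetical sketch (not Ceph's PGMonitor/OSDMonitor): mark an OSD
    down once a set fraction of the peers that heartbeat it report it failed."""

    def __init__(self, down_fraction=0.5):
        # Assumed threshold for illustration; not a real Ceph default.
        self.down_fraction = down_fraction
        # target osd -> set of osds that heartbeat it (the peer graph)
        self.peers = defaultdict(set)
        # target osd -> set of peers that have reported it failed
        self.reporters = defaultdict(set)

    def add_heartbeat_edge(self, watcher, target):
        # Record that `watcher` monitors `target` via heartbeats.
        self.peers[target].add(watcher)

    def report_failure(self, reporter, target):
        """Record a report; return True if `target` should now be marked down."""
        # Reports from OSDs that are not heartbeat peers of the target are ignored.
        if reporter not in self.peers[target]:
            return False
        # A set, so repeated reports from the same peer count once.
        self.reporters[target].add(reporter)
        return self.should_mark_down(target)

    def should_mark_down(self, target):
        total = len(self.peers[target])
        if total == 0:
            return False
        return len(self.reporters[target]) / total >= self.down_fraction

    def clear_reports(self, target):
        # e.g. after the OSD has been marked down or shows it is alive again
        self.reporters[target].clear()
```

For example, if osd.0 is watched by osd.1 through osd.4, a report from osd.1 alone (1/4) does not mark it down, but a second report from osd.2 (2/4) reaches the 0.5 threshold. Because reports from OSDs outside the target's peer set are dropped, an unrelated laggy OSD cannot contribute.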
#4 Updated by Sage Weil over 13 years ago
- Assignee set to Greg Farnum
#5 Updated by Greg Farnum over 13 years ago
- Status changed from New to Resolved
Pushed in 77ee6dc1cc8e34d0d0be02c90c976058603f78b2.
The OSDMonitor will only mark an OSD down after it gets a minimum number of reports (default 3) from a minimum number of OSDs (default 1). This should prevent a laggy OSD from marking all its peers down.
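The following is a minimal Python sketch of the rule described in this comment. It assumes the monitor keeps a per-target count of reports plus the set of distinct OSDs that sent them. The class and parameter names (`FailureReportTracker`, `min_reports`, `min_reporters`) are hypothetical stand-ins. The defaults of 3 and 1 come from the comment, but the way repeated reports from one OSD are counted here is an assumption, not a description of the actual OSDMonitor code.

```python
from collections import defaultdict


class FailureReportTracker:
    """Hypothetical sketch: mark an OSD down only after `min_reports`
    failure reports from at least `min_reporters` distinct OSDs.
    Names are illustrative, not the real OSDMonitor implementation."""

    def __init__(self, min_reports=3, min_reporters=1):
        self.min_reports = min_reports
        self.min_reporters = min_reporters
        self.report_count = defaultdict(int)  # target -> total reports received
        self.reporters = defaultdict(set)     # target -> distinct reporting OSDs

    def report_failure(self, reporter, target):
        """Record one report; return True once `target` should be marked down."""
        self.report_count[target] += 1
        self.reporters[target].add(reporter)
        return (self.report_count[target] >= self.min_reports
                and len(self.reporters[target]) >= self.min_reporters)

    def cancel(self, target):
        """Forget all reports, e.g. after the OSD is marked down or boots again."""
        self.report_count.pop(target, None)
        self.reporters.pop(target, None)
```

With the defaults, three reports from a single OSD are enough to mark the target down. Raising `min_reporters` to 2 means repeated reports from one OSD never suffice on their own; a second, distinct OSD has to agree.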