Feature #7344
osd: add additional heartbeat on cluster interface
Status:
Resolved
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:
Description
A user had a switch misconfiguration (jumbo frames not enabled) that blocked progress on the cluster interface but still let heartbeats through. Because heartbeats kept succeeding, the cluster never detected the network problem, and all PGs got stuck in various stages of peering.
Add a second layer of heartbeat on the cluster interface, with a higher timeout, so that stalled traffic can be detected. The failure report could also state the nature of the failure, which would make the problem easier for an admin to diagnose and fix.
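A minimal sketch of the idea, assuming a two-tier design. Ordinary small heartbeats keep their normal grace period. A second, large-payload ping on the cluster interface gets a longer `stall_grace`, and the failure report records which tier failed. All names here (`ClusterHeartbeatMonitor`, `LARGE_PING_BYTES`, `FailureKind`, the timeout values) are hypothetical illustrations, not Ceph's actual OSD code or configuration options.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

# Hypothetical probe size: too large to fit in one standard 1500-byte
# Ethernet frame, so a path that mishandles large frames can fail this
# probe while small heartbeats still get through.
LARGE_PING_BYTES = 8192


class FailureKind(Enum):
    NO_HEARTBEAT = "no reply to small heartbeats (peer down or link unreachable)"
    STALLED = "small heartbeats OK, large pings stalled (check MTU/jumbo frames)"


@dataclass
class PeerState:
    last_small_ack: float  # time of last reply to an ordinary heartbeat
    last_large_ack: float  # time of last reply to a large cluster-interface ping


@dataclass
class ClusterHeartbeatMonitor:
    grace: float = 20.0        # hypothetical timeout for ordinary heartbeats (s)
    stall_grace: float = 60.0  # hypothetical, higher timeout for the large-ping tier
    peers: Dict[int, PeerState] = field(default_factory=dict)

    def add_peer(self, osd_id: int, now: float) -> None:
        self.peers[osd_id] = PeerState(now, now)

    def make_large_ping(self) -> bytes:
        # Zero-filled payload; actually sending it is out of scope here.
        return bytes(LARGE_PING_BYTES)

    def on_small_ack(self, osd_id: int, now: float) -> None:
        self.peers[osd_id].last_small_ack = now

    def on_large_ack(self, osd_id: int, now: float) -> None:
        self.peers[osd_id].last_large_ack = now

    def check(self, now: float) -> List[Tuple[int, FailureKind]]:
        """Return (osd_id, kind) failure reports, ordered by OSD id."""
        reports = []
        for osd_id, st in sorted(self.peers.items()):
            if now - st.last_small_ack > self.grace:
                reports.append((osd_id, FailureKind.NO_HEARTBEAT))
            elif now - st.last_large_ack > self.stall_grace:
                # Peer answers small pings but not large ones, so the
                # report says *why*, pointing the admin at the network.
                reports.append((osd_id, FailureKind.STALLED))
        return reports
```

The small-heartbeat check runs first. A peer that is completely unreachable is therefore reported as `NO_HEARTBEAT` rather than misdiagnosed as a stall. Only a peer that still answers small pings but not large ones is flagged as `STALLED`, which matches the jumbo-frame scenario described above.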