Feature #7344

closed

osd: add additional heartbeat on cluster interface

Added by Sage Weil about 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

A user had a switch misconfiguration (jumbo frames not enabled) that blocked progress on the cluster interface, while heartbeats, which are small enough to fit in standard-size frames, still got through. The cluster was unaware of the networking issue, and all PGs got stuck in various stages of peering.

Add a second layer of heartbeat on the cluster interface, with a longer timeout, so that stalled traffic is detected even when ordinary heartbeats succeed. Possibly also indicate the nature of the failure in the failure report, so that an admin can resolve the problem more easily.
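A minimal sketch of the two-tier check, in Python for brevity. It assumes the second heartbeat layer is a probe that travels the same path as bulk cluster traffic (for example, padded to a large message size), so it can fail while the ordinary small ping still succeeds. All names (`PeerState`, `check_peers`, `FailureKind`) and the grace values are hypothetical and illustrative, not Ceph's actual code or configuration options.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class FailureKind(Enum):
    # Hypothetical failure categories carried in a failure report.
    HEARTBEAT_TIMEOUT = "no reply to small pings: peer looks down"
    CLUSTER_STALL = "pings answered, but cluster-path probes stalled"


@dataclass
class PeerState:
    last_ping_reply: float   # time of last reply to the ordinary (small) heartbeat
    last_probe_reply: float  # time of last reply to the hypothetical second-layer probe


@dataclass
class FailureReport:
    peer: str
    kind: FailureKind
    idle_seconds: float      # how long the failing layer has been silent


def check_peers(peers: Dict[str, PeerState], now: float,
                ping_grace: float = 20.0,
                probe_grace: float = 60.0) -> List[FailureReport]:
    """Return one report per failed peer; probe_grace is the longer timeout."""
    reports: List[FailureReport] = []
    for name, state in sorted(peers.items()):
        ping_idle = now - state.last_ping_reply
        probe_idle = now - state.last_probe_reply
        if ping_idle > ping_grace:
            # Ordinary heartbeat failed: report the peer as down.
            reports.append(FailureReport(name, FailureKind.HEARTBEAT_TIMEOUT, ping_idle))
        elif probe_idle > probe_grace:
            # Pings still work, so say explicitly that the cluster path is stalled.
            reports.append(FailureReport(name, FailureKind.CLUSTER_STALL, probe_idle))
    return reports


peers = {
    "osd.1": PeerState(last_ping_reply=98.0, last_probe_reply=95.0),  # healthy
    "osd.2": PeerState(last_ping_reply=99.0, last_probe_reply=10.0),  # pings ok, probes stalled
    "osd.3": PeerState(last_ping_reply=50.0, last_probe_reply=50.0),  # silent
}
for r in check_peers(peers, now=100.0):
    print(r.peer, r.kind.name, r.idle_seconds)
# osd.2 CLUSTER_STALL 90.0
# osd.3 HEARTBEAT_TIMEOUT 50.0
```

Checking the ping timeout first means a completely silent peer is reported as down, and only a peer that still answers pings but whose probes stall gets the more specific "cluster path stalled" label, which is the kind of hint an admin would need to spot a problem like the jumbo-frame misconfiguration.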
