Feature #22260

open

osd: recover after network outages

Added by Anonymous over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
OSD
Pull request ID:

Description

We ran into a situation where, after one or more 802.3ad/LACP-enabled switches were rebooted, some OSDs failed to re-establish connections to their peers and flooded the logs with "heartbeat_check: no reply..." messages. I will update the description with more details, but the ultimate intent of this ticket is to look at ways OSDs might recover connectivity after the many kinds of network failure that can occur in a running cluster.
