Feature #65637
mds: continue sending heartbeats during recovery when MDS journal is large
Status: New
Priority: Urgent
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Tags:
Backport: squid,reef
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS):
Pull request ID:
Description
When the MDS reaches up:rejoin or up:resolve after spending a long time (hours) in up:replay, it often gets stuck in a loop somewhere while holding the mds_lock. This causes it to miss heartbeat resets. Consequently, the beacon thread stops sending beacons to the monitors.
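To make the failure chain concrete, here is a minimal, simplified model, assuming an internal heartbeat that is healthy only while it has been reset within a grace window, and a beacon thread that skips sending while that heartbeat is unhealthy. The class and method names (`Heartbeat`, `BeaconSender`, `reset`, `is_healthy`, `tick`) are hypothetical and are not the MDS's actual code; timestamps are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Heartbeat:
    """Hypothetical internal heartbeat: healthy while reset within `grace` seconds."""
    grace: float
    last_reset: float = 0.0

    def reset(self, now: float) -> None:
        # Expected to be called periodically by the thread doing recovery work.
        self.last_reset = now

    def is_healthy(self, now: float) -> bool:
        return now - self.last_reset <= self.grace


@dataclass
class BeaconSender:
    """Hypothetical beacon thread: sends beacons only while the heartbeat is healthy."""
    heartbeat: Heartbeat
    sent: List[float] = field(default_factory=list)

    def tick(self, now: float) -> bool:
        if not self.heartbeat.is_healthy(now):
            # Beacon skipped: a long loop that never resets the heartbeat
            # silences the MDS from the monitors' point of view.
            return False
        self.sent.append(now)
        return True
```

For example, with `Heartbeat(grace=15.0)` reset at `t=0`, a beacon at `t=10` is sent, but one at `t=20` is skipped because no reset happened in the meantime, which is the situation described above when a long loop in up:rejoin/up:resolve starves the heartbeat.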
Make the MDS smarter by:
- If up:replay took time X, lengthen the internal heartbeat grace period by some configurable factor during up:resolve/up:rejoin.
- Report a new health warning in beacons about long recovery while in these states.
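One possible reading of the two proposals is sketched below. It assumes the extended grace is the configurable factor multiplied by the replay duration X, floored at the normal grace, and that the warning fires once replay exceeded some threshold. The functions, the `warn_after` threshold and the `MDS_LONG_RECOVERY` name are all hypothetical illustrations, not an existing Ceph option or health check.

```python
from typing import List

# States in which the proposal would relax the heartbeat grace.
RECOVERY_STATES = ("up:resolve", "up:rejoin")


def effective_grace(state: str, base_grace: float, replay_seconds: float,
                    factor: float) -> float:
    """Hypothetical policy: during up:resolve/up:rejoin, scale the heartbeat
    grace with how long up:replay took, never dropping below the base grace."""
    if state in RECOVERY_STATES:
        return max(base_grace, factor * replay_seconds)
    return base_grace


def beacon_health_warnings(state: str, replay_seconds: float,
                           warn_after: float) -> List[str]:
    """Hypothetical check: flag a long recovery in the beacon's health list."""
    if state in RECOVERY_STATES and replay_seconds > warn_after:
        return ["MDS_LONG_RECOVERY"]  # hypothetical health-check name
    return []
```

Under these assumptions, a 2-hour replay (7200 s) with a factor of 0.01 and a 15 s base grace yields a grace of about 72 s in up:rejoin, while the grace stays at 15 s in other states. Flooring at the base grace keeps short replays from shrinking the normal window.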