Feature #65637
open
mds: continue sending heartbeats during recovery when MDS journal is large
% Done: 0%
Source: Development
Tags:
Backport: squid,reef
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS):
Pull request ID:
Description
When the MDS reaches up:resolve / up:rejoin after spending a long time (hours) in up:replay, it often gets stuck in a loop somewhere involving the mds_lock. This causes it to miss heartbeat resets. Consequently, the beacon thread stops sending beacons to the monitors.
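The failure mode described above can be illustrated with a deliberately simplified model. This is a sketch, not Ceph code: the function names, the tick-based timing, and the rule "send a beacon only if the last heartbeat reset is within the grace period" are assumptions made for illustration.

```python
def is_healthy(now, last_reset, grace):
    """True if the last heartbeat reset is no older than the grace period."""
    return (now - last_reset) <= grace


def beacons_sent(tick_times, reset_times, grace):
    """Return the tick times at which a beacon would be sent.

    tick_times:  times at which the (hypothetical) beacon thread wakes up
    reset_times: times at which the main thread reset its heartbeat
    grace:       heartbeat grace period, in the same time unit
    """
    sent = []
    for t in tick_times:
        # Only resets that already happened can count at tick time t.
        past = [r for r in reset_times if r <= t]
        if past and is_healthy(t, max(past), grace):
            sent.append(t)
    return sent


# The main thread resets its heartbeat at t=0, 2, 4 and then gets stuck.
ticks = [4, 8, 12, 16, 20, 24]
print(beacons_sent(ticks, reset_times=[0, 2, 4], grace=15))
```

In this model, beacons stop once the last reset is older than the grace period, even though the daemon is still making progress on recovery.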
Make the MDS smarter by:
- If replay took X time, lengthen the internal heartbeat grace period by some configurable factor during up:resolve/up:rejoin.
- Report a new health warning in beacons about long recovery during these states.
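One possible reading of the two proposals, sketched under stated assumptions: "X" is treated as a fixed threshold on replay duration, and the grace period is multiplied by a constant factor. All names, defaults, and the warning text below are hypothetical; none are existing Ceph options or health codes.

```python
from dataclasses import dataclass

# States in which the ticket proposes lengthening the grace period.
RECOVERY_STATES = {"up:resolve", "up:rejoin"}


@dataclass(frozen=True)
class GraceConfig:
    # All fields are hypothetical illustrations, not real Ceph settings.
    base_grace: float = 15.0               # normal heartbeat grace, seconds
    long_replay_threshold: float = 3600.0  # the ticket's "X", seconds
    grace_factor: float = 4.0              # the configurable multiplier


def is_long_recovery(cfg, state, replay_seconds):
    """True if the MDS is in a recovery state after a long replay."""
    return state in RECOVERY_STATES and replay_seconds >= cfg.long_replay_threshold


def effective_grace(cfg, state, replay_seconds):
    """Lengthen the grace period only during resolve/rejoin after a long replay."""
    if is_long_recovery(cfg, state, replay_seconds):
        return cfg.base_grace * cfg.grace_factor
    return cfg.base_grace


def long_recovery_warning(cfg, state, replay_seconds):
    """Return a warning string to attach to a beacon, or None."""
    if is_long_recovery(cfg, state, replay_seconds):
        return f"MDS in {state} after {replay_seconds:.0f}s of replay; recovery may be slow"
    return None


cfg = GraceConfig()
print(effective_grace(cfg, "up:rejoin", 7200.0))
print(long_recovery_warning(cfg, "up:rejoin", 7200.0))
```

Keeping the threshold check in one helper means the longer grace period and the health warning are always triggered by the same condition.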
Updated by Patrick Donnelly 11 days ago
- Related to Feature #61863: mds: issue a health warning with estimated time to complete replay added
Updated by Patrick Donnelly 10 days ago
- Related to Bug #65658: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks added