Bug #65658
mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks
Status: Fix Under Review
Priority: High
Assignee:
Category: Performance/Resource Usage
Target version:
% Done: 0%
Source: Development
Tags:
Backport: squid,reef,quincy
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
There was a lot of discussion about this in https://github.com/ceph/ceph/pull/26004/. Circling back, we have since seen evidence that it causes significant problems: after a long up:replay recovery, clients can flood the MDS with metrics messages, and the lock contention in fast_dispatch prevents the MDS from sending beacons to the monitors. This in turn leads to undesirable MDS failovers.
We should convert this to use regular dispatch (and optimize later if needed).
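To illustrate the proposed direction, here is a minimal, self-contained C++ sketch. It is not Ceph code: `ToyDispatcher`, `MsgType`, `deliver`, and `drain` are hypothetical names made up for this example, and the real change may look quite different. The idea shown is only that the "can fast dispatch" check takes no locks and declines metrics messages, so they are queued and processed later under the state lock instead of inline on the receive path.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical message kinds; these are not Ceph's real message types.
enum class MsgType { Other, Metrics };

struct Msg {
  MsgType type;
};

// Toy stand-in for a dispatcher. All names here are illustrative only.
class ToyDispatcher {
public:
  // Runs on the receive path, so it takes no locks.
  // Declining metrics messages routes them to the regular queue.
  bool can_fast_dispatch(const Msg& m) const {
    return m.type != MsgType::Metrics;
  }

  // Called by the (simulated) messenger for every incoming message.
  void deliver(const Msg& m) {
    if (can_fast_dispatch(m)) {
      ++fast_handled_;  // handled inline; no state lock needed in this toy
    } else {
      std::lock_guard<std::mutex> g(queue_lock_);
      queue_.push_back(m);  // deferred to the regular dispatch path
    }
  }

  // Regular dispatch: drains queued messages and applies them under the
  // state lock, away from the receive path. Returns the number processed.
  std::size_t drain() {
    std::deque<Msg> batch;
    {
      std::lock_guard<std::mutex> g(queue_lock_);
      batch.swap(queue_);
    }
    std::lock_guard<std::mutex> g(state_lock_);
    metrics_applied_ += batch.size();
    return batch.size();
  }

  std::size_t queued() {
    std::lock_guard<std::mutex> g(queue_lock_);
    return queue_.size();
  }

  std::size_t fast_handled() const { return fast_handled_; }
  std::size_t metrics_applied() const { return metrics_applied_; }

private:
  std::mutex queue_lock_;  // protects queue_ only; held briefly
  std::mutex state_lock_;  // stands in for a heavily contended state lock
  std::deque<Msg> queue_;
  std::size_t fast_handled_ = 0;
  std::size_t metrics_applied_ = 0;
};
```

In this sketch a flood of metrics messages only grows the queue; the contended lock is taken once per batch in `drain()`, which a separate dispatch thread would call in a real design.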
Updated by Patrick Donnelly 10 days ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 57081
Updated by Patrick Donnelly 10 days ago
- Related to Feature #65637: mds: continue sending heartbeats during recovery when MDS journal is large added