Bug #65658: mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks - CephFS - Ceph

mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks

Added by Patrick Donnelly 10 days ago. Updated 10 days ago.

Status: Fix Under Review
Priority: High
Assignee: Patrick Donnelly
Category: Performance/Resource Usage
Target version: Ceph - v20.0.0
% Done:
Source: Development
Tags:
Backport: squid,reef,quincy
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID: 57081
Crash signature (v1):
Crash signature (v2):
Description

There was a lot of discussion surrounding this in https://github.com/ceph/ceph/pull/26004/ but, circling back, we have since seen evidence that this is causing significant problems: after a long up:replay recovery, clients can flood the MDS with metrics messages, and the lock contention in fast_dispatch then prevents the MDS from sending beacons to the monitors. This leads to undesirable MDS failovers.

We should convert this to use regular dispatch (and optimize later if needed).
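To make the proposed direction concrete, here is a minimal toy sketch of the idea. It is not Ceph code: `ToyMessage`, `ToyAggregator`, and `run_demo` are hypothetical names invented for illustration, and the real Messenger/Dispatcher interfaces differ. The sketch assumes that "regular dispatch" means the receiving thread only enqueues the message and a separate thread takes the aggregator's state lock. The fast-dispatch predicate itself touches no locks and simply declines.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a client metrics message (not a Ceph type).
struct ToyMessage { int value; };

// Hypothetical aggregator illustrating queue-based ("regular") dispatch.
class ToyAggregator {
public:
  // The predicate takes no locks; it declines fast dispatch outright,
  // so every message goes through the queue below.
  bool can_fast_dispatch(const ToyMessage&) const { return false; }

  // Called on the receiving thread: enqueue only, never touch state_lock_.
  void dispatch(const ToyMessage& m) {
    {
      std::lock_guard<std::mutex> guard(queue_lock_);
      queue_.push_back(m);
    }
    queue_cv_.notify_one();
  }

  // Worker loop: drains the queue and does the lock-protected work.
  void run() {
    std::unique_lock<std::mutex> guard(queue_lock_);
    for (;;) {
      queue_cv_.wait(guard, [this] { return stopping_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stop requested and queue fully drained
      ToyMessage m = queue_.front();
      queue_.pop_front();
      guard.unlock();              // do not hold the queue lock while handling
      handle(m);
      guard.lock();
    }
  }

  void stop() {
    {
      std::lock_guard<std::mutex> guard(queue_lock_);
      stopping_ = true;
    }
    queue_cv_.notify_one();
  }

  long total() const {
    std::lock_guard<std::mutex> guard(state_lock_);
    return total_;
  }

private:
  // Only the worker thread contends for the aggregator state lock.
  void handle(const ToyMessage& m) {
    std::lock_guard<std::mutex> guard(state_lock_);
    total_ += m.value;
  }

  mutable std::mutex state_lock_;
  std::mutex queue_lock_;
  std::condition_variable queue_cv_;
  std::deque<ToyMessage> queue_;
  bool stopping_ = false;
  long total_ = 0;
};

// Sends n messages of value 1 through the queue and returns the aggregate.
long run_demo(int n) {
  ToyAggregator agg;
  std::thread worker([&agg] { agg.run(); });
  for (int i = 0; i < n; ++i) {
    ToyMessage m{1};
    if (!agg.can_fast_dispatch(m)) agg.dispatch(m);
  }
  agg.stop();
  worker.join();
  return agg.total();
}
```

In this toy model the receiving thread still takes a short queue lock, but it never waits on the aggregator's state lock, which is the kind of contention described above. Whether the actual fix is structured this way is a question for the pull request; this is only a sketch of the trade-off. Build with thread support enabled (for example `-pthread` on GCC or Clang).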

Related issues 1 (1 open — 0 closed)