Bug #65658

open

mds: MetricAggregator::ms_can_fast_dispatch2 acquires locks

Added by Patrick Donnelly 10 days ago. Updated 10 days ago.

Status: Fix Under Review
Priority: High
Category: Performance/Resource Usage
Target version:
% Done: 0%
Source: Development
Tags:
Backport: squid,reef,quincy
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There was a lot of discussion surrounding this in

https://github.com/ceph/ceph/pull/26004/

but, circling back, we have since seen evidence that this is causing significant problems: after a long up:replay recovery, clients can flood the MDS with metrics messages, and the resulting lock contention in fast dispatch prevents the MDS from sending beacons to the monitors. This in turn leads to undesirable MDS failovers.

We should convert this to use regular dispatch (and optimize later if needed).
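
To illustrate the proposed direction, here is a minimal toy sketch in C++. It is not Ceph code: ToyMetricsMessage, ToyAggregator and ToyMessenger are hypothetical names, and the messenger is single-threaded for clarity. The assumption it models is that the fast-dispatch check runs on the receive path for every incoming message, so it should not take locks. Declining fast dispatch moves the lock acquisition to the queued (regular) dispatch path.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a client metrics message; not Ceph's Message type.
struct ToyMetricsMessage {
  int value = 0;
};

// Toy aggregator: it declines fast dispatch, so its lock is only ever
// taken from the regular dispatch path.
class ToyAggregator {
public:
  // Runs on the receive path for every incoming message, so it takes no lock.
  bool can_fast_dispatch(const ToyMetricsMessage&) const { return false; }

  // Regular dispatch handler: the lock is taken here, off the receive path.
  void dispatch(const ToyMetricsMessage& m) {
    std::lock_guard<std::mutex> guard(lock);
    total += m.value;
  }

  int get_total() {
    std::lock_guard<std::mutex> guard(lock);
    return total;
  }

private:
  std::mutex lock;
  int total = 0;
};

// Toy messenger. A real messenger would run receive() and drain() on
// different threads and protect the queue; this one does not.
class ToyMessenger {
public:
  explicit ToyMessenger(ToyAggregator& a) : aggregator(a) {}

  // Receive path: handle inline only if the dispatcher allows it.
  void receive(const ToyMetricsMessage& m) {
    if (aggregator.can_fast_dispatch(m)) {
      aggregator.dispatch(m);  // inline handling on the receive path
    } else {
      queue.push_back(m);      // deferred to the regular dispatch path
    }
  }

  std::size_t pending() const { return queue.size(); }

  // Regular dispatch path: process everything queued so far.
  std::size_t drain() {
    std::size_t handled = 0;
    while (!queue.empty()) {
      aggregator.dispatch(queue.front());
      queue.pop_front();
      ++handled;
    }
    return handled;
  }

private:
  ToyAggregator& aggregator;
  std::deque<ToyMetricsMessage> queue;
};
```

In this sketch, a burst of metrics messages only grows the queue during receive(). The aggregator's lock is not touched until drain() runs, which is the property the conversion to regular dispatch is after.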


Related issues 1 (1 open, 0 closed)

Related to CephFS - Feature #65637: mds: continue sending heartbeats during recovery when MDS journal is large (New)

Actions #1

Updated by Patrick Donnelly 10 days ago

  • Description updated (diff)
Actions #2

Updated by Patrick Donnelly 10 days ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 57081
Actions #3

Updated by Patrick Donnelly 10 days ago

  • Related to Feature #65637: mds: continue sending heartbeats during recovery when MDS journal is large added