Fix #62712

open

pybind/mgr/volumes: implement EAGAIN logic for clearing request queue when under load

Added by Patrick Donnelly 9 months ago. Updated 11 days ago.

Status: New
Priority: Normal
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Tags:
Backport: reef,quincy
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Even with the recent changes to the ceph-mgr (#51177) to have a separate finisher thread for each module, the requests queued in the volumes plugin finisher thread may still occupy all the slots for the ceph-mgr throttles. This will eventually result in a similar DoS, where slowness or bugs in the volumes plugin may cause larger-scale cluster issues.

One durable way to resolve this is for the DaemonServer to skip queueing the command if the mod_finisher queue is past some threshold. Instead, it would return `EAGAIN` as an error from the module.
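To make the proposal concrete, here is a minimal Python sketch of the threshold check. It is only a toy model of the idea, not the DaemonServer code: the class `ModuleCommandQueue`, its `max_pending` parameter, and the method names are all hypothetical, and the real change would live in the ceph-mgr itself.

```python
import errno
from collections import deque
from typing import Callable, Deque, Tuple


class ModuleCommandQueue:
    """Toy stand-in for a per-module finisher queue (hypothetical names)."""

    def __init__(self, max_pending: int) -> None:
        # Assumed tunable: how many commands may wait before we push back.
        self.max_pending = max_pending
        self._pending: Deque[Callable[[], None]] = deque()

    def try_queue(self, command: Callable[[], None]) -> Tuple[int, str]:
        """Queue the command, or refuse with -EAGAIN when past the threshold."""
        if len(self._pending) >= self.max_pending:
            # Do not queue: the caller keeps no slot and can retry later.
            return -errno.EAGAIN, "module is busy, retry later"
        self._pending.append(command)
        return 0, ""

    def run_one(self) -> bool:
        """Run the oldest queued command; return False if nothing was queued."""
        if not self._pending:
            return False
        self._pending.popleft()()
        return True
```

With `max_pending=2`, the first two calls to `try_queue` are accepted and the third returns `-errno.EAGAIN` until `run_one` drains an entry. Rejecting before queueing means a slow module cannot accumulate an unbounded backlog.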

Beyond this being a fairly simple structural change, it'd be important to check ceph-csi is equipped to handle these types of retry errors from the storage backend.
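For illustration, the kind of client-side handling that would need to exist is a retry with backoff when the backend answers `EAGAIN`. The sketch below is an assumption about what such handling could look like, written in Python for brevity. It is not ceph-csi code, and `call_with_retry`, `send_command`, and the delay parameters are hypothetical.

```python
import errno
import time
from typing import Callable, Tuple


def call_with_retry(
    send_command: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry a command while it returns -EAGAIN, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        ret, out = send_command()
        if ret != -errno.EAGAIN:
            # Success or a non-retryable error: hand it back unchanged.
            return ret, out
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2
    # Still busy after the final attempt: surface -EAGAIN to the caller.
    return ret, out
```

A caller that lacks this kind of loop would treat `EAGAIN` as a hard failure, which is why the check on ceph-csi matters before changing the error behavior.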


Related issues 1 (1 open, 0 closed)

Related to CephFS - Feature #59714: mgr/volumes: Support to reject CephFS clones if cloner threads are not available (Pending Backport, Neeraj Pratap Singh)
