Feature #61866

open

MDSMonitor: require --yes-i-really-mean-it when failing an MDS with MDS_HEALTH_TRIM or MDS_HEALTH_CACHE_OVERSIZED health warnings

Added by Patrick Donnelly 11 months ago. Updated 5 days ago.

Status: Pending Backport
Priority: Immediate
Assignee: -
Category: Administration/Usability
Target version: -
% Done: 0%
Source: Development
Tags: backport_processed
Backport: reef,quincy
Reviewed: -
Affected Versions: -
Component(FS): MDSMonitor
Labels (FS): -
Pull request ID: -

Description

If an MDS is already falling behind on trimming its journal or has an oversized cache, restarting it may only create new problems in the form of very slow recovery. In particular, if the MDS has fallen far behind on trimming, with 1M or more journal segments, replay can take hours or longer.

We already track these warnings in MDSMonitor, so add a simple check that requires --yes-i-really-mean-it before failing an MDS that has either warning. This helps operators and support folks avoid shooting themselves in the foot.
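
To make the proposed check concrete, here is a minimal sketch in Python. It is not the MDSMonitor code (which is C++); the function name `check_mds_fail`, its parameters, and the message text are hypothetical and only illustrate the rule described above: refuse to fail an MDS that has MDS_HEALTH_TRIM or MDS_HEALTH_CACHE_OVERSIZED unless the operator has confirmed.

```python
# Hypothetical sketch of the proposed guard; names and message text are
# illustrative assumptions, not the actual MDSMonitor implementation.

# The two health warnings named in this ticket.
RISKY_WARNINGS = {"MDS_HEALTH_TRIM", "MDS_HEALTH_CACHE_OVERSIZED"}


def check_mds_fail(active_warnings, confirmed):
    """Decide whether a request to fail an MDS should proceed.

    active_warnings: iterable of health warning names currently set on the MDS.
    confirmed: True if the operator passed --yes-i-really-mean-it.
    Returns (allowed, message); message is empty when the request is allowed.
    """
    risky = sorted(RISKY_WARNINGS & set(active_warnings))
    if risky and not confirmed:
        message = (
            "MDS has health warning(s) " + ", ".join(risky)
            + "; failing it may lead to very slow recovery. "
            + "Pass --yes-i-really-mean-it to proceed."
        )
        return False, message
    return True, ""


if __name__ == "__main__":
    # An MDS that is behind on trimming is protected unless confirmed.
    print(check_mds_fail(["MDS_HEALTH_TRIM"], confirmed=False))
    print(check_mds_fail(["MDS_HEALTH_TRIM"], confirmed=True))
```

With this shape, an MDS with neither warning can be failed as before; the confirmation flag only matters when one of the two warnings is present.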


Related issues 3 (3 open, 0 closed)

Related to CephFS - Bug #65841: qa: dead job from `tasks.cephfs.test_admin.TestFSFail.test_with_health_warn_oversize_cache` (Fix Under Review, Rishabh Dave)
Copied to CephFS - Backport #65927: reef: MDSMonitor: require --yes-i-really-mean-it when failing an MDS with MDS_HEALTH_TRIM or MDS_HEALTH_CACHE_OVERSIZED health warnings (New, Rishabh Dave)
Copied to CephFS - Backport #65928: quincy: MDSMonitor: require --yes-i-really-mean-it when failing an MDS with MDS_HEALTH_TRIM or MDS_HEALTH_CACHE_OVERSIZED health warnings (New, Rishabh Dave)
