Feature #40929

closed

pybind/mgr/mds_autoscaler: create mgr plugin to deploy and configure MDSs in response to degraded file system

Added by Patrick Donnelly almost 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Category: Administration/Usability
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS): task(intern), task(medium)
Pull request ID:

Description

Create a new mgr plugin that adds and removes MDSs in response to degraded file systems. The plugin should monitor changes to the FSMap and look for any file system that requires more standbys. This would be indicated by the FS_DEGRADED, MDS_UP_LESS_THAN_MAX, or MDS_INSUFFICIENT_STANDBY warnings. These warnings are not stored in the FSMap itself, so the logic that checks these conditions in FSMap.cc/MDSMap.cc needs to be replicated in the mgr plugin.
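As a rough illustration of what replicating those checks could look like, here is a minimal sketch in Python. It is not the plugin's implementation: the function name `fs_warnings` is hypothetical, the input is assumed to be a dict shaped like a JSON dump of the FSMap (keys such as `standbys`, `filesystems`, `mdsmap`, `max_mds`, `up`, `failed`, `damaged`, and `standby_count_wanted` are assumptions here), and the conditions are simplified relative to FSMap.cc/MDSMap.cc.

```python
from typing import Any, Dict, List


def fs_warnings(fsmap: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return the warnings each file system would raise (simplified).

    Assumption: `fsmap` is a dict shaped like a JSON dump of the FSMap.
    The key names used below are illustrative and may not match exactly.
    """
    # Standby daemons are assumed to be listed once, at the top level.
    standbys = len(fsmap.get("standbys", []))
    result: Dict[str, List[str]] = {}
    for fs in fsmap.get("filesystems", []):
        mdsmap = fs["mdsmap"]
        warnings: List[str] = []
        # Simplification: only failed or damaged ranks count as degraded.
        if mdsmap.get("failed") or mdsmap.get("damaged"):
            warnings.append("FS_DEGRADED")
        # Fewer ranks are up than max_mds asks for.
        if len(mdsmap.get("up", {})) < mdsmap.get("max_mds", 1):
            warnings.append("MDS_UP_LESS_THAN_MAX")
        # Fewer standbys are available than the file system wants.
        if standbys < mdsmap.get("standby_count_wanted", 0):
            warnings.append("MDS_INSUFFICIENT_STANDBY")
        result[mdsmap["fs_name"]] = warnings
    return result
```

A plugin along these lines would presumably rerun such a check each time it is notified of an FSMap change and act only on file systems with a non-empty warning list.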

The name "mds_autoscaler" is subject to change but I've chosen it to coincide with pg_autoscaler. The end-intent may be the same: the mds_autoscaler plugin may also monitor file system load to increase max_mds according to some ruleset. Then it would simultaneously launch more MDSs to fill those ranks. To start, this ticket is simply focused on launching MDSs in response to degraded file systems or file systems with insufficient standbys.
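To make the initial scope concrete, the following sketch computes how many additional MDSs each file system would need. It is only an assumption about how the decision could be framed: the function name `mds_deficit`, the input shapes, and the target of `max_mds + standby_count_wanted` daemons per file system are all hypothetical, not taken from the plugin.

```python
from typing import Dict


def mds_deficit(wanted: Dict[str, Dict[str, int]],
                running: Dict[str, int]) -> Dict[str, int]:
    """Return, per file system, how many more MDS daemons to launch.

    Assumption: the target per file system is max_mds plus
    standby_count_wanted, and `running` counts the daemons already
    serving (or standing by for) each file system.
    """
    deficit: Dict[str, int] = {}
    for fs_name, cfg in wanted.items():
        target = cfg["max_mds"] + cfg["standby_count_wanted"]
        missing = target - running.get(fs_name, 0)
        # Only report file systems that are actually short of daemons.
        if missing > 0:
            deficit[fs_name] = missing
    return deficit
```

For example, a file system with `max_mds` of 2, one wanted standby, and a single running daemon would yield a deficit of 2, while a file system that already meets its target is omitted from the result.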

The volumes plugin should no longer create an MDS for each volume. The fix for this ticket should also remove that behavior.

Note, [1] also introduces the ability to assign a new MDS to a particular file system. This plugin should plan to use that behavior.

[1] https://github.com/ceph/ceph/pull/32015


Related issues 2 (1 open, 1 closed)

Related to CephFS - Fix #46885: pybind/mgr/mds_autoscaler: add test for MDS scaling with cephadm (New, Milind Changire)
Related to CephFS - Documentation #46884: pybind/mgr/mds_autoscaler: add documentation (Resolved, Milind Changire)

Actions #1

Updated by Milind Changire over 4 years ago

  • Assignee changed from Ramana Raja to Milind Changire
Actions #2

Updated by Milind Changire over 4 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Patrick Donnelly over 4 years ago

  • Subject changed from "mgr/volumes: create extension to deploy and configure MDSs in response to degraded file system" to "pybind/mgr/mds_autoscaler: create mgr plugin to deploy and configure MDSs in response to degraded file system"
  • Description updated (diff)
  • Category set to Administration/Usability
  • Backport deleted (nautilus)
  • Component(FS) deleted (mgr/volumes)
  • Labels (FS) task(intern) added
Actions #4

Updated by Milind Changire over 4 years ago

  • Pull request ID set to 32731
Actions #5

Updated by Patrick Donnelly over 4 years ago

  • Target version changed from v15.0.0 to v16.0.0
Actions #6

Updated by Patrick Donnelly over 3 years ago

  • Status changed from In Progress to Resolved
Actions #7

Updated by Patrick Donnelly over 3 years ago

  • Related to Fix #46885: pybind/mgr/mds_autoscaler: add test for MDS scaling with cephadm added
Actions #8

Updated by Patrick Donnelly over 3 years ago
