Feature #61778

open

mgr/mds_partitioner: add MDS partitioner module in MGR

Added by Yongseok Oh 11 months ago. Updated 10 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS): multimds
Pull request ID:
Description

This idea is based on our presentation at Cephalocon 2023. (Please refer to the slides at https://static.sched.com/hosted_files/ceph2023/d9/Cephalocon2023_LINE_Optimizing_CephFS.pdf.)
We presented our in-house partitioner, written in Python, along with bal_rank_mask. Specifically, we take a combined approach that selectively uses dynamic partitioning to handle heavy workloads (e.g., huge files and large working set sizes) and static partitioning to handle light to moderate workloads. We also distribute subdirs based on workload characteristics. Compared to typical static pinning, this balances metadata workloads while minimizing metadata movement across MDSs. Unfortunately, our in-house partitioner cannot be released as open source because it is optimized for our environment. It therefore needs to be revised and reimplemented as an MGR module for the Ceph community.

Here is a summary of our mds_partitioner module.

Enable mds_partitioner
$ ceph mgr module enable mds_partitioner

Analyze client workloads obtained from MDSs
$ ceph mds_partitioner analyze start

Report analysis results and recommend the optimal number of MDSs
$ ceph mds_partitioner analyze status

Start partitioning
$ ceph mds_partitioner partition start

Report partitioning status
$ ceph mds_partitioner partition status

The partitioner module is enabled with `ceph mgr module enable mds_partitioner`. Running `ceph mds_partitioner analyze start` begins analyzing how to distribute subdirs across multiple MDSs according to their workloads. To calculate the optimal distribution, metrics such as perf, rentries, and wss are obtained from the MDS balancer, and a bin packing algorithm then determines the MDS placement of each subdir. The wss tracker will be implemented in the future as needed. Next, `ceph mds_partitioner analyze status` shows the analysis results and the proposed distribution of subdirs across MDSs. The actual partitioning is then executed with `ceph mds_partitioner partition start`. Subdirs are moved using the ceph.dir.pin and ceph.dir.bal.mask vxattrs. Note that ceph.dir.bal.mask still needs to be implemented; see tracker https://tracker.ceph.com/issues/61777. Finally, you can check the partitioning progress using `ceph mds_partitioner partition status`.
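To illustrate the bin packing step, here is a minimal sketch and not the actual module code. It assumes each subdir has already been reduced to a single load score (how perf, rentries, and wss would be combined into that score is not specified here) and uses a greedy "heaviest first, least-loaded rank" heuristic as a stand-in for whatever packing algorithm the module ends up using. The function names `pack_subdirs` and `pin_commands`, the sample loads, and the mount point `/mnt/cephfs` are all hypothetical. Only the `ceph.dir.pin` vxattr is emitted, since `ceph.dir.bal.mask` does not exist yet.

```python
import heapq
from typing import Dict, List, Tuple


def pack_subdirs(loads: Dict[str, float], num_mds: int) -> Dict[int, List[str]]:
    """Greedy packing: assign the heaviest remaining subdir to the least-loaded rank.

    `loads` maps a subdir path to a hypothetical scalar load score.
    """
    if num_mds < 1:
        raise ValueError("num_mds must be at least 1")
    # Min-heap of (accumulated load, rank) so the lightest rank pops first.
    heap: List[Tuple[float, int]] = [(0.0, rank) for rank in range(num_mds)]
    heapq.heapify(heap)
    placement: Dict[int, List[str]] = {rank: [] for rank in range(num_mds)}
    # Heaviest subdirs first; ties are broken by path to keep results deterministic.
    for path, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, rank = heapq.heappop(heap)
        placement[rank].append(path)
        heapq.heappush(heap, (total + load, rank))
    return placement


def pin_commands(placement: Dict[int, List[str]], mount: str = "/mnt/cephfs") -> List[str]:
    """Render static pins as setfattr commands (the mount point is an assumption)."""
    return [
        f"setfattr -n ceph.dir.pin -v {rank} {mount}{path}"
        for rank, paths in sorted(placement.items())
        for path in paths
    ]


if __name__ == "__main__":
    # Made-up load scores, for illustration only.
    sample = {"/a": 50, "/b": 30, "/c": 20, "/d": 10}
    for cmd in pin_commands(pack_subdirs(sample, num_mds=2)):
        print(cmd)
```

With the sample scores above, `/a` and `/d` land on rank 0 and `/b` and `/c` on rank 1. A real module would also have to weigh the cost of moving subdirs that are already pinned, which this sketch ignores.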

Please refer to additional slides for detailed information. https://github.com/yongseokoh/presentation/blob/main/A_New_Parititioning_for_CephFS.pdf


Subtasks 1 (1 open, 0 closed)

Bug #62158: mds: quick suspend or abort metadata migration (New)


Related issues 2 (2 open, 0 closed)

Related to CephFS - Tasks #62159: qa: evaluate mds_partitioner (In Progress, Yongseok Oh)

Related to CephFS - Feature #62157: mds: working set size tracker (In Progress, Yongseok Oh)
