Bug #46648

mds: cannot handle hundreds+ of subtrees

Added by Patrick Donnelly 3 months ago. Updated 12 days ago.

Status: In Progress
Priority: High
Assignee:
Category: Performance/Resource Usage
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): task(hard)
Pull request ID:
Crash signature:

Description

The MDS has a lot of trouble scaling to hundreds or thousands of subtrees. From discussions with Zheng, one reason is that the MDS must write the full subtree map every time it starts a new journal segment, which can cause long delays when the subtree map is large. It would be more efficient to write out incremental changes to the subtree map as the MDS goes.
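To make the cost difference concrete, here is a toy model (not MDS code) that counts subtree entries written under the two strategies. The class `ToyJournal`, the `/pinned/dirN` paths, and the "one subtree change per segment" workload are all assumptions made up for illustration; the real journal format and replay requirements are not modeled.

```python
# Toy model only: these names are hypothetical and do not mirror MDS code.
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Counts subtree entries written under each journaling strategy."""
    subtrees: set = field(default_factory=set)
    pending: list = field(default_factory=list)  # changes since last segment
    full_entries_written: int = 0
    delta_entries_written: int = 0

    def add_subtree(self, path):
        if path not in self.subtrees:
            self.subtrees.add(path)
            self.pending.append(("add", path))

    def remove_subtree(self, path):
        if path in self.subtrees:
            self.subtrees.remove(path)
            self.pending.append(("remove", path))

    def start_segment(self):
        # Full strategy: every known subtree is rewritten at segment start.
        self.full_entries_written += len(self.subtrees)
        # Incremental strategy: only the changes since the last segment.
        self.delta_entries_written += len(self.pending)
        self.pending.clear()


def simulate(num_subtrees=1000, num_segments=50):
    """Return (full_entries, delta_entries) written over the whole run."""
    journal = ToyJournal()
    for i in range(num_subtrees):
        journal.add_subtree(f"/pinned/dir{i}")
    for seg in range(num_segments):
        # Assumed workload: one subtree goes away per segment.
        journal.remove_subtree(f"/pinned/dir{seg}")
        journal.start_segment()
    return journal.full_entries_written, journal.delta_entries_written
```

With the defaults, `simulate()` returns `(48725, 1050)`: the full-map strategy rewrites nearly every subtree at each segment, while the incremental one pays for the initial 1000 additions once and then one entry per segment. A real design would presumably still need an occasional full map so replay has a starting point; the model ignores that.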

Additionally, there are various places in the MDS where we iterate over the subtrees and spam debug messages. This information is generally useful, but we should try to find ways to compact it into fewer messages. Writing out all the subtrees to the debug log simply does not scale with the workload.
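One possible shape for a compacted message is a single summary line with per-rank counts and a few sample paths. This is only a sketch: `summarize_subtrees`, the `(path, auth_rank)` input shape, and the output format are assumptions for illustration, not what the MDS logs today.

```python
from collections import Counter


def summarize_subtrees(subtrees, sample=3):
    """Build one log line from (path, auth_rank) pairs (hypothetical shape)."""
    subtrees = list(subtrees)
    # Count subtrees per authoritative rank instead of printing each one.
    by_rank = Counter(rank for _, rank in subtrees)
    ranks = ", ".join(f"mds.{r}={n}" for r, n in sorted(by_rank.items()))
    # Show only the first few paths as examples.
    shown = ", ".join(path for path, _ in subtrees[:sample])
    more = len(subtrees) - min(sample, len(subtrees))
    suffix = f" (+{more} more)" if more > 0 else ""
    return f"{len(subtrees)} subtrees [{ranks}] e.g. {shown}{suffix}"
```

The line length stays bounded by the number of ranks and the sample size, not by the number of subtrees.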

This ticket is part of a refactor Zheng has planned to take up.


Related issues

Related to fs - Fix #46696: mds: pre-fragment distributed ephemeral pin directories to distribute the subtree bounds (Resolved)

History

#1 Updated by Patrick Donnelly 3 months ago

I should add, it's trivial to set up a test for this: just create a distributed ephemeral pinned directory with large fan-out (~1000 sub-dirs). Or use manual pinning; it does not matter. Then try to do any kind of workload in any of the directories.
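A minimal sketch of that setup, assuming a CephFS mount on Linux and a multi-MDS file system. The helper name `make_fanout` and the `subNNNN` directory names are made up here; the `ceph.dir.pin.distributed` xattr is the documented way to enable distributed ephemeral pinning, and setting it only works on an actual CephFS mount.

```python
import os


def make_fanout(parent, count=1000, distributed_pin=False):
    """Create `count` sub-directories under `parent`; return their paths."""
    os.makedirs(parent, exist_ok=True)
    if distributed_pin:
        # Only meaningful on a CephFS mount (os.setxattr is Linux-only).
        os.setxattr(parent, "ceph.dir.pin.distributed", b"1")
    paths = []
    for i in range(count):
        path = os.path.join(parent, f"sub{i:04d}")
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths
```

For example, `make_fanout("/mnt/cephfs/pinned", 1000, distributed_pin=True)` (the mount point is an assumption), followed by any workload inside one of the returned directories.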

#2 Updated by Patrick Donnelly 3 months ago

  • Related to Fix #46696: mds: pre-fragment distributed ephemeral pin directories to distribute the subtree bounds added

#3 Updated by Patrick Donnelly 12 days ago

  • Category set to Performance/Resource Usage
  • Status changed from New to In Progress

Zheng is currently working on this.
