Bug #46648
mds: cannot handle hundreds+ of subtrees
Description
The MDS has a lot of trouble scaling to hundreds or thousands of subtrees. From discussions with Zheng, one reason is that the MDS must write the full subtree map every time it starts a new journal segment, which can cause long delays when the map is large. It would be more efficient to write out incremental changes to the subtree map as the MDS goes.
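To make the incremental idea concrete, here is a toy Python model (not MDS code; the `SubtreeJournal` class, its methods, and the JSON event layout are all hypothetical) that compares writing the full map at a segment boundary with writing only the changes since the previous boundary:

```python
import json


class SubtreeJournal:
    """Toy model: one full subtree map, then per-segment deltas (hypothetical design)."""

    def __init__(self):
        self.subtrees = {}  # path -> authoritative MDS rank
        self.pending = []   # changes made since the last segment boundary
        self.events = []    # serialized journal events, oldest first

    def set_auth(self, path, rank):
        # Record a change only when the authority actually differs.
        if self.subtrees.get(path) != rank:
            self.subtrees[path] = rank
            self.pending.append(("set", path, rank))

    def remove(self, path):
        if self.subtrees.pop(path, None) is not None:
            self.pending.append(("del", path, None))

    def start_segment(self, full=False):
        """Append a segment-start event and return its serialized size."""
        if full:
            payload = {"type": "full", "map": dict(self.subtrees)}
        else:
            payload = {"type": "delta", "changes": self.pending}
        self.pending = []
        event = json.dumps(payload)
        self.events.append(event)
        return len(event)


def replay(events):
    """Rebuild the subtree map from a full event followed by deltas."""
    subtrees = {}
    for raw in events:
        ev = json.loads(raw)
        if ev["type"] == "full":
            subtrees = dict(ev["map"])
        else:
            for op, path, rank in ev["changes"]:
                if op == "set":
                    subtrees[path] = rank
                else:
                    subtrees.pop(path, None)
    return subtrees
```

In this model the cost of a segment boundary tracks the number of changes rather than the number of subtrees. A real design would presumably still need an occasional full map (or some other checkpoint) so that old segments can be trimmed; that part is not modelled here.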
Additionally, there are various places in the MDS where we iterate over the subtrees and spam debug messages. This information is generally useful, but we should find ways to compact it into fewer messages. Writing every subtree to the debug log does not scale with the workload.
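One possible shape for a compacted message, sketched in Python rather than in the MDS's own C++: a single line with the total, a per-rank count, and a few sample paths. The `summarize_subtrees` helper and its output format are assumptions for illustration only.

```python
from collections import Counter


def summarize_subtrees(subtrees, sample=3):
    """Return one log line for a {path: auth_rank} map instead of one line per subtree."""
    if not subtrees:
        return "0 subtrees"
    by_rank = Counter(subtrees.values())
    ranks = ", ".join(f"mds.{r}={n}" for r, n in sorted(by_rank.items()))
    shown = sorted(subtrees)[:sample]  # a few paths, so the line stays bounded
    more = len(subtrees) - len(shown)
    tail = f" (+{more} more)" if more > 0 else ""
    return f"{len(subtrees)} subtrees [{ranks}] e.g. {', '.join(shown)}{tail}"
```

The line length depends on the number of ranks and the sample size, not on the number of subtrees, so the log volume stays flat as the subtree count grows.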
This ticket is part of a refactor Zheng has planned to take up.
Related issues
History
#1 Updated by Patrick Donnelly 6 months ago
I should add, it's trivial to set up a test for this: just create a distributed ephemeral pinned directory with a large fan-out (~1000 sub-dirs). Or use manual pinning; it does not matter. Then try to run any kind of workload in any of the directories.
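A rough Python sketch of that setup, assuming a CephFS mount and the `ceph.dir.pin.distributed` / `ceph.dir.pin` extended attributes. The `make_fanout` helper, its parameters, and the directory naming are hypothetical. Depending on the release, the MDS option `mds_export_ephemeral_distributed` may also need to be enabled.

```python
import os


def make_fanout(parent, count=1000, manual_ranks=None, setxattr=None):
    """Create `count` sub-directories under `parent`.

    manual_ranks=None: set a distributed ephemeral pin on `parent`.
    manual_ranks=N:    manually pin sub-directory i to rank i % N instead.
    `setxattr` is injectable so the helper can be dry-run off CephFS.
    """
    setxattr = setxattr or os.setxattr  # os.setxattr is Linux-only
    os.makedirs(parent, exist_ok=True)
    if manual_ranks is None:
        setxattr(parent, "ceph.dir.pin.distributed", b"1")
    dirs = []
    for i in range(count):
        path = os.path.join(parent, f"dir{i:04d}")
        os.makedirs(path, exist_ok=True)
        if manual_ranks is not None:
            setxattr(path, "ceph.dir.pin", str(i % manual_ranks).encode())
        dirs.append(path)
    return dirs
```

For example, `make_fanout("/mnt/cephfs/fanout")` (the mount path is a placeholder) followed by creating files in any of the returned directories should be enough to exercise the problem.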
#2 Updated by Patrick Donnelly 6 months ago
- Related to Fix #46696: mds: pre-fragment distributed ephemeral pin directories to distribute the subtree bounds added
#3 Updated by Patrick Donnelly 4 months ago
- Category set to Performance/Resource Usage
- Status changed from New to In Progress
Zheng is currently working on this.
#4 Updated by Patrick Donnelly about 2 months ago
- Assignee deleted (Zheng Yan)
#5 Updated by Patrick Donnelly 9 days ago
- Target version changed from v16.0.0 to v17.0.0
- Backport set to pacific,octopus,nautilus