Bug #21402

closed

mds: move remaining containers in CDentry/CDir/CInode to mempool

Added by Patrick Donnelly over 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Normal
Category:
Introspection/Control
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This commit:

https://github.com/ceph/ceph/commit/e035b64fcb0482c3318656e1680d683814f494fe

does only part of the work. There are still containers that need to be moved to the mempool so that the cache memory usage accounting is more accurate.
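
For context, a minimal, self-contained sketch of the idea behind mempool accounting (illustrative only; Ceph's real implementation lives in include/mempool.h and provides per-pool, sharded counters and typedefs like `mempool::mds_co::map`): a counting allocator charges every container allocation to a pool byte counter, so pool usage can be read off directly instead of estimated.

```cpp
// Illustrative sketch, not Ceph's actual mempool code.
#include <atomic>
#include <cstddef>
#include <iostream>
#include <map>

static std::atomic<std::size_t> mds_co_bytes{0};  // stand-in for the mds_co pool counter

template <typename T>
struct pool_allocator {
  using value_type = T;
  pool_allocator() = default;
  template <typename U> pool_allocator(const pool_allocator<U>&) {}
  T* allocate(std::size_t n) {
    mds_co_bytes += n * sizeof(T);  // charge the pool
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) {
    mds_co_bytes -= n * sizeof(T);  // credit the pool
    ::operator delete(p);
  }
};
template <typename T, typename U>
bool operator==(const pool_allocator<T>&, const pool_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const pool_allocator<T>&, const pool_allocator<U>&) { return false; }

// A "pooled" map: same interface as std::map, but every node
// allocation is accounted against mds_co_bytes.
template <typename K, typename V>
using pool_map =
    std::map<K, V, std::less<K>, pool_allocator<std::pair<const K, V>>>;

int main() {
  pool_map<int, int> m;
  for (int i = 0; i < 1000; ++i) m[i] = i;
  std::cout << "pool bytes in use: " << mds_co_bytes << "\n";
}
```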


Files

patched-memcache.png (555 KB) Patrick Donnelly, 01/30/2018 05:28 PM
master-memcache.png (473 KB) Patrick Donnelly, 01/30/2018 05:28 PM
patched-memcache.png (367 KB) Patrick Donnelly, 02/01/2018 04:28 AM
master-memcache.png (382 KB) Patrick Donnelly, 02/01/2018 04:28 AM

Related issues 5 (2 open, 3 closed)

Related to CephFS - Bug #20594: mds: cache limits should be expressed in memory usage, not inode count (Resolved, Patrick Donnelly, 07/12/2017)
Related to CephFS - Documentation #22599: doc: mds memory tracking of cache is imprecise by a constant factor (Resolved, Patrick Donnelly, 09/15/2017)
Related to CephFS - Bug #22962: mds: move remaining containers in CDentry/CDir/CInode to mempool (cont.) (New, 09/15/2017)
Copied to CephFS - Bug #22962: mds: move remaining containers in CDentry/CDir/CInode to mempool (cont.) (New, 09/15/2017)
Copied to CephFS - Backport #22972: luminous: mds: move remaining containers in CDentry/CDir/CInode to mempool (Resolved, Patrick Donnelly)
Actions #1

Updated by Patrick Donnelly over 6 years ago

  • Related to Backport #21384: luminous: mds: cache limits should be expressed in memory usage, not inode count added
Actions #2

Updated by Patrick Donnelly over 6 years ago

  • Related to deleted (Backport #21384: luminous: mds: cache limits should be expressed in memory usage, not inode count)
Actions #3

Updated by Patrick Donnelly over 6 years ago

  • Related to Bug #20594: mds: cache limits should be expressed in memory usage, not inode count added
Actions #4

Updated by Patrick Donnelly over 6 years ago

  • Assignee set to Patrick Donnelly
Actions #5

Updated by Patrick Donnelly over 6 years ago

  • Copied to Documentation #22599: doc: mds memory tracking of cache is imprecise by a constant factor added
Actions #6

Updated by Patrick Donnelly over 6 years ago

  • Related to Documentation #22599: doc: mds memory tracking of cache is imprecise by a constant factor added
Actions #7

Updated by Patrick Donnelly over 6 years ago

  • Status changed from New to In Progress
Actions #8

Updated by Nathan Cutler over 6 years ago

  • Copied to deleted (Documentation #22599: doc: mds memory tracking of cache is imprecise by a constant factor)

Actions #9

Updated by Patrick Donnelly about 6 years ago

https://github.com/ceph/ceph/pull/19954

Also ran two 64-client kernel build tests (one patched, one master) with a single active MDS with 24GB of RAM and `mds cache memory limit = 8GB`. Both graphs attached.
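
For reference, a hedged example of that setting in ceph.conf (the option takes a byte count; 8 GiB is 8589934592, and whether a human-readable "8GB" form is accepted depends on the Ceph version):

```
[mds]
# 8 GiB, in bytes
mds cache memory limit = 8589934592
```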

The patched version's reported cache size is about 33% closer to the true RSS: the gap shrank from roughly 30% below RSS to 20%. Of course, it'll never exactly match the RSS because the MDS uses RAM for other purposes.

Other good news: apparently this also reduces RAM use by the MDS by 10% at an 8GB cache. I'm thinking about doing another test with a much larger cache to see the benefits there. Edit: the reason for this would be that many member containers no longer use indirect references (where possible) to other structures. That is, instead of `map<foo, bar*>`, we have `map<foo, bar>`.
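
A short illustration of that refactor (foo/bar are the placeholder names from the comment above, not real MDS types):

```cpp
#include <map>

struct bar { int data[4]; };

int main() {
  // Before: the map node stores only a pointer; each element costs a
  // second heap allocation for bar, plus the pointer itself.
  std::map<int, bar*> indirect;
  indirect[1] = new bar{};  // two allocations per element

  // After: bar lives inline in the map node; one allocation per
  // element, less overhead, better locality, and (with a mempool
  // allocator) the bytes are accounted automatically.
  std::map<int, bar> direct;
  direct[1] = bar{};        // one allocation per element

  delete indirect[1];
}
```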

Actions #10

Updated by Patrick Donnelly about 6 years ago

64GB cache size limit experiment attached.

The master branch was tested with 64 kernel clients, each building the kernel 4 times (so 256 parallel kernel builds). Afterwards I realized it wasn't going to fill the cache, so I killed it, but the larger cache usage is still useful for some comparison.

The patched-memcache.png shows the memory usage for the patched version. This has 64 kernel clients and 512 parallel kernel builds.

Conclusions: the gap between reported cache size and RSS stays a stable percentage as the cache grows: around ~20% for the patched case and ~35% for the master branch. That would explain the ~50% overuse of RAM ((1-.35)^-1 = 1.53) reported on the mailing list. With the patched version, we should see about ~25% instead ((1-.20)^-1 = 1.25).
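
Spelling out that arithmetic (with f as the fraction of RSS not accounted for by the reported cache size):

```latex
\mathrm{RSS} \approx \frac{\text{cache size}}{1 - f}, \qquad
\frac{1}{1 - 0.35} \approx 1.53 \ \text{(master)}, \qquad
\frac{1}{1 - 0.20} = 1.25 \ \text{(patched)}
```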

Also worth noting, again: the memory usage of the MDS goes down. For comparative purposes, I selected a time when the cache in master was at its max and compared it to (nearly) the same cache size in the patched version. For a 35.8GB cache, the master branch has an RSS of 54.3GB; the patched branch at the same cache size has an RSS of 45.7GB. That gives us a reduction of about 15.8%!

There are still a small number of containers in the cache that need to be moved to the mempool. I've noted them in the PR with FIXMEs. We should revisit this when there is more time available.

Actions #11

Updated by Patrick Donnelly about 6 years ago

It occurred to me that I wasn't comparing apples to apples in the memory-reduction comparisons. I looked at the same data using inode count instead, and the memory usage is about the same. No real change.

Actions #12

Updated by Patrick Donnelly about 6 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #13

Updated by Patrick Donnelly about 6 years ago

  • Copied to Bug #22962: mds: move remaining containers in CDentry/CDir/CInode to mempool (cont.) added
Actions #14

Updated by Patrick Donnelly about 6 years ago

  • Related to Bug #22962: mds: move remaining containers in CDentry/CDir/CInode to mempool (cont.) added
Actions #15

Updated by Nathan Cutler about 6 years ago

  • Copied to Backport #22972: luminous: mds: move remaining containers in CDentry/CDir/CInode to mempool added
Actions #16

Updated by Patrick Donnelly about 6 years ago

  • Status changed from Pending Backport to Resolved