Bug #47682

closed

MDS can't release caps faster than clients taking caps

Added by Dan van der Ster over 3 years ago. Updated over 3 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a workload in which a kernel client is stat'ing all files in an FS. This workload triggered a few issues:

1. The mds memory usage grew well above the 4GB mds_cache_memory_limit, to 16GB. One client had 1M caps at that point, and the other clients had several hundred K each:

2020-09-29 09:00:22.993 7f26c8054700  2 mds.0.cache Memory usage:  total 17912352, rss 17019028, heap 332036, baseline 332036, 3185001 / 3273427 inodes have caps, 3185107 caps, 0.973019 caps per inode

2. We increased the mds_cache_memory_limit to 16GB to silence the "MDS cache oversized" warning.
3. The clients did `echo 2 > /proc/sys/vm/drop_caches`, after which the number of caps dropped back to a reasonably low number:

2020-09-29 09:44:33.556 7f26c8054700  2 mds.0.cache Memory usage:  total 19835760, rss 18776188, heap 332036, baseline 332036, 59011 / 4527518 inodes have caps, 59022 caps, 0.0130363 caps per inode

4. We decreased the mds_cache_memory_limit back to 8GB, expecting the memory usage to decrease. Cached inodes decreased, but RSS did not:

2020-09-29 09:46:23.953 7f26c8054700  2 mds.0.cache Memory usage:  total 19835760, rss 17850188, heap 332036, baseline 332036, 48453 / 2150423 inodes have caps, 48464 caps, 0.022537 caps per inode

5. We found that the memory is all stuck in the tcmalloc central cache freelist:

mds.cephflash20-01cbf24286 tcmalloc heap stats:------------------------------------------------
MALLOC:     8937379976 ( 8523.3 MiB) Bytes in use by application
MALLOC: +        32768 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +   9169491864 ( 8744.7 MiB) Bytes in central cache freelist
MALLOC: +       146176 (    0.1 MiB) Bytes in transfer cache freelist
MALLOC: +     33780960 (   32.2 MiB) Bytes in thread cache freelists
MALLOC: +    117702656 (  112.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  18258534400 (17412.7 MiB) Actual memory used (physical + swap)
MALLOC: +   1563828224 ( 1491.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  19822362624 (18904.1 MiB) Virtual address space used
MALLOC:
MALLOC:        1864499              Spans in use
MALLOC:             22              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
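
(For reference: heap stats like the above, and the `heap release` attempt mentioned below, can be driven through the tcmalloc admin commands; a minimal sketch, reusing the daemon name from the output above:)

# Dump tcmalloc heap statistics for this MDS
ceph tell mds.cephflash20-01cbf24286 heap stats

# Ask tcmalloc to return freelist pages to the OS via madvise()
ceph tell mds.cephflash20-01cbf24286 heap release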

`heap release` does not free up that memory.

How do we release the memory in the `central cache freelist`? Generally, is there something we can do to avoid this?
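
(For reference, the cache-limit changes and the per-client cap counts in the steps above map onto standard commands; a sketch, with the daemon name and the 16 GiB value purely illustrative:)

# Adjust the MDS cache memory limit at runtime (value in bytes; 16 GiB here)
ceph config set mds mds_cache_memory_limit 17179869184

# On the MDS host: list client sessions; the num_caps field shows which
# clients are holding the most caps
ceph daemon mds.cephflash20-01cbf24286 session ls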


Related issues 1 (0 open, 1 closed)

Is duplicate of CephFS - Bug #47307: mds: throttle workloads which acquire caps faster than the client can release (Resolved, assigned to Kotresh Hiremath Ravishankar)

Actions #1

Updated by Dan van der Ster over 3 years ago

  • Subject changed from "MDS huge amount of memory in central cache freelist" to "MDS can't release caps faster than clients taking caps"
Update:
  • the central cache freelist eventually decreases after an hour or so.
  • I suppose the bigger issue is that the fixes in #41141 are not sufficient for this type of workload. Indeed, if we stop the client (the one scanning all the files), the caps are recalled quite slowly and eventually the memory usage comes back to normal.

So, is there something we can do to speed up #41141 even further? We tried setting `mds recall max caps` to 10k, which helped a bit. Then we tried 100k, after which it seemed to stop recalling caps.
The above client is taking something like 30k caps/second: can we recall at that rate?
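
(A sketch of how such a recall setting can be applied at runtime; the two values mirror the 10k and 100k attempts described above:)

# First attempt: allow up to 10k caps to be recalled per recall event
ceph config set mds mds_recall_max_caps 10000

# Second attempt: 100k, after which recall seemed to stop altogether
ceph config set mds mds_recall_max_caps 100000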

Actions #2

Updated by Dan van der Ster over 3 years ago

Our current config is:

mds_recall_global_max_decay_threshold 200000
mds_recall_max_decay_threshold 100000
mds_recall_max_caps 50000

Is that reaching the limits of caps recall?
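
(A sketch of applying this tuning and reading it back from the running daemon; the daemon name is illustrative:)

# Apply the recall tuning listed above
ceph config set mds mds_recall_global_max_decay_threshold 200000
ceph config set mds mds_recall_max_decay_threshold 100000
ceph config set mds mds_recall_max_caps 50000

# Confirm what the running MDS actually sees
ceph config show mds.cephflash20-01cbf24286 | grep mds_recall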

Actions #3

Updated by Dan van der Ster over 3 years ago

  • Status changed from New to Rejected

With more effective tuning I think we can manage. Cancelling this ticket.

Actions #4

Updated by Patrick Donnelly over 3 years ago

Dan, see: #47307

Actions #5

Updated by Patrick Donnelly over 3 years ago

  • Is duplicate of Bug #47307 (mds: throttle workloads which acquire caps faster than the client can release) added
