Bug #47682
MDS can't release caps faster than clients taking caps
Description
We have a workload in which a kernel client is stat'ing all files in an FS. This workload triggered a few issues:
1. The mds memory usage grew well above the 4GB mds_cache_memory_limit, to 16GB. One client had 1M caps at that point, and the other clients had several hundred K each:
2020-09-29 09:00:22.993 7f26c8054700 2 mds.0.cache Memory usage: total 17912352, rss 17019028, heap 332036, baseline 332036, 3185001 / 3273427 inodes have caps, 3185107 caps, 0.973019 caps per inode
2. We increased the mds_cache_memory_limit to 16GB to silence the "MDS cache oversized" warning.
3. The clients ran `echo 2 > /proc/sys/vm/drop_caches`, after which the number of caps dropped back to a reasonably low value:
2020-09-29 09:44:33.556 7f26c8054700 2 mds.0.cache Memory usage: total 19835760, rss 18776188, heap 332036, baseline 332036, 59011 / 4527518 inodes have caps, 59022 caps, 0.0130363 caps per inode
4. We decreased the mds_cache_memory_limit back to 8GB, expecting the memory usage to decrease. Cached inodes decreased, but RSS did not:
2020-09-29 09:46:23.953 7f26c8054700 2 mds.0.cache Memory usage: total 19835760, rss 17850188, heap 332036, baseline 332036, 48453 / 2150423 inodes have caps, 48464 caps, 0.022537 caps per inode
5. We found that the memory is all stuck in the tcmalloc central cache freelist:
mds.cephflash20-01cbf24286 tcmalloc heap stats:------------------------------------------------
MALLOC:     8937379976 ( 8523.3 MiB) Bytes in use by application
MALLOC: +        32768 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +   9169491864 ( 8744.7 MiB) Bytes in central cache freelist
MALLOC: +       146176 (    0.1 MiB) Bytes in transfer cache freelist
MALLOC: +     33780960 (   32.2 MiB) Bytes in thread cache freelists
MALLOC: +    117702656 (  112.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  18258534400 (17412.7 MiB) Actual memory used (physical + swap)
MALLOC: +   1563828224 ( 1491.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  19822362624 (18904.1 MiB) Virtual address space used
MALLOC:
MALLOC:        1864499              Spans in use
MALLOC:             22              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
`heap release` does not free up that memory.
How do we release the memory in the `central cache freelist`? More generally, is there anything we can do to avoid this?
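As a cross-check of the heap stats in step 5, here is a minimal Python sketch that parses the `MALLOC:` byte counters and prints each bucket's share of physical memory. The regex and the helper name `parse_tcmalloc_stats` are assumptions made for illustration and are not part of Ceph or gperftools; the sample values are copied from the output above.

```python
import re

# Byte counters copied from the tcmalloc heap stats quoted in step 5.
HEAP_STATS = """\
MALLOC:     8937379976 ( 8523.3 MiB) Bytes in use by application
MALLOC: +        32768 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +   9169491864 ( 8744.7 MiB) Bytes in central cache freelist
MALLOC: +       146176 (    0.1 MiB) Bytes in transfer cache freelist
MALLOC: +     33780960 (   32.2 MiB) Bytes in thread cache freelists
MALLOC: +    117702656 (  112.2 MiB) Bytes in malloc metadata
MALLOC: =  18258534400 (17412.7 MiB) Actual memory used (physical + swap)
"""

# Hypothetical pattern: "MALLOC: [+|=] <bytes> (<MiB> MiB) <label>".
LINE_RE = re.compile(r"^MALLOC:\s*([+=]?)\s*(\d+)\s*\(\s*[\d.]+ MiB\)\s*(.+)$")


def parse_tcmalloc_stats(text):
    """Return (components, total); components maps label -> bytes."""
    components, total = {}, None
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip separators and non-byte counters
        sign, value, label = m.group(1), int(m.group(2)), m.group(3).strip()
        if sign == "=":
            total = value  # the "=" line is the reported sum
        else:
            components[label] = value
    return components, total


components, total = parse_tcmalloc_stats(HEAP_STATS)
for label, value in components.items():
    print(f"{value / total:6.1%}  {label}")
```

On these numbers the components add up exactly to the reported "Actual memory used" total, and the central cache freelist accounts for roughly half of it.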
Related issues
History
#1 Updated by Dan van der Ster almost 3 years ago
- Subject changed from MDS huge amount of memory in central cache freelist to MDS can't release caps faster than clients taking caps
- the central cache freelist eventually decreases after an hour or so.
- I suppose the bigger issue is that the fixes in #41141 are not sufficient for this type of workload. If indeed we stop the client (scanning all the files) then the caps are recalled quite slowly and eventually the memory usage comes back to normal.
So, is there something we can do to speed up #41141 even further? We tried setting `mds recall max caps` to 10k, which helped a bit. Then we tried 100k, after which it seemed to stop recalling caps.
The above client is taking something like 30k caps/second: can we recall at that rate?
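A rough way to frame that question: if a client acquires caps at a steady rate and the MDS recalls them at a lower steady rate, the number of outstanding caps grows at the difference. The sketch below assumes constant rates, which is a simplification of how recall actually behaves. Only the 30k caps/second estimate and the 1M caps figure come from this ticket; the recall rates and the helper name `seconds_to_reach` are hypothetical.

```python
def seconds_to_reach(cap_target, acquire_rate, recall_rate):
    """Seconds until a client holds cap_target caps, starting from zero.

    Assumes constant acquire and recall rates in caps/second. Returns
    None when recall keeps up, i.e. the cap count never grows.
    """
    net = acquire_rate - recall_rate
    if net <= 0:
        return None
    return cap_target / net


ACQUIRE_RATE = 30_000    # caps/second, estimate from this ticket
CAP_TARGET = 1_000_000   # caps held by the busiest client in the description

for recall_rate in (0, 10_000, 30_000):  # hypothetical recall rates
    t = seconds_to_reach(CAP_TARGET, ACQUIRE_RATE, recall_rate)
    print(recall_rate, "never" if t is None else f"{t:.0f} s")
```

Under these assumptions, a client with no effective recall reaches 1M caps in about half a minute, so recall would need to run at a comparable rate to hold the count steady.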
#2 Updated by Dan van der Ster almost 3 years ago
Our current config is:
mds_recall_global_max_decay_threshold 200000
mds_recall_max_decay_threshold 100000
mds_recall_max_caps 50000
Is that reaching the limits of caps recall?
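For reference, a minimal sketch of how settings like these could be applied at runtime with `ceph config set`, assuming a release that supports the centralized configuration database and that the `mds` section is the right target for the cluster. The helper `build_config_commands` is hypothetical; it only builds the argument lists and does not run anything.

```python
# Option values copied from the comment above.
RECALL_SETTINGS = {
    "mds_recall_global_max_decay_threshold": 200000,
    "mds_recall_max_decay_threshold": 100000,
    "mds_recall_max_caps": 50000,
}


def build_config_commands(settings, who="mds"):
    """Build one `ceph config set <who> <option> <value>` argv per option."""
    return [
        ["ceph", "config", "set", who, name, str(value)]
        for name, value in settings.items()
    ]


# Print the commands for review instead of executing them.
for argv in build_config_commands(RECALL_SETTINGS):
    print(" ".join(argv))
```

Printing the commands first makes it easy to review them before running them by hand or passing each list to `subprocess.run`.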
#3 Updated by Dan van der Ster almost 3 years ago
- Status changed from New to Rejected
With more effective tuning I think we can manage. Cancelling this ticket.
#4 Updated by Patrick Donnelly almost 3 years ago
Dan, see: #47307
#5 Updated by Patrick Donnelly almost 3 years ago
- Duplicates Bug #47307: mds: throttle workloads which acquire caps faster than the client can release added