Bug #48673

open

High memory usage on standby replay MDS

Added by Daniel Persson over 3 years ago. Updated 6 months ago.

Status: Pending Backport
Priority: High
Category: Performance/Resource Usage
% Done: 0%
Source: Community (user)
Tags: backport_processed
Backport: quincy,reef
Regression: No
Severity: 2 - major
Component(FS): MDS

Description

Hi.

We recently installed a Ceph cluster with about 27M objects. The filesystem appears to contain about 15M files.

The MDS is configured with a 20 GB mds_cache_memory_limit. On the active MDS (node 4), memory usage stays somewhat above the limit, but not excessively so:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
2165668 ceph      20   0   27.6g  26.1g  22088 S  12.3  13.9   2081:55 ceph-mds

However, the standby-replay MDS (node 3) has a much larger memory footprint:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
2166195 ceph      20   0   40.7g  38.2g  21000 S   0.7  20.4  86:31.18 ceph-mds 
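
For a rough comparison, the following sketch expresses each daemon's resident memory as a multiple of the 20 GB cache limit. It makes two assumptions: the RES figures from top (which top reports in GiB) and the configured limit use the same unit, and the values are copied by hand from the output above. mds_cache_memory_limit targets the MDS cache rather than the whole process, so RES is not expected to match it exactly.

```python
# Hypothetical helper: express each daemon's resident memory (RES column from
# top, assumed to be GiB) as a multiple of the configured mds_cache_memory_limit.
# Assumes the 20 GB limit from this report is in the same unit (GiB).

CACHE_LIMIT_GIB = 20.0  # mds_cache_memory_limit as stated in the report

# RES values copied from the top output for each MDS
observed_res_gib = {
    "active (node 4)": 26.1,
    "standby-replay (node 3)": 38.2,
}


def ratio_to_limit(res_gib: float, limit_gib: float = CACHE_LIMIT_GIB) -> float:
    """Return resident memory as a multiple of the cache limit."""
    return res_gib / limit_gib


if __name__ == "__main__":
    for role, res in observed_res_gib.items():
        print(f"{role}: {res:.1f} GiB = {ratio_to_limit(res):.2f}x the cache limit")
```

Under these assumptions, the active MDS sits at roughly 1.3x the limit, while the standby-replay MDS sits at roughly 1.9x.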

This level has remained constant for days. A couple of times, the cluster has raised the following warning and then cleared it again, even though the memory footprint has not changed:

[WARN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
        mdsnode3(mds.0): MDS cache is too large (30GB/20GB); 0 inodes in use by clients, 0 stray files
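
The warning can be read against the mds_health_cache_threshold option. According to Ceph's documentation, this option (default 1.5) sets the ratio of cache size to mds_cache_memory_limit at which MDS_CACHE_OVERSIZED is reported. The minimal sketch below models that condition as a plain comparison. It is an illustration, not Ceph's code, and cache_oversized is a hypothetical name. The reported 30GB/20GB sits right at a 1.5 ratio, so a cache hovering around 30 GB could plausibly make the warning appear and clear repeatedly while the process footprint stays flat.

```python
# Sketch of the MDS_CACHE_OVERSIZED condition, assuming the documented
# mds_health_cache_threshold option (default 1.5): the warning is raised when
# the cache size exceeds threshold * mds_cache_memory_limit.
# cache_oversized is a hypothetical helper, not a Ceph API.

GIB = 1024 ** 3


def cache_oversized(cache_bytes: int, limit_bytes: int, threshold: float = 1.5) -> bool:
    """Return True if the cache exceeds threshold times the configured limit."""
    return cache_bytes > threshold * limit_bytes


if __name__ == "__main__":
    limit = 20 * GIB  # 20 GB cache limit from this report, taken as GiB
    # Cache sizes just below, at, and just above 1.5x the limit
    for cache_gib in (29.5, 30.0, 30.5):
        state = "WARN" if cache_oversized(int(cache_gib * GIB), limit) else "ok"
        print(f"cache {cache_gib:.1f} GiB / {limit // GIB} GiB -> {state}")
```

Because the sizes in the warning message are rounded, a displayed "30GB/20GB" is consistent with a cache slightly above or slightly below the threshold.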

The nodes also run a couple of OSDs, and we don't want them to be affected while we are away over the Xmas holidays, so I thought I would open a ticket here to see if anyone has suggestions for preventive measures.

If you want any extra information, please ask.

Best regards
Daniel


Related issues: 5 (1 open, 4 closed)

Related to CephFS - Bug #50048: mds: standby-replay only trims cache when it reaches the end of the replay log (Resolved, Patrick Donnelly)
Related to CephFS - Bug #40213: mds: cannot switch mds state from standby-replay to active (Resolved, simon gao)
Related to CephFS - Bug #50246: mds: failure replaying journal (EMetaBlob) (Resolved, Xiubo Li)
Copied to CephFS - Backport #63675: quincy: High memory usage on standby replay MDS (In Progress, Patrick Donnelly)
Copied to CephFS - Backport #63676: reef: High memory usage on standby replay MDS (Resolved, Patrick Donnelly)