Bug #37980

open

luminous: osd memory use very high, and mismatch between RES and heap stats

Added by zhou yang over 5 years ago. Updated over 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph 12.2.1
3 nodes, 30 OSDs per node
EC pool: 4+2

After running for 2 months, we found some OSDs using very high memory in the top listing (4-5 GB), and the heap stats look like this:

top - 10:45:01 up 73 days, 1:20, 1 user, load average: 10.26, 9.74, 9.95
Tasks: 657 total, 3 running, 654 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.6 us, 6.8 sy, 0.0 ni, 82.7 id, 3.7 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem: 65325552 total, 64448092 used, 877460 free, 90120 buffers
KiB Swap: 0 total, 0 used, 0 free. 446164 cached Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18385 ceph 20 0 7369196 5.114g 7232 S 6.1 8.2 2873:28 /usr/bin/ceph-osd -f --cluster ceph --id 61 --setuser ceph --setg+

osd.61 tcmalloc heap stats:

MALLOC: 2239198296 ( 2135.5 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 72369432 ( 69.0 MiB) Bytes in central cache freelist
MALLOC: + 13839792 ( 13.2 MiB) Bytes in transfer cache freelist
MALLOC: + 104315104 ( 99.5 MiB) Bytes in thread cache freelists
MALLOC: + 25096352 ( 23.9 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 2454818976 ( 2341.1 MiB) Actual memory used (physical + swap)
MALLOC: + 4095991808 ( 3906.2 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 6550810784 ( 6247.3 MiB) Virtual address space used
MALLOC:
MALLOC: 136858 Spans in use
MALLOC: 63 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

RES shows osd.61 using 5 GB+ of memory, but the heap stats report "Actual memory used" of just 2 GB+, and we find that the OSDs with high RES also have high "Bytes released to OS".
After we restart the OSD, the memory is released.
Has anyone encountered a similar problem?
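
For reference, heap stats like the above can be dumped from a running OSD with the standard Ceph admin commands; a minimal sketch, with osd.61 standing in for the affected daemon:

# Dump the tcmalloc heap stats shown above for a live OSD
ceph tell osd.61 heap stats

# Ask tcmalloc to return free pages to the OS; note that in the stats
# above the page heap freelist is already 0, so this may not shrink
# RES in this particular case
ceph tell osd.61 heap release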

Actions #1

Updated by Igor Fedotov over 5 years ago

Are you using FileStore or BlueStore?

Actions #2

Updated by Igor Fedotov over 5 years ago

And what OS are you using?

Actions #3

Updated by Greg Farnum over 5 years ago

  • Project changed from Ceph to RADOS
Actions #4

Updated by zhou yang over 5 years ago

I am using BlueStore, and my client is RBD with an EC data pool.
The cluster is running on CentOS 7.0.1406; the tcmalloc version is 4.2.6.

Actions #5

Updated by Nathan Cutler over 5 years ago

ceph 12.2.1

Are you really running that version, 12.2.1?

Actions #6

Updated by Mark Nelson over 5 years ago

Hi,

Oftentimes this kind of thing is related to transparent huge pages (THP). From what I've seen, there are definitely different kinds of behavior on different kernels. There's a higher-level tcmalloc issue for this here (not Ceph-related):

https://github.com/gperftools/gperftools/issues/990

I'd try either disabling THP or setting max_ptes_none to 0, as reported in that issue, and see if that helps. I'm pretty sure I've done that with Ceph in the past and seen the behavior improve when this has been a problem.
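
A minimal sketch of both workarounds, assuming the standard sysfs paths on a CentOS 7 kernel (these settings do not persist across reboots unless added to the boot configuration):

# Option 1: disable THP outright (takes effect immediately)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Option 2: keep THP but set max_ptes_none to 0, so khugepaged will not
# collapse ranges containing unmapped pages (the RES re-inflation
# described in the gperftools issue above)
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

# Verify the current settings
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none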

Actions #7

Updated by zhou yang over 5 years ago

Thanks a lot.

disabling THP or setting max_ptes_none to 0

I will try this later and see if it helps. Since the issue cannot be reproduced in a short time, I will keep tracking it and report the result.
