Bug #12681
tcmalloc doesn't release memory back to system
Status: Closed
Description
We observed high memory consumption by OSD daemons while evaluating Hammer (0.94.2) in the Yahoo! environment. Initially, memory consumption per OSD was only about 1 GB (or even less), but it grew to almost 3 GB after a 2-hour load test. This caused the system to freeze/OOM (10 OSDs run on one host with 32 GB of physical memory), and eventually some OSD processes were killed by the system.
Below is the heap dump; almost 1.9 GB of memory was held in the page heap freelist.
$ sudo ceph tell osd.32 heap stats
osd.32 tcmalloc heap stats:------------------------------------------------
MALLOC: 1217475232 ( 1161.1 MiB) Bytes in use by application
MALLOC: + 2031304704 ( 1937.2 MiB) Bytes in page heap freelist
MALLOC: + 49365752 ( 47.1 MiB) Bytes in central cache freelist
MALLOC: + 4159488 ( 4.0 MiB) Bytes in transfer cache freelist
MALLOC: + 56619624 ( 54.0 MiB) Bytes in thread cache freelists
MALLOC: + 14528664 ( 13.9 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 3373453464 ( 3217.2 MiB) Actual memory used (physical + swap)
MALLOC: + 4677632 ( 4.5 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 3378131096 ( 3221.6 MiB) Virtual address space used
MALLOC:
MALLOC: 135722 Spans in use
MALLOC: 1152 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
After running "heap release", the 1.9 GB could be released back to the system.
$ sudo ceph tell osd.32 heap release
osd.32 releasing free RAM back to system.
$ sudo ceph tell osd.32 heap stats
osd.32 tcmalloc heap stats:------------------------------------------------
MALLOC: 1211800264 ( 1155.7 MiB) Bytes in use by application
MALLOC: + 286720 ( 0.3 MiB) Bytes in page heap freelist
MALLOC: + 54191184 ( 51.7 MiB) Bytes in central cache freelist
MALLOC: + 7419648 ( 7.1 MiB) Bytes in transfer cache freelist
MALLOC: + 54217192 ( 51.7 MiB) Bytes in thread cache freelists
MALLOC: + 14528664 ( 13.9 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 1342443672 ( 1280.3 MiB) Actual memory used (physical + swap)
MALLOC: + 2035687424 ( 1941.4 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 3378131096 ( 3221.6 MiB) Virtual address space used
MALLOC:
MALLOC: 135729 Spans in use
MALLOC: 1148 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Updated by Sage Weil over 8 years ago
Was any recovery taking place? This could be this bug: https://github.com/ceph/ceph/pull/5451
Updated by Xiaogang Chang over 8 years ago
No recovery was ongoing. I just started the service and applied load for 1-2 hours, then ran into this situation. There were no OSD map changes during the test.
Updated by Sage Weil over 8 years ago
- Subject changed from High memory consumption by Hammer OSD daemons to tcmalloc doesn't release memory back to system
We're hoping to switch to jemalloc... tcmalloc is annoying in many other ways as well.
Updated by Greg Farnum over 8 years ago
We usually see this happening with specific releases of tcmalloc and host operating systems, although it's not something that's been well-quantified yet. What distro and tcmalloc version is in use?
Updated by Xiaogang Chang over 8 years ago
Hi Greg,
I'm using RHEL 6 with kernel 2.6.32; the tcmalloc is 4.1.0, from gperftools-2.0. I used the same OS and tcmalloc with Giant, and there was no issue on Giant.
Updated by Greg Farnum over 8 years ago
Do you have any special config settings, or is everything on the default?
Updated by Greg Farnum over 8 years ago
And what's the source for that libtcmalloc release?
Updated by Sage Weil over 8 years ago
- Status changed from Need More Info to Won't Fix
This is definitely a tcmalloc issue. I think the place to go is RHEL 6 support? Perhaps a backport of a newer gperftools is available?
As a workaround, you could run the heap release command via cron or something similar...
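A minimal sketch of that cron workaround, assuming the OSD ids running on the host are known (30-32 here are placeholders) and the script path is hypothetical:

```shell
#!/bin/sh
# Hypothetical /usr/local/sbin/osd-heap-release.sh -- release tcmalloc free
# pages for every OSD on this host. Example crontab entry (every 30 minutes):
#   */30 * * * * root /usr/local/sbin/osd-heap-release.sh
OSD_IDS="30 31 32"    # placeholder: the ids of the OSDs on this host
for id in $OSD_IDS; do
    # The command demonstrated earlier in this report; "echo" makes this a
    # dry run -- remove it to actually issue the release.
    echo ceph tell osd.$id heap release
done
```

Note this releases unconditionally on a timer; a tighter variant would first check the freelist size from "heap stats".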
Updated by c sights over 8 years ago
Possibly Ceph OSDs could watch their freelist (or the used swap on the node) and tell themselves to free it?
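That idea could also be approximated externally. A hedged sketch (the threshold, osd id, and the embedded sample line are assumptions): parse the page-heap freelist size out of "heap stats" output and trigger "heap release" only when it crosses a cap, rather than releasing unconditionally.

```shell
#!/bin/sh
# Release tcmalloc pages only when the page heap freelist exceeds a cap.
THRESHOLD=$((512 * 1024 * 1024))    # placeholder: release above 512 MiB

# Real deployment would run: stats=$(ceph tell osd.32 heap stats)
# Here the freelist line from the report above is embedded so the sketch
# runs anywhere.
stats='MALLOC: +   2031304704 ( 1937.2 MiB) Bytes in page heap freelist'

# Third whitespace-separated field of the matching line is the byte count.
freelist=$(printf '%s\n' "$stats" | awk '/page heap freelist/ {print $3}')
if [ "$freelist" -gt "$THRESHOLD" ]; then
    # Real command would be: ceph tell osd.32 heap release
    echo "would release $freelist bytes of page heap freelist"
fi
```

Doing this inside the OSD itself would presumably use the gperftools MallocExtension interface instead of shelling out, but that is beyond a workaround script.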
Updated by Star Guo over 8 years ago
https://github.com/kuszmaul/SuperMalloc seems to offer good performance.