Bug #6810
very high monitor memory usage after upgrade dumpling -> emperor
Status: closed
Description
As you know, I upgraded from dumpling to emperor a few days ago. All daemons are now running emperor. I have 3 monitors, and the memory usage of one of them is constantly growing. It now uses approx. 10 GB of RAM.
root 21665 1.3 16.2 11050460 10736676 ? Sl Nov14 127:19 /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf
The other two monitors look fine:
root 9714 1.9 0.1 310160 79208 ? Sl Nov14 188:27 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
root 5424 1.5 0.2 413228 137848 ? Sl Nov14 142:47 /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf
Please let me know if you need anything to debug this, as I'd like to restart mon.c asap.
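To compare the monitors without reading the raw columns, the `ps` lines above can be converted to GiB. This is a minimal sketch, assuming the lines come from `ps aux` on Linux (sixth column is RSS in KiB); the helper name `mon_rss_gib` is made up for illustration.

```python
# `ps aux` lines copied from the report above.
PS_LINES = """\
root 21665 1.3 16.2 11050460 10736676 ? Sl Nov14 127:19 /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf
root 9714 1.9 0.1 310160 79208 ? Sl Nov14 188:27 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
root 5424 1.5 0.2 413228 137848 ? Sl Nov14 142:47 /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf
"""


def mon_rss_gib(ps_output):
    """Return {mon_id: resident memory in GiB} for each ceph-mon line.

    Assumes `ps aux` column order: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    result = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 11 or not fields[10].endswith("ceph-mon"):
            continue
        rss_kib = int(fields[5])                 # RSS, reported in KiB
        mon_id = fields[fields.index("-i") + 1]  # value of the -i option
        result[mon_id] = rss_kib / (1024 ** 2)   # KiB -> GiB
    return result


usage = mon_rss_gib(PS_LINES)
for mon_id, gib in sorted(usage.items()):
    print(f"mon.{mon_id}: {gib:.2f} GiB resident")
```

With the numbers above, mon.c comes out at roughly 10.24 GiB, while mon.a and mon.b each stay well below 0.2 GiB.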
Updated by Corin Langosch over 10 years ago
cluster 4ac0e21b-6ea2-4ac7-8114-122bd9ba55d6
health HEALTH_OK
monmap e5: 3 mons at {a=10.0.0.5:6789/0,b=10.0.0.6:6789/0,c=10.0.0.7:6789/0}, election epoch 146, quorum 0,1,2 a,b,c
osdmap e5370: 14 osds: 12 up, 12 in
pgmap v21289411: 12288 pgs, 3 pools, 2592 GB data, 657 kobjects
5217 GB used, 16527 GB / 21745 GB avail
12288 active+clean
client io 2027 B/s rd, 2730 kB/s wr, 72 op/s
Updated by Corin Langosch over 10 years ago
mon.ctcmalloc heap stats:------------------------------------------------
MALLOC:    10803178320 (10302.7 MiB) Bytes in use by application
MALLOC: +     13082624 (   12.5 MiB) Bytes in page heap freelist
MALLOC: +    100630232 (   96.0 MiB) Bytes in central cache freelist
MALLOC: +     15975936 (   15.2 MiB) Bytes in transfer cache freelist
MALLOC: +     23556056 (   22.5 MiB) Bytes in thread cache freelists
MALLOC: +     51736728 (   49.3 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  11008159896 (10498.2 MiB) Actual memory used (physical + swap)
MALLOC: +      3063808 (    2.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  11011223704 (10501.1 MiB) Virtual address space used
MALLOC:
MALLOC:         805571              Spans in use
MALLOC:            136              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the
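A quick way to read these stats is to check how much of the total sits in tcmalloc freelists versus memory the application still holds. The following is a rough sketch using only the byte counts quoted above; the parsing regex and the helper name `parse_heap_stats` are illustrative assumptions, not part of any Ceph tooling.

```python
import re

# Byte counts copied from the tcmalloc output above.
HEAP_STATS = """\
MALLOC:    10803178320 (10302.7 MiB) Bytes in use by application
MALLOC: +     13082624 (   12.5 MiB) Bytes in page heap freelist
MALLOC: +    100630232 (   96.0 MiB) Bytes in central cache freelist
MALLOC: +     15975936 (   15.2 MiB) Bytes in transfer cache freelist
MALLOC: +     23556056 (   22.5 MiB) Bytes in thread cache freelists
MALLOC: +     51736728 (   49.3 MiB) Bytes in malloc metadata
MALLOC: =  11008159896 (10498.2 MiB) Actual memory used (physical + swap)
"""

# Matches "MALLOC: [+|=] <bytes> (<x> MiB) <label>".
LINE_RE = re.compile(r"^MALLOC:\s*[+=]?\s*(\d+)\s*\(\s*[\d.]+ MiB\)\s*(.+)$")


def parse_heap_stats(text):
    """Return {label: bytes} for every MALLOC line that carries a byte count."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


stats = parse_heap_stats(HEAP_STATS)
in_use = stats["Bytes in use by application"]
total = stats["Actual memory used (physical + swap)"]
freelists = sum(v for label, v in stats.items() if "freelist" in label)

print(f"in use by application: {in_use / total:.1%} of actual memory used")
print(f"held in freelists:     {freelists / 2**20:.1f} MiB")
```

On these numbers about 98% of the memory is counted as in use by the application and only around 146 MiB sits in freelists, so releasing freelist memory alone would not account for the growth.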
Updated by Joao Eduardo Luis over 10 years ago
- Assignee set to Joao Eduardo Luis
Can you please obtain a heap dump out of the monitor?
$ ceph heap start_profiler -m 10.0.0.7:6789
wait some time
$ ceph heap dump
dump should be in your log path and should be in the form 'ceph-mon.c.profile.XXXX.heap'
Also, if this happens to be something you can easily reproduce, please tell us how. If it happens often, running ceph-mon under valgrind with massif would also be helpful.
Once you decide to stop this mon, a store dump would also be appreciated. You can obtain it using 'ceph_test_store_tool' (renamed to 'ceph-kvstore-tool' under dumpling):
$ ceph_test_store_tool /var/lib/ceph/mon/ceph-c/store.db list > /tmp/ceph-mon.c.dump
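The heap profiling steps earlier in this comment could be scripted roughly as follows. This is only a sketch: the two `ceph heap` commands are taken verbatim from the comment, while the wait time, the log directory `/var/log/ceph`, and the function names are assumptions to adjust for your setup. It defaults to a dry run that only prints the commands.

```python
import glob
import os
import subprocess
import time

MON_ADDR = "10.0.0.7:6789"  # address of mon.c, taken from the monmap in this report
LOG_DIR = "/var/log/ceph"   # assumption: your log path may differ


def profiling_commands(mon_addr):
    """Return the two commands from the comment above, in run order."""
    return [
        ["ceph", "heap", "start_profiler", "-m", mon_addr],
        ["ceph", "heap", "dump"],
    ]


def collect_heap_dump(mon_addr, wait_seconds=600, log_dir=LOG_DIR, dry_run=True):
    """Start the profiler, wait, dump, and return matching heap files.

    With dry_run=True nothing is executed; the steps are only printed.
    """
    start, dump = profiling_commands(mon_addr)
    if dry_run:
        print(" ".join(start))
        print(f"sleep {wait_seconds}")
        print(" ".join(dump))
        return []
    subprocess.run(start, check=True)
    time.sleep(wait_seconds)  # "wait some time" so the profile covers the growth
    subprocess.run(dump, check=True)
    # File name pattern as described in the comment above, for mon.c.
    return sorted(glob.glob(os.path.join(log_dir, "ceph-mon.c.profile.*.heap")))


heap_files = collect_heap_dump(MON_ADDR)  # dry run: prints the steps only
```

Pass `dry_run=False` on the monitor host to actually run the commands; the returned list then holds whatever heap files were found under the assumed log directory.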
Updated by Corin Langosch over 10 years ago
- File ceph.tar.gz ceph.tar.gz added
I just did what you wrote, please see attachment.
Updated by Joao Eduardo Luis over 10 years ago
Corin, forgot to ask: what version is this happening on exactly and are you using our packaged binaries?
Updated by Joao Eduardo Luis over 10 years ago
Corin, I forgot one step that would be wonderful if you could do: install google-perftools and run 'google-pprof <path-to-ceph-mon> --text mon.c.profile.*.heap > /tmp/ceph-mon.heap'
Otherwise I'd have to get an exact, or very similar, environment (lib-wise and version-wise) to make sense of those heap dumps.
Updated by Corin Langosch over 10 years ago
- File ceph-mon.heap ceph-mon.heap added
I use ceph version 0.72-3-g5e1e02c (5e1e02c99b620fa4ffd2b455eb8e005b172fa05c), which is the "hotfix" for http://tracker.ceph.com/issues/6761. But according to git history, this should be identical to 0.72.1.
The output of "google-pprof /usr/bin/ceph-mon --text mon.c.profile.*.heap > /tmp/ceph-mon.heap" is attached. Not sure if it really helps.
Btw, now some hours after restarting the monitor, the memory consumption still seems fine.
root 9714 1.9 0.1 314384 78928 ? Sl Nov14 208:18 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
root 5424 1.5 0.1 542196 127368 ? Sl Nov14 158:33 /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf
root 29466 1.3 0.1 261632 68464 ? Sl Nov20 11:51 /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf
Updated by Joao Eduardo Luis over 10 years ago
- Status changed from New to In Progress
Updated by Sage Weil over 10 years ago
- Status changed from In Progress to Can't reproduce