Bug #6810

closed

very high monitor memory usage after upgrade dumpling -> emperor

Added by Corin Langosch over 10 years ago. Updated over 10 years ago.

Status: Can't reproduce
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As you know, I upgraded a few days ago from dumpling to emperor. All daemons are now running emperor. I have 3 monitors, and the memory usage of one of them is constantly growing. It now uses approx. 10 GB of RAM.

root 21665 1.3 16.2 11050460 10736676 ? Sl Nov14 127:19 /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf

The other two monitors look fine:

root 9714 1.9 0.1 310160 79208 ? Sl Nov14 188:27 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
root 5424 1.5 0.2 413228 137848 ? Sl Nov14 142:47 /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf

Please let me know if you need anything to debug this, as I'd like to restart mon.c asap.
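
For reference, the RSS figures above come from ps. A simple way to keep sampling mon.c's memory over time (assuming the pid file path shown above) would be something like:

$ while sleep 60; do date; ps -o rss=,vsz= -p "$(cat /var/run/ceph/mon.c.pid)"; done

ps reports RSS and VSZ in kilobytes, so roughly 10 GB shows up as a value around 10000000.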


Files

ceph.tar.gz (431 KB) ceph.tar.gz Corin Langosch, 11/20/2013 09:06 AM
ceph-mon.heap (25.1 KB) ceph-mon.heap Corin Langosch, 11/21/2013 12:06 AM
Actions #1

Updated by Corin Langosch over 10 years ago

cluster 4ac0e21b-6ea2-4ac7-8114-122bd9ba55d6
health HEALTH_OK
monmap e5: 3 mons at {a=10.0.0.5:6789/0,b=10.0.0.6:6789/0,c=10.0.0.7:6789/0}, election epoch 146, quorum 0,1,2 a,b,c
osdmap e5370: 14 osds: 12 up, 12 in
pgmap v21289411: 12288 pgs, 3 pools, 2592 GB data, 657 kobjects
5217 GB used, 16527 GB / 21745 GB avail
12288 active+clean
client io 2027 B/s rd, 2730 kB/s wr, 72 op/s

Actions #2

Updated by Corin Langosch over 10 years ago

    cluster 4ac0e21b-6ea2-4ac7-8114-122bd9ba55d6
     health HEALTH_OK
     monmap e5: 3 mons at {a=10.0.0.5:6789/0,b=10.0.0.6:6789/0,c=10.0.0.7:6789/0}, election epoch 146, quorum 0,1,2 a,b,c
     osdmap e5370: 14 osds: 12 up, 12 in
      pgmap v21289411: 12288 pgs, 3 pools, 2592 GB data, 657 kobjects
            5217 GB used, 16527 GB / 21745 GB avail
               12288 active+clean
  client io 2027 B/s rd, 2730 kB/s wr, 72 op/s
Actions #3

Updated by Corin Langosch over 10 years ago

mon.c tcmalloc heap stats:
------------------------------------------------
MALLOC:    10803178320 (10302.7 MiB) Bytes in use by application
MALLOC: +     13082624 (   12.5 MiB) Bytes in page heap freelist
MALLOC: +    100630232 (   96.0 MiB) Bytes in central cache freelist
MALLOC: +     15975936 (   15.2 MiB) Bytes in transfer cache freelist
MALLOC: +     23556056 (   22.5 MiB) Bytes in thread cache freelists
MALLOC: +     51736728 (   49.3 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  11008159896 (10498.2 MiB) Actual memory used (physical + swap)
MALLOC: +      3063808 (    2.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  11011223704 (10501.1 MiB) Virtual address space used
MALLOC:
MALLOC:         805571              Spans in use
MALLOC:            136              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
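
These stats are the output of the monitor's tcmalloc heap admin command. Assuming the same -m addressing shown in the next comment also works for the stats and release subcommands, the stats can be re-fetched and the freelist handed back to the OS with something like:

$ ceph heap stats -m 10.0.0.7:6789
$ ceph heap release -m 10.0.0.7:6789

Note that 'heap release' only returns freelist pages to the OS; it does nothing about the ~10 GB actually in use by the application.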
Actions #4

Updated by Joao Eduardo Luis over 10 years ago

  • Assignee set to Joao Eduardo Luis

Can you please obtain a heap dump out of the monitor?

$ ceph heap start_profiler -m 10.0.0.7:6789

wait some time

$ ceph heap dump

The dump should be in your log path, in the form 'ceph-mon.c.profile.XXXX.heap'.

Also, if this happens to be something you can easily reproduce, please tell us how. If it happens often, running ceph-mon under valgrind with massif would also help.
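
For the valgrind/massif route, a plausible invocation (assuming ceph-mon accepts -f to stay in the foreground; adjust paths and flags to your setup) would be:

$ valgrind --tool=massif /usr/bin/ceph-mon -i c -f -c /etc/ceph/ceph.conf
$ ms_print massif.out.<pid> > /tmp/ceph-mon.massif.txt

ms_print ships with valgrind and renders the massif.out file as a readable allocation tree.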

Once you decide to stop this mon, a store dump would also be appreciated. You can obtain it using 'ceph_test_store_tool' (renamed to 'ceph-kvstore-tool' under dumpling):

$ ceph_test_store_tool /var/lib/ceph/mon/ceph-c/store.db list > /tmp/ceph-mon.c.dump

Actions #5

Updated by Corin Langosch over 10 years ago

I just did what you wrote; please see the attachment.

Actions #6

Updated by Joao Eduardo Luis over 10 years ago

Corin, I forgot to ask: what version exactly is this happening on, and are you using our packaged binaries?

Actions #7

Updated by Joao Eduardo Luis over 10 years ago

Corin, I forgot one step it would be wonderful if you could do: install google-perftools and run 'google-pprof <path-to-ceph-mon> --text mon.c.profile.*.heap > /tmp/ceph-mon.heap'

Otherwise I'd have to get an exact, or very similar, environment (lib-wise and version-wise) to make sense of those heap dumps.

Actions #8

Updated by Corin Langosch over 10 years ago

I use ceph version 0.72-3-g5e1e02c (5e1e02c99b620fa4ffd2b455eb8e005b172fa05c), which is the "hotfix" for http://tracker.ceph.com/issues/6761. But according to git history, this should be identical to 0.72.1.

The output of "google-pprof /usr/bin/ceph-mon --text mon.c.profile.*.heap > /tmp/ceph-mon.heap" is attached. Not sure if it really helps.

Btw, some hours after restarting the monitor, memory consumption still seems fine.

root      9714  1.9  0.1 314384 78928 ?        Sl   Nov14 208:18 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
root      5424  1.5  0.1 542196 127368 ?       Sl   Nov14 158:33 /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf
root     29466  1.3  0.1 261632 68464 ?        Sl   Nov20  11:51 /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf
Actions #9

Updated by Joao Eduardo Luis over 10 years ago

  • Status changed from New to In Progress
Actions #10

Updated by Sage Weil over 10 years ago

  • Status changed from In Progress to Can't reproduce