Bug #1349: mds: standby-replay leaks memory

Added by Sage Weil over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

When I run 'CEPH_NUM_MDS=4 ./vstart.sh -d -n -x -s', mount with cfuse, and run 'fsstress -d foo -l 1 -n 1000 -p 10 -v', I see the standby MDSs eat memory like crazy (I currently have four 21GB cmds procs).

I suspect you'll see the same thing with one MDS and a simpler workload, but I haven't checked.
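
For reference, the reproduction steps above, consolidated (a sketch assuming a built vstart development tree; the mount point and exact cfuse invocation are illustrative):

CEPH_NUM_MDS=4 ./vstart.sh -d -n -x -s    # 4 active MDSs plus standbys (-s)
mkdir -p /mnt/ceph
./cfuse /mnt/ceph                         # mount via the FUSE client
cd /mnt/ceph && fsstress -d foo -l 1 -n 1000 -p 10 -v
# watch the cmds processes in top while fsstress runs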

#1

Updated by Greg Farnum over 12 years ago

I don't think this is actually a memory leak. On kai I ran a 1-MDS system with a standby and did snaptest-2. The standby's virtual memory climbed rapidly, but its resident memory stayed well below the active's even before I ran out of system memory, and running with heap profiling on indicated the heap was pretty small. The exception would be if we've somehow gotten into infinite stack recursion, which I think we looked at before and dismissed as a possibility.
I imagine it's just a result of decoding on-disk state frequently.

Unless you're seeing it with real RSS troubles?
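
(A sketch of the heap-profiling steps referenced above, assuming a vstart cluster where the mds is named 'a'; the dump path is illustrative:)

./ceph mds tell a heap start_profiler   # turn on tcmalloc heap profiling
# ... run the workload ...
./ceph mds tell a heap dump             # write a heap profile snapshot
./ceph mds tell a heap stop_profiler
# inspect the dump with google-perftools' pprof against the cmds binary
pprof --text ./cmds /path/to/mds.a.profile.0001.heap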

#2

Updated by Greg Farnum over 12 years ago

For instance: (the one on top is the standby; the workload finished)

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27490 gregf     20   0 2927m  51m 3168 S    0  1.3   0:06.60 cmds
27477 gregf     20   0  140m  51m 3912 S    0  1.3   1:24.16 cmds

'./ceph mds tell \* heap stats' produces the following for the standby:
2011-08-01 17:53:04.875951 7f701f89a710 log [INF] : mds.astcmalloc heap stats:------------------------------------------------
2011-08-01 17:53:04.875961 7f701f89a710 log [INF] : MALLOC:     31948800 (   30.5 MB) Heap size
2011-08-01 17:53:04.875976 7f701f89a710 log [INF] : MALLOC:     30205664 (   28.8 MB) Bytes in use by application
2011-08-01 17:53:04.875981 7f701f89a710 log [INF] : MALLOC:      1286144 (    1.2 MB) Bytes free in page heap
2011-08-01 17:53:04.875986 7f701f89a710 log [INF] : MALLOC:            0 (    0.0 MB) Bytes unmapped in page heap
2011-08-01 17:53:04.875991 7f701f89a710 log [INF] : MALLOC:       180400 (    0.2 MB) Bytes free in central cache
2011-08-01 17:53:04.875995 7f701f89a710 log [INF] : MALLOC:        96256 (    0.1 MB) Bytes free in transfer cache
2011-08-01 17:53:04.876000 7f701f89a710 log [INF] : MALLOC:       180336 (    0.2 MB) Bytes free in thread caches
2011-08-01 17:53:04.876005 7f701f89a710 log [INF] : MALLOC:         1184              Spans in use
2011-08-01 17:53:04.876010 7f701f89a710 log [INF] : MALLOC:            8              Thread heaps in use
2011-08-01 17:53:04.876015 7f701f89a710 log [INF] : MALLOC:      5242880 (    5.0 MB) Metadata allocated
2011-08-01 17:53:04.876020 7f701f89a710 log [INF] : ------------------------------------------------

and for the active:
2011-08-01 17:53:04.873806 7fada5502710 log [INF] : mds.atcmalloc heap stats:------------------------------------------------
2011-08-01 17:53:04.873813 7fada5502710 log [INF] : MALLOC:     44310528 (   42.3 MB) Heap size
2011-08-01 17:53:04.873823 7fada5502710 log [INF] : MALLOC:     31865160 (   30.4 MB) Bytes in use by application
2011-08-01 17:53:04.873832 7fada5502710 log [INF] : MALLOC:      7565312 (    7.2 MB) Bytes free in page heap
2011-08-01 17:53:04.873841 7fada5502710 log [INF] : MALLOC:            0 (    0.0 MB) Bytes unmapped in page heap
2011-08-01 17:53:04.873850 7fada5502710 log [INF] : MALLOC:      3177992 (    3.0 MB) Bytes free in central cache
2011-08-01 17:53:04.873859 7fada5502710 log [INF] : MALLOC:        66560 (    0.1 MB) Bytes free in transfer cache
2011-08-01 17:53:04.873867 7fada5502710 log [INF] : MALLOC:      1635504 (    1.6 MB) Bytes free in thread caches
2011-08-01 17:53:04.873878 7fada5502710 log [INF] : MALLOC:         2377              Spans in use
2011-08-01 17:53:04.873887 7fada5502710 log [INF] : MALLOC:           11              Thread heaps in use
2011-08-01 17:53:04.873896 7fada5502710 log [INF] : MALLOC:      5373952 (    5.1 MB) Metadata allocated
2011-08-01 17:53:04.873905 7fada5502710 log [INF] : ------------------------------------------------

#3

Updated by Sage Weil over 12 years ago

Yeah, it's VIRT, not RSS:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16410 sage      40   0 4617m  34m 5372 S    8  0.4   1:05.94 cmds
16277 sage      40   0 4584m  31m 5424 S   10  0.4   1:06.78 cmds
16529 sage      40   0 4599m  31m 5308 S   10  0.4   1:03.02 cmds
16617 sage      40   0 4576m  31m 5316 S    8  0.4   1:05.55 cmds

It is unbounded, though, growing linearly over time. I'm not sure how much of a problem that is on x86_64.
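
(To watch the split directly: in /proc, VmSize is top's VIRT and VmRSS is RES. This just reads both for every cmds process:)

for pid in $(pgrep cmds); do
    echo "== pid $pid =="
    grep -E 'VmSize|VmRSS' /proc/$pid/status
done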

#4

Updated by Greg Farnum over 12 years ago

I don't know much about memory address management, but my assumption is that this is just the result of some oddity in how the memory allocator interacts with Linux, and that if the process runs short of virtual address space it will be reclaimed. Somebody else might have a better idea, though; or, if it's important, we could launch a deeper investigation of the allocator's behavior and ask around. (A simple first step: compile without tcmalloc and see what changes.)
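
(A sketch of that first step against the autotools build of this era; the configure flag is an assumption, so check ./configure --help locally:)

./autogen.sh
./configure --without-tcmalloc   # flag name is an assumption; verify locally
make
# re-run the vstart + fsstress workload and compare VIRT growth in top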

#5

Updated by Sage Weil over 12 years ago

Fixed by commit:37dc931d2b715070e9ea806620cea9bdc22e85b3.

#6

Updated by Sage Weil over 12 years ago

  • Status changed from New to Resolved

#7

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.33)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
