Bug #1349
mds: standby-replay leaks memory
Status: Closed
Description
When I run 'CEPH_NUM_MDS=4 ./vstart.sh -d -n -x -s', mount with cfuse, and run 'fsstress -d foo -l 1 -n 1000 -p 10 -v', I see the standby MDSs eat memory like crazy (I currently have four 21 GB cmds processes).
I suspect you'll see the same thing with one MDS and a simpler workload, but I haven't checked.
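Spelled out, the reproduction is roughly this sequence (a sketch: the cfuse mount step in particular is my reconstruction, and the exact client arguments may differ):

  CEPH_NUM_MDS=4 ./vstart.sh -d -n -x -s   # fresh vstart cluster; -s adds the standby MDSs here
  mkdir -p mnt
  ./cfuse mnt                              # mount the FUSE client (assumes the vstart ceph.conf is picked up)
  cd mnt
  fsstress -d foo -l 1 -n 1000 -p 10 -v    # 10 processes x 1000 ops, one loop, scratch dir 'foo'
  # watch the cmds processes in top while this runs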
Updated by Greg Farnum over 12 years ago
I don't think this is actually a memory leak. On kai I ran a 1-MDS system with a standby and did snaptest-2. The standby's virtual memory rapidly climbed, but its resident size stayed well below the active's even before I ran out of system memory, and running with heap profiling on indicated the heap was pretty small. That is, unless we've somehow gotten into a state of infinite stack recursion, which I think we looked at before and dismissed as a possibility.
I imagine it's just a result of decoding on-disk state frequently.
Unless you're seeing it with real RSS troubles?
Updated by Greg Farnum over 12 years ago
For instance: (the one on top is the standby; the workload finished)
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
27490 gregf  20  0 2927m  51m 3168 S    0  1.3  0:06.60 cmds
27477 gregf  20  0  140m  51m 3912 S    0  1.3  1:24.16 cmds
./ceph mds tell \* heap stats produces the following for the standby:
2011-08-01 17:53:04.875951 7f701f89a710 log [INF] : mds.as tcmalloc heap stats:------------------------------------------------
2011-08-01 17:53:04.875961 7f701f89a710 log [INF] : MALLOC:   31948800 (   30.5 MB) Heap size
2011-08-01 17:53:04.875976 7f701f89a710 log [INF] : MALLOC:   30205664 (   28.8 MB) Bytes in use by application
2011-08-01 17:53:04.875981 7f701f89a710 log [INF] : MALLOC:    1286144 (    1.2 MB) Bytes free in page heap
2011-08-01 17:53:04.875986 7f701f89a710 log [INF] : MALLOC:          0 (    0.0 MB) Bytes unmapped in page heap
2011-08-01 17:53:04.875991 7f701f89a710 log [INF] : MALLOC:     180400 (    0.2 MB) Bytes free in central cache
2011-08-01 17:53:04.875995 7f701f89a710 log [INF] : MALLOC:      96256 (    0.1 MB) Bytes free in transfer cache
2011-08-01 17:53:04.876000 7f701f89a710 log [INF] : MALLOC:     180336 (    0.2 MB) Bytes free in thread caches
2011-08-01 17:53:04.876005 7f701f89a710 log [INF] : MALLOC:       1184 Spans in use
2011-08-01 17:53:04.876010 7f701f89a710 log [INF] : MALLOC:          8 Thread heaps in use
2011-08-01 17:53:04.876015 7f701f89a710 log [INF] : MALLOC:    5242880 (    5.0 MB) Metadata allocated
2011-08-01 17:53:04.876020 7f701f89a710 log [INF] : ------------------------------------------------
and for the active:
2011-08-01 17:53:04.873806 7fada5502710 log [INF] : mds.a tcmalloc heap stats:------------------------------------------------
2011-08-01 17:53:04.873813 7fada5502710 log [INF] : MALLOC:   44310528 (   42.3 MB) Heap size
2011-08-01 17:53:04.873823 7fada5502710 log [INF] : MALLOC:   31865160 (   30.4 MB) Bytes in use by application
2011-08-01 17:53:04.873832 7fada5502710 log [INF] : MALLOC:    7565312 (    7.2 MB) Bytes free in page heap
2011-08-01 17:53:04.873841 7fada5502710 log [INF] : MALLOC:          0 (    0.0 MB) Bytes unmapped in page heap
2011-08-01 17:53:04.873850 7fada5502710 log [INF] : MALLOC:    3177992 (    3.0 MB) Bytes free in central cache
2011-08-01 17:53:04.873859 7fada5502710 log [INF] : MALLOC:      66560 (    0.1 MB) Bytes free in transfer cache
2011-08-01 17:53:04.873867 7fada5502710 log [INF] : MALLOC:    1635504 (    1.6 MB) Bytes free in thread caches
2011-08-01 17:53:04.873878 7fada5502710 log [INF] : MALLOC:       2377 Spans in use
2011-08-01 17:53:04.873887 7fada5502710 log [INF] : MALLOC:         11 Thread heaps in use
2011-08-01 17:53:04.873896 7fada5502710 log [INF] : MALLOC:    5373952 (    5.1 MB) Metadata allocated
2011-08-01 17:53:04.873905 7fada5502710 log [INF] : ------------------------------------------------
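Putting these dumps next to the top output above makes the gap concrete for the standby: its tcmalloc heap is 30.5 MB against ~2927 MB of VIRT, i.e. about 1%, and even the full 51 MB RSS is under 2% of it. Nearly all of that 2.9 GB is mapped-but-untouched address space rather than live allocations.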
Updated by Sage Weil over 12 years ago
Yeah, it's VIRT not RSS:
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
16410 sage   40  0 4617m  34m 5372 S    8  0.4  1:05.94 cmds
16277 sage   40  0 4584m  31m 5424 S   10  0.4  1:06.78 cmds
16529 sage   40  0 4599m  31m 5308 S   10  0.4  1:03.02 cmds
16617 sage   40  0 4576m  31m 5316 S    8  0.4  1:05.55 cmds
It is unbounded, though, growing linearly over time. I'm not sure how much of a problem that is on x86_64.
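A quick way to confirm it's address space rather than resident memory, without watching top, is to read the Vm fields straight out of procfs (a sketch using standard /proc/<pid>/status fields):

  for pid in $(pgrep cmds); do
      echo "== cmds pid $pid"
      # VmSize = total mapped virtual address space; VmRSS = what is actually resident
      grep -E '^Vm(Size|RSS):' "/proc/$pid/status"
  done

For scale: with the usual 4-level paging on Linux x86_64, a process gets about 2^47 bytes = 128 TiB of user address space, so at roughly 4.6 GB of VIRT per run of this workload it would take tens of thousands of runs to exhaust it. Ugly, but nowhere near a hard limit.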
Updated by Greg Farnum over 12 years ago
I don't know much about memory address management, but my assumption is that this is just an oddity of how the memory allocator and Linux interact, and that if the process ever runs out of virtual space those mappings will be reclaimed. Somebody else might have a better idea, though; or, if it's important, we could launch a deeper investigation of the allocator behaviors and ask around! (Simple first step: compile without tcmalloc and see what changes.)
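For that first step, a rough sketch of the rebuild (assuming the autotools build of this era; the exact flag name is my recollection, so check ./configure --help):

  ./autogen.sh
  ./configure --without-tcmalloc   # link against libc malloc instead (flag name assumed)
  make
  # then re-run the vstart + fsstress reproduction and compare VIRT growth in top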
Updated by Sage Weil over 12 years ago
Fixed by commit:37dc931d2b715070e9ea806620cea9bdc22e85b3
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.33)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.