Bug #47263
Closed
memory leak, overusage OSD process since 14.2.10
Description
Since the upgrade to 14.2.10, multiple OSDs in our large cluster have been consuming far more memory than configured and are being OOM-killed. They have a memory target of 10 GB but are using 30+ GB. I have to restart the OSD processes multiple times a day to avoid OOM kills.
Never had this issue before.
I'm not sure what other information would help; please tell me what to collect and I'll provide it.
OS: Ubuntu 18.04 LTS
Updated by Anonymous over 3 years ago
Logs only show compaction stats
** Compaction Stats [default] **
Level   Files       Size  Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0        1/0   11.96 MB    0.2       0.0     0.0       0.0        0.0       0.0        0.0    1.0       0.0      52.8       0.23               0.00          1     0.227      0        0
L1        3/0  195.08 MB    0.8       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.0       0.00               0.00          0     0.000      0        0
L2       39/0    2.25 GB    1.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.0       0.00               0.00          0     0.000      0        0
L3      294/0   17.30 GB    0.7       0.5     0.1       0.4        0.4      -0.0        0.0    7.5      22.2      19.4      22.73               2.19          1    22.730   828K     198K
Sum     337/0   19.75 GB    0.0       0.5     0.1       0.4        0.4       0.0        0.0   37.9      22.0      19.7      22.96               2.19          2    11.478   828K     198K
Int       0/0    0.00 KB    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.0       0.00               0.00          0     0.000      0        0

** Compaction Stats [default] **
Priority   Files       Size  Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low          0/0    0.00 KB    0.0       0.5     0.1       0.4        0.4      -0.0        0.0    0.0      22.2      19.4      22.73               2.19          1    22.730   828K     198K
User         0/0    0.00 KB    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0      52.8       0.23               0.00          1     0.227      0        0

Uptime(secs): 12001.0 total, 0.0 interval
Flush(GB): cumulative 0.012, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.44 GB write, 0.04 MB/s write, 0.49 GB read, 0.04 MB/s read, 23.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [default] **
Updated by Anonymous over 3 years ago
Mempools for one OSD that is currently using 20 GB and still rising:
{
  "mempool": {
    "by_pool": {
      "bloom_filter": { "items": 0, "bytes": 0 },
      "bluestore_alloc": { "items": 4790518, "bytes": 38324144 },
      "bluestore_cache_data": { "items": 12141, "bytes": 2306498560 },
      "bluestore_cache_onode": { "items": 2716047, "bytes": 1781726832 },
      "bluestore_cache_other": { "items": 188405294, "bytes": 4851249797 },
      "bluestore_fsck": { "items": 0, "bytes": 0 },
      "bluestore_txc": { "items": 43, "bytes": 31304 },
      "bluestore_writing_deferred": { "items": 36, "bytes": 3094274 },
      "bluestore_writing": { "items": 0, "bytes": 0 },
      "bluefs": { "items": 11527, "bytes": 282008 },
      "buffer_anon": { "items": 652757, "bytes": 176387816 },
      "buffer_meta": { "items": 641711, "bytes": 56470568 },
      "osd": { "items": 138, "bytes": 1795104 },
      "osd_mapbl": { "items": 52, "bytes": 9568008 },
      "osd_pglog": { "items": 490693, "bytes": 229369757 },
      "osdmap": { "items": 314913, "bytes": 12364816 },
      "osdmap_mapping": { "items": 0, "bytes": 0 },
      "pgmap": { "items": 0, "bytes": 0 },
      "mds_co": { "items": 0, "bytes": 0 },
      "unittest_1": { "items": 0, "bytes": 0 },
      "unittest_2": { "items": 0, "bytes": 0 }
    },
    "total": { "items": 198035870, "bytes": 9467162988 }
  }
}
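To make a dump like this easier to read, here is a minimal Python sketch that ranks pools by size and shows each pool's share of the reported total. The function name rank_pools and the SAMPLE dict are illustrative only, not part of Ceph; SAMPLE contains just the six largest pools from the dump above together with the reported total, so the shares are relative to the full 9467162988 bytes.

```python
# Rank mempool entries by bytes and report each pool's share of the total.
# rank_pools and SAMPLE are hypothetical names used for illustration.

def rank_pools(dump: dict, top: int = 5) -> list:
    """Return [(pool_name, bytes, share_of_total)] for the largest pools."""
    pools = dump["mempool"]["by_pool"]
    total = dump["mempool"]["total"]["bytes"]
    ranked = sorted(pools.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    return [(name, stats["bytes"], stats["bytes"] / total)
            for name, stats in ranked[:top]]

# Subset of the dump posted above (six largest pools, reported total kept as-is).
SAMPLE = {
    "mempool": {
        "by_pool": {
            "bluestore_cache_data": {"items": 12141, "bytes": 2306498560},
            "bluestore_cache_onode": {"items": 2716047, "bytes": 1781726832},
            "bluestore_cache_other": {"items": 188405294, "bytes": 4851249797},
            "buffer_anon": {"items": 652757, "bytes": 176387816},
            "buffer_meta": {"items": 641711, "bytes": 56470568},
            "osd_pglog": {"items": 490693, "bytes": 229369757},
        },
        "total": {"items": 198035870, "bytes": 9467162988},
    }
}

if __name__ == "__main__":
    for name, nbytes, share in rank_pools(SAMPLE):
        print(f"{name:24s} {nbytes / 1024**3:6.2f} GiB  {share:6.1%}")
```

On these numbers the three bluestore_cache_* pools account for roughly 94% of the mempool total, with bluestore_cache_other alone at about half.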
Updated by Anonymous over 3 years ago
MALLOC: 17863967056 (17036.4 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 208440136 ( 198.8 MiB) Bytes in central cache freelist
MALLOC: + 13022272 ( 12.4 MiB) Bytes in transfer cache freelist
MALLOC: + 52510504 ( 50.1 MiB) Bytes in thread cache freelists
MALLOC: + 65876160 ( 62.8 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 18203816128 (17360.5 MiB) Actual memory used (physical + swap)
MALLOC: + 5685248 ( 5.4 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 18209501376 (17365.9 MiB) Virtual address space used
MALLOC:
MALLOC: 893169 Spans in use
MALLOC: 37 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
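A quick arithmetic check of the heap stats above, as a hedged Python sketch. It sums the tcmalloc components to confirm the "Actual memory used" line, then subtracts the mempool total from the earlier comment from the bytes in use by the application. The dictionary keys and function names are illustrative only. It also assumes both snapshots describe the same OSD at about the same time, which the thread does not confirm, so treat the gap as a rough indication only.

```python
# Cross-check tcmalloc heap stats against the mempool total.
# All names are hypothetical; the values are copied from the two comments above.

HEAP = {
    "in_use_by_application": 17863967056,
    "page_heap_freelist": 0,
    "central_cache_freelist": 208440136,
    "transfer_cache_freelist": 13022272,
    "thread_cache_freelists": 52510504,
    "malloc_metadata": 65876160,
}
REPORTED_ACTUAL_USED = 18203816128   # "Actual memory used (physical + swap)"
MEMPOOL_TOTAL = 9467162988           # "total" bytes from the mempool dump

def actual_memory_used(heap: dict) -> int:
    """Sum of in-use bytes, freelists and malloc metadata."""
    return sum(heap.values())

def untracked_bytes(heap: dict, mempool_total: int) -> int:
    """Heap bytes in use by the application that no mempool accounts for."""
    return heap["in_use_by_application"] - mempool_total

if __name__ == "__main__":
    print("components sum matches report:",
          actual_memory_used(HEAP) == REPORTED_ACTUAL_USED)
    gap = untracked_bytes(HEAP, MEMPOOL_TOTAL)
    print(f"heap in use outside mempools: {gap} bytes ({gap / 1024**3:.2f} GiB)")
```

Under that assumption, about 7.8 GiB of the heap in use is not covered by any mempool, and the freelists are small (under 300 MiB), so the memory is held by the application rather than cached by the allocator.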