Bug #47263

closed

memory leak, overusage OSD process since 14.2.10

Added by Anonymous over 3 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since the upgrade to 14.2.10, multiple OSDs in our big cluster are consuming far too much memory and are being OOM-killed. They have a target of 10G but are using 30G+. I have to restart the OSD processes multiple times a day to avoid OOMs.
I never had this issue before.

I don't know what other information would help; please suggest something and I'll provide it.

OS Ubuntu 18.04 LTS

Actions #1

Updated by Anonymous over 3 years ago

The logs only show compaction stats:

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   11.96 MB   0.2      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     52.8      0.23              0.00         1    0.227       0      0
  L1      3/0   195.08 MB   0.8      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L2     39/0    2.25 GB   1.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L3    294/0   17.30 GB   0.7      0.5     0.1      0.4       0.4     -0.0       0.0   7.5     22.2     19.4     22.73              2.19         1   22.730    828K   198K
 Sum    337/0   19.75 GB   0.0      0.5     0.1      0.4       0.4      0.0       0.0  37.9     22.0     19.7     22.96              2.19         2   11.478    828K   198K
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [default] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Low      0/0    0.00 KB   0.0      0.5     0.1      0.4       0.4     -0.0       0.0   0.0     22.2     19.4     22.73              2.19         1   22.730    828K   198K
User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     52.8      0.23              0.00         1    0.227       0      0
Uptime(secs): 12001.0 total, 0.0 interval
Flush(GB): cumulative 0.012, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.44 GB write, 0.04 MB/s write, 0.49 GB read, 0.04 MB/s read, 23.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [default] **

Actions #2

Updated by Anonymous over 3 years ago

Mempools for one OSD that is using 20GB and still rising at the moment:

{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 4790518,
                "bytes": 38324144
            },
            "bluestore_cache_data": {
                "items": 12141,
                "bytes": 2306498560
            },
            "bluestore_cache_onode": {
                "items": 2716047,
                "bytes": 1781726832
            },
            "bluestore_cache_other": {
                "items": 188405294,
                "bytes": 4851249797
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 43,
                "bytes": 31304
            },
            "bluestore_writing_deferred": {
                "items": 36,
                "bytes": 3094274
            },
            "bluestore_writing": {
                "items": 0,
                "bytes": 0
            },
            "bluefs": {
                "items": 11527,
                "bytes": 282008
            },
            "buffer_anon": {
                "items": 652757,
                "bytes": 176387816
            },
            "buffer_meta": {
                "items": 641711,
                "bytes": 56470568
            },
            "osd": {
                "items": 138,
                "bytes": 1795104
            },
            "osd_mapbl": {
                "items": 52,
                "bytes": 9568008
            },
            "osd_pglog": {
                "items": 490693,
                "bytes": 229369757
            },
            "osdmap": {
                "items": 314913,
                "bytes": 12364816
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 198035870,
            "bytes": 9467162988
        }
    }
}
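
For anyone reading a dump like the one above, a small script can rank the pools by size. This is only a sketch: the function name `summarize_mempools` is made up for illustration, and it assumes the JSON has the same shape as the dump posted here (as printed by something like `ceph daemon osd.<id> dump_mempools`). The sample below is abbreviated to the five largest pools from that dump, with the total copied as posted.

```python
import json


def summarize_mempools(dump_text, top_n=5):
    """Return (total_bytes, rows) where rows lists the largest pools.

    Each row is (pool_name, bytes, share_of_total). The name and the
    return shape are hypothetical, chosen only for this illustration.
    """
    mempool = json.loads(dump_text)["mempool"]
    total = mempool["total"]["bytes"]
    pools = sorted(
        ((name, stats["bytes"]) for name, stats in mempool["by_pool"].items()),
        key=lambda item: item[1],
        reverse=True,
    )
    rows = [
        (name, size, (size / total) if total else 0.0)
        for name, size in pools[:top_n]
    ]
    return total, rows


# Abbreviated sample: byte counts copied from the dump in this comment.
SAMPLE = """
{
    "mempool": {
        "by_pool": {
            "bluestore_cache_data": {"items": 12141, "bytes": 2306498560},
            "bluestore_cache_onode": {"items": 2716047, "bytes": 1781726832},
            "bluestore_cache_other": {"items": 188405294, "bytes": 4851249797},
            "buffer_anon": {"items": 652757, "bytes": 176387816},
            "osd_pglog": {"items": 490693, "bytes": 229369757}
        },
        "total": {"items": 198035870, "bytes": 9467162988}
    }
}
"""

if __name__ == "__main__":
    total, rows = summarize_mempools(SAMPLE, top_n=3)
    print(f"mempool total: {total / 2**30:.2f} GiB")
    for name, size, share in rows:
        print(f"{name:24s} {size / 2**30:6.2f} GiB  {share:6.1%}")
```

In this dump the three `bluestore_cache_*` pools make up most of the mempool total, which itself stays below the 10G target mentioned in the description.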

Actions #3

Updated by Anonymous over 3 years ago

MALLOC: 17863967056 (17036.4 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 208440136 ( 198.8 MiB) Bytes in central cache freelist
MALLOC: + 13022272 ( 12.4 MiB) Bytes in transfer cache freelist
MALLOC: + 52510504 ( 50.1 MiB) Bytes in thread cache freelists
MALLOC: + 65876160 ( 62.8 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 18203816128 (17360.5 MiB) Actual memory used (physical + swap)
MALLOC: + 5685248 ( 5.4 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 18209501376 (17365.9 MiB) Virtual address space used
MALLOC:
MALLOC: 893169 Spans in use
MALLOC: 37 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
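
To compare the allocator's view with the mempool accounting, the heap stats above can be parsed and set against the mempool total from the previous comment. This is a rough sketch, assuming the text is in the tcmalloc format pasted here (for example from `ceph tell osd.<id> heap stats`); `parse_heap_stats` and `MEMPOOL_TOTAL_BYTES` are names invented for this example. The two outputs were posted separately and may not be from the same moment, so the difference is only indicative.

```python
import re

# Matches lines such as:
#   MALLOC: + 208440136 ( 198.8 MiB) Bytes in central cache freelist
# Lines without a "(... MiB)" part (spans, thread heaps, page size) are skipped.
LINE_RE = re.compile(
    r"^MALLOC:\s*[+=]?\s*(\d+)\s*\(\s*[\d.]+ MiB\)\s*(.+)$"
)


def parse_heap_stats(text):
    """Map each tcmalloc stat label to its byte count (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


# Copied from the heap stats in this comment.
HEAP_STATS = """
MALLOC: 17863967056 (17036.4 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 208440136 ( 198.8 MiB) Bytes in central cache freelist
MALLOC: + 13022272 ( 12.4 MiB) Bytes in transfer cache freelist
MALLOC: + 52510504 ( 50.1 MiB) Bytes in thread cache freelists
MALLOC: + 65876160 ( 62.8 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 18203816128 (17360.5 MiB) Actual memory used (physical + swap)
MALLOC: + 5685248 ( 5.4 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 18209501376 (17365.9 MiB) Virtual address space used
"""

# "total" -> "bytes" from the mempool dump in the previous comment.
MEMPOOL_TOTAL_BYTES = 9467162988

stats = parse_heap_stats(HEAP_STATS)
in_use = stats["Bytes in use by application"]
# Heap bytes that the mempool accounting does not cover.
outside_mempools = in_use - MEMPOOL_TOTAL_BYTES

if __name__ == "__main__":
    print(f"in use by application: {in_use / 2**30:.2f} GiB")
    print(f"tracked by mempools:   {MEMPOOL_TOTAL_BYTES / 2**30:.2f} GiB")
    print(f"outside mempools:      {outside_mempools / 2**30:.2f} GiB")
```

With the numbers posted here, roughly half of the heap in use is not tracked by any mempool, and almost nothing is sitting in freelists or has been released back to the OS.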

Actions #4

Updated by Anonymous over 3 years ago

Please CLOSE. Solved, user error

Actions #5

Updated by Igor Fedotov over 3 years ago

  • Status changed from New to Closed