Bug #48792

open

High memory usage when dashboard module is enabled

Added by Yue Zhu over 3 years ago. Updated 12 months ago.

Status: New
Priority: Normal
Assignee: -
Category: General
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have clusters running ceph-mgr 14.2.8 and 14.2.11 via ceph-container, and noticed that the memory usage of the ceph-mgr process keeps increasing until it eventually consumes all available memory.

We took two memory-map dumps of the mgr process about 23 hours apart and counted the strings in each. The number of occurrences of perf_schema_update grew from 6.74M to 6.95M. Occurrences of service_map grew as well, but by less (from 3.42M to 3.53M).

# diff -y <(sort mgr_mem_dump_01061723/mem_strings | uniq -c | sort -nrk1 | head -20) <(sort mgr_mem_dump_01071613/mem_strings | uniq -c | sort -nrk1 | head -20)
6737595 perf_schema_update                      |    6951557 perf_schema_update
3420619 service_map                          |    3527473 service_map
2509278 mon.prod-mon04-object02                      |    2587637 mon.prod-mon04-object02
2446596 mgr.prod-mon04-object02                      |    2523289 mgr.prod-mon04-object02
1466359 type                              |    1511201 type
1464268 addr                              |    1509678 addr
1454171 nonce                              |    1498666 nonce
 742643 addrvec                              |     765483 addrvec
 715771 0{D!                              |     737568 0{D!
 715136 (5W~XJd                           |     736906 (5W~XJd
 712596 addrs                              |     734296 addrs
 711791 stamp                              |     733425 stamp
 711742 name                              |     733353 name
 711440 message                              |     733094 message
 711380 priority                          |     733034 priority
 711376 clog                              |     733030 clog
 711372 rank                              |     733026 rank
 711368 channel                              |     733022 channel
 666218 [INF]                              |     686822 [INF]
 662728 audit                              |     683316 audit
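
The side-by-side diff above only lines up when both top-20 lists are in the same order. As a rough sketch, the per-string growth between two dumps could also be computed directly. The function name string_growth is hypothetical, and the sketch assumes each mem_strings file holds one extracted string per line:

```shell
# Print "delta<TAB>old_count<TAB>new_count<TAB>string" for every string in
# the newer dump whose count changed, largest growth first.
# Assumption: each input file holds one extracted string per line and the
# older file is not empty.
string_growth() {
    old_file=$1
    new_file=$2
    top=${3:-20}    # number of rows to print
    awk '
        NR == FNR { old[$0]++; next }   # first file: occurrences per string
        { new[$0]++ }                   # second file: occurrences per string
        END {
            for (s in new) {
                d = new[s] - old[s]
                if (d != 0) printf "%d\t%d\t%d\t%s\n", d, old[s], new[s], s
            }
        }
    ' "$old_file" "$new_file" | sort -t "$(printf '\t')" -k1,1nr | head -n "$top"
}
```

With the two dumps from this report it would be called as string_growth mgr_mem_dump_01061723/mem_strings mgr_mem_dump_01071613/mem_strings.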

We therefore suspect that the dashboard module is causing a memory leak. When we disabled the dashboard module via ceph mgr module disable dashboard, without restarting the mgr daemon, the memory was released back to the OS immediately and the memory usage of the mgr process stopped increasing. It looks like the dashboard module is somehow holding on to the memory and keeps requesting more.
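
To put numbers on this behaviour, the resident memory of the mgr process can be sampled before and after disabling the module. The following is only a sketch: the helper names rss_kb and watch_rss are hypothetical, it assumes a Linux host where /proc is readable, and the PID of the ceph-mgr process has to be looked up separately (for example with pidof ceph-mgr on the host or inside the container).

```shell
# Print the resident set size (VmRSS, in kB) of a process, read from /proc.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print "epoch_seconds rss_kb" for a PID, COUNT times, INTERVAL seconds apart.
watch_rss() {
    pid=$1
    interval=${2:-60}
    count=${3:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kb "$pid")"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Running watch_rss with the mgr PID once while the dashboard is enabled and once after ceph mgr module disable dashboard gives two series that can be compared directly.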

Reading the code further, we found that perf_schema_update is an update sent to the dashboard module, which runs through the mgr's Python binding.


Files

WX20210107-170701@2x.png (374 KB) Yue Zhu, 01/07/2021 10:07 PM
Actions #1

Updated by Yue Zhu over 3 years ago

Actions #2

Updated by Neha Ojha over 3 years ago

  • Category changed from ceph-mgr to 132
Actions #3

Updated by Ernesto Puerta about 3 years ago

  • Project changed from mgr to Dashboard
  • Category changed from 132 to General
Actions #4

Updated by Paul Kusters 12 months ago

This still happens on Quincy 17.2.5 and 17.2.6.

[ceph: root@ceph-bru4-prod-mon-01 /]# ceph orch ps --daemon_type mgr
NAME                              HOST                   PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
mgr.ceph-ant2-prod-mon-01.ukvjmb  ceph-ant2-prod-mon-01  *:8443,9283  running (7w)     9m ago   7w     505M        -  17.2.5   768e01abdf0b  605c2a53b228  
mgr.ceph-ant2-prod-mon-02.cytlix  ceph-ant2-prod-mon-02  *:8443,9283  running (7w)     9m ago   7w     437M        -  17.2.5   768e01abdf0b  8025f1c8c4a0  
mgr.ceph-bru1-prod-mon-01.mpdphh  ceph-bru1-prod-mon-01  *:8443,9283  running (7w)     9m ago   7w    9864M        -  17.2.5   768e01abdf0b  7cf8ea5799c7  
mgr.ceph-bru1-prod-mon-02.lykpuf  ceph-bru1-prod-mon-02  *:8443,9283  running (7w)     9m ago   7w     436M        -  17.2.5   768e01abdf0b  f27dddd822bc  
mgr.ceph-bru4-prod-mon-01.dumgmp  ceph-bru4-prod-mon-01  *:9283       running (8w)    73s ago   8w     511M        -  17.2.5   768e01abdf0b  2933882312cd  
[ceph: root@ceph-bru4-prod-mon-01 /]# ceph mgr module disable dashboard
[ceph: root@ceph-bru4-prod-mon-01 /]# ceph mgr module enable dashboard
[ceph: root@ceph-bru4-prod-mon-01 /]# ceph orch ps --daemon_type mgr
NAME                              HOST                   PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
mgr.ceph-ant2-prod-mon-01.ukvjmb  ceph-ant2-prod-mon-01  *:8443,9283  running (7w)     1s ago   7w     478M        -  17.2.5   768e01abdf0b  605c2a53b228  
mgr.ceph-ant2-prod-mon-02.cytlix  ceph-ant2-prod-mon-02  *:8443,9283  running (7w)     0s ago   7w     408M        -  17.2.5   768e01abdf0b  8025f1c8c4a0  
mgr.ceph-bru1-prod-mon-01.mpdphh  ceph-bru1-prod-mon-01  *:8443,9283  running (7w)     1s ago   7w     519M        -  17.2.5   768e01abdf0b  7cf8ea5799c7  
mgr.ceph-bru1-prod-mon-02.lykpuf  ceph-bru1-prod-mon-02  *:8443,9283  running (7w)     1s ago   7w     408M        -  17.2.5   768e01abdf0b  f27dddd822bc  
mgr.ceph-bru4-prod-mon-01.dumgmp  ceph-bru4-prod-mon-01  *:9283       running (8w)     0s ago   8w     481M        -  17.2.5   768e01abdf0b  2933882312cd 

After disabling and re-enabling the dashboard module, the memory usage of mgr.ceph-bru1-prod-mon-01.mpdphh drops from 9864M to 519M.
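
Spotting an affected daemon in a listing like the one above can be scripted. This is a minimal sketch, not an official tool: the function name mgr_over_limit is hypothetical, and it assumes the text layout of ceph orch ps shown in this comment, where MEM USE is the fifth whitespace-separated field from the end of each row and carries a K, M or G suffix.

```shell
# Read `ceph orch ps` text output on stdin and print "name mem_use" (in M,
# as displayed) for daemons whose MEM USE exceeds a limit (default 4096).
# Assumption: MEM USE is the fifth field from the end of each row, which
# holds for the listing in this report but may differ in other releases.
mgr_over_limit() {
    awk -v limit="${1:-4096}" '
        $1 == "NAME" { next }           # skip the header row
        NF < 5       { next }           # skip blank or truncated lines
        {
            v = $(NF - 4)               # e.g. "9864M"
            n = v + 0                   # numeric prefix, e.g. 9864
            u = substr(v, length(v))    # unit suffix
            if (u == "K") n /= 1024
            else if (u == "G") n *= 1024
            if (n > limit) printf "%s %d\n", $1, n
        }
    '
}
```

For example, ceph orch ps --daemon_type mgr | mgr_over_limit 4096 would list only the daemons above 4096M, which in the first listing is mgr.ceph-bru1-prod-mon-01.mpdphh.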
