Bug #48792
High memory usage when dashboard module is enabled
Status: Open
Description
We have clusters running ceph-mgr 14.2.8 and 14.2.11 via ceph-container, and noticed that the memory usage of the ceph-mgr process keeps increasing, eventually consuming all available memory.
We took two memory dumps of the mgr process 23 hours apart and found that the occurrence count of the string perf_schema_update increased from 6.73M to 6.95M. The string service_map increased as well, but by a smaller amount (from 3.42M to 3.52M).
# diff -y <(sort mgr_mem_dump_01061723/mem_strings | uniq -c | sort -nrk1 | head -20) <(sort mgr_mem_dump_01071613/mem_strings | uniq -c | sort -nrk1 | head -20)
6737595 perf_schema_update       | 6951557 perf_schema_update
3420619 service_map              | 3527473 service_map
2509278 mon.prod-mon04-object02  | 2587637 mon.prod-mon04-object02
2446596 mgr.prod-mon04-object02  | 2523289 mgr.prod-mon04-object02
1466359 type                     | 1511201 type
1464268 addr                     | 1509678 addr
1454171 nonce                    | 1498666 nonce
 742643 addrvec                  |  765483 addrvec
 715771 0{D!                     |  737568 0{D!
 715136 (5W~XJd                  |  736906 (5W~XJd
 712596 addrs                    |  734296 addrs
 711791 stamp                    |  733425 stamp
 711742 name                     |  733353 name
 711440 message                  |  733094 message
 711380 priority                 |  733034 priority
 711376 clog                     |  733030 clog
 711372 rank                     |  733026 rank
 711368 channel                  |  733022 channel
 666218 [INF]                    |  686822 [INF]
 662728 audit                    |  683316 audit
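The side-by-side comparison above can also be reproduced with a short script. A minimal sketch, assuming each dump has already been reduced to one extracted string per line (as in the mem_strings files above; function names and paths here are hypothetical):

```python
from collections import Counter

def top_strings(path, n=20):
    """Count occurrences of each line in a memory-strings dump
    and return the n most common (count, string) pairs."""
    with open(path, errors="replace") as f:
        counts = Counter(line.rstrip("\n") for line in f)
    return [(c, s) for s, c in counts.most_common(n)]

def compare(old_path, new_path, n=20):
    """Print old and new top-n counts side by side,
    mirroring the `diff -y` output above."""
    for (c1, s1), (c2, s2) in zip(top_strings(old_path, n),
                                  top_strings(new_path, n)):
        print(f"{c1:>8} {s1:<28} | {c2:>8} {s2}")
```

Running compare() on two dumps taken hours apart makes a steadily growing string like perf_schema_update stand out immediately.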
We therefore suspect the dashboard module is the source of the memory leak. When we disabled the dashboard module via ceph mgr module disable dashboard without restarting the mgr daemon, the memory was immediately released back to the OS, and the memory usage of the mgr process stopped increasing. It looks like the dashboard module is holding on to memory and continually allocating more.
Reading the code further, we found that these strings correspond to updates delivered to the dashboard module through the mgr's Python bindings.
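For illustration only, the symptoms are consistent with a module-side structure that retains every update payload it receives. A minimal sketch of that anti-pattern (the class and field names are hypothetical, not the actual dashboard code):

```python
class LeakyModule:
    """Hypothetical mgr module that never discards update
    payloads, so its memory grows with every notification."""

    def __init__(self):
        self._history = []  # grows without bound

    def notify(self, notify_type, payload):
        # Appending instead of replacing keeps every payload
        # (and all strings inside it) alive indefinitely.
        self._history.append((notify_type, payload))

    def pending(self):
        return len(self._history)
```

As long as the module object stays alive, Python cannot free those payloads; destroying the object releases them all at once, which would match the immediate drop in memory observed when the module is disabled.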
Updated by Ernesto Puerta about 3 years ago
- Project changed from mgr to Dashboard
- Category changed from 132 to General
Updated by Paul Kusters 12 months ago
This is still the case on Quincy 17.2.5 and 17.2.6 as well.
[ceph: root@ceph-bru4-prod-mon-01 /]# ceph orch ps --daemon_type mgr
NAME                              HOST                   PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.ceph-ant2-prod-mon-01.ukvjmb  ceph-ant2-prod-mon-01  *:8443,9283  running (7w)  9m ago     7w   505M     -        17.2.5   768e01abdf0b  605c2a53b228
mgr.ceph-ant2-prod-mon-02.cytlix  ceph-ant2-prod-mon-02  *:8443,9283  running (7w)  9m ago     7w   437M     -        17.2.5   768e01abdf0b  8025f1c8c4a0
mgr.ceph-bru1-prod-mon-01.mpdphh  ceph-bru1-prod-mon-01  *:8443,9283  running (7w)  9m ago     7w   9864M    -        17.2.5   768e01abdf0b  7cf8ea5799c7
mgr.ceph-bru1-prod-mon-02.lykpuf  ceph-bru1-prod-mon-02  *:8443,9283  running (7w)  9m ago     7w   436M     -        17.2.5   768e01abdf0b  f27dddd822bc
mgr.ceph-bru4-prod-mon-01.dumgmp  ceph-bru4-prod-mon-01  *:9283       running (8w)  73s ago    8w   511M     -        17.2.5   768e01abdf0b  2933882312cd
[ceph: root@ceph-bru4-prod-mon-01 /]# ceph mgr module disable dashboard
[ceph: root@ceph-bru4-prod-mon-01 /]# ceph mgr module enable dashboard
[ceph: root@ceph-bru4-prod-mon-01 /]# ceph orch ps --daemon_type mgr
NAME                              HOST                   PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.ceph-ant2-prod-mon-01.ukvjmb  ceph-ant2-prod-mon-01  *:8443,9283  running (7w)  1s ago     7w   478M     -        17.2.5   768e01abdf0b  605c2a53b228
mgr.ceph-ant2-prod-mon-02.cytlix  ceph-ant2-prod-mon-02  *:8443,9283  running (7w)  0s ago     7w   408M     -        17.2.5   768e01abdf0b  8025f1c8c4a0
mgr.ceph-bru1-prod-mon-01.mpdphh  ceph-bru1-prod-mon-01  *:8443,9283  running (7w)  1s ago     7w   519M     -        17.2.5   768e01abdf0b  7cf8ea5799c7
mgr.ceph-bru1-prod-mon-02.lykpuf  ceph-bru1-prod-mon-02  *:8443,9283  running (7w)  1s ago     7w   408M     -        17.2.5   768e01abdf0b  f27dddd822bc
mgr.ceph-bru4-prod-mon-01.dumgmp  ceph-bru4-prod-mon-01  *:9283       running (8w)  0s ago     8w   481M     -        17.2.5   768e01abdf0b  2933882312cd
Disabling and re-enabling the dashboard module drops the memory usage of mgr.ceph-bru1-prod-mon-01 from 9864M to 519M.