Bug #48381

closed

Memory leak ceph-mon in ConfigMonitor::load_config

Added by Aleksey Ivanov over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version: -
% Done: 0%
Source: Community (user)
Tags: ceph-mon
Backport: octopus, nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello everyone!

I see a memory leak in the ceph-mon process on two clusters (QAS and PROD) after upgrading from Ceph 13.2.5 to 15.2.5.

Heap statistics after one hour of uptime:

# ceph tell mon.service01-qas heap dump
mon.service01-qas dumping heap profile now.
------------------------------------------------
MALLOC:      222469760 (  212.2 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +      6777856 (    6.5 MiB) Bytes in central cache freelist
MALLOC: +      3080704 (    2.9 MiB) Bytes in transfer cache freelist
MALLOC: +      7320448 (    7.0 MiB) Bytes in thread cache freelists
MALLOC: +      3014656 (    2.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    242663424 (  231.4 MiB) Actual memory used (physical + swap)
MALLOC: +      2154496 (    2.1 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =    244817920 (  233.5 MiB) Virtual address space used
MALLOC:
MALLOC:          10291              Spans in use
MALLOC:             24              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

This is the result after 11 days of the process running:

# ceph tell mon.service01-qas heap stats
mon.service01-qas tcmalloc heap stats:------------------------------------------------
MALLOC:     3582733448 ( 3416.8 MiB) Bytes in use by application
MALLOC: +        16384 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     64338336 (   61.4 MiB) Bytes in central cache freelist
MALLOC: +      8880896 (    8.5 MiB) Bytes in transfer cache freelist
MALLOC: +     41391832 (   39.5 MiB) Bytes in thread cache freelists
MALLOC: +     28442624 (   27.1 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   3725803520 ( 3553.2 MiB) Actual memory used (physical + swap)
MALLOC: +      7127040 (    6.8 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   3732930560 ( 3560.0 MiB) Virtual address space used
MALLOC:
MALLOC:         437764              Spans in use
MALLOC:             25              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
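
To put the two heap reports side by side, the growth rate can be estimated from the "Bytes in use by application" lines. The following is only a rough sketch: the helper names are hypothetical, and it assumes the two samples were taken at exactly 1 hour and 11 days of uptime, which the report gives only approximately.

```python
import re

# Lines copied verbatim from the two heap reports above.
HEAP_AFTER_1_HOUR = "MALLOC:      222469760 (  212.2 MiB) Bytes in use by application"
HEAP_AFTER_11_DAYS = "MALLOC:     3582733448 ( 3416.8 MiB) Bytes in use by application"


def bytes_in_use(line: str) -> int:
    """Extract the raw byte count from a tcmalloc 'Bytes in use by application' line."""
    match = re.match(r"MALLOC:\s+(\d+)\s+\(.*\)\s+Bytes in use by application", line)
    if match is None:
        raise ValueError(f"unrecognised heap stats line: {line!r}")
    return int(match.group(1))


def growth_mib_per_hour(early: int, late: int, hours_between: float) -> float:
    """Average growth between two samples, in MiB per hour."""
    return (late - early) / hours_between / (1024 * 1024)


early = bytes_in_use(HEAP_AFTER_1_HOUR)
late = bytes_in_use(HEAP_AFTER_11_DAYS)

# Assumption: samples taken at exactly 1 hour and 11 days of uptime.
hours_between = 11 * 24 - 1

rate = growth_mib_per_hour(early, late, hours_between)
print(f"{rate:.1f} MiB/hour, about {rate * 24:.0f} MiB/day")
```

Under those assumptions the monitor gains roughly 12 MiB per hour, which is consistent with steady growth rather than a one-time allocation.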

# pprof --text /usr/bin/ceph-mon /var/log/ceph/mon.service01-qas.profile.1900.heap                                                  
Using local file /usr/bin/ceph-mon.                                                                                                                          
Using local file /var/log/ceph/mon.service01-qas.profile.1900.heap.                                                                                          
Total: 3150.8 MB                                                                                                                                             
  2702.7  85.8%  85.8%   3078.1  97.7% ConfigMonitor::load_config           
   375.5  11.9%  97.7%    375.5  11.9% std::string::_Rep::_S_create                                                                                          
    60.4   1.9%  99.6%     60.4   1.9% rocksdb::BlockFetcher::ReadBlockContents                                                                              
     8.4   0.3%  99.9%      8.4   0.3% ceph::logging::Log::submit_entry                                                                                      
     1.0   0.0%  99.9%      1.0   0.0% rocksdb::WritableFileWriter::Append                                                                                   
     0.6   0.0%  99.9%      0.6   0.0% ceph::buffer::v15_2_0::list::refill_append_space                                                                      
     0.5   0.0%  99.9%      0.5   0.0% std::_Rb_tree::_M_copy                                                                                                
     0.5   0.0% 100.0%      0.5   0.0% rocksdb_cache::BinnedLRUHandleTable::Resize   
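
The pprof text report can also be checked programmatically. The sketch below is illustrative only: it assumes the usual gperftools column layout (flat MB, flat %, running sum %, cumulative MB, cumulative %, function name), and `parse_pprof_text` is a hypothetical helper, not part of pprof or Ceph.

```python
# Top rows and total copied from the pprof output above.
TOTAL_MB = 3150.8
PPROF_ROWS = """\
  2702.7  85.8%  85.8%   3078.1  97.7% ConfigMonitor::load_config
   375.5  11.9%  97.7%    375.5  11.9% std::string::_Rep::_S_create
    60.4   1.9%  99.6%     60.4   1.9% rocksdb::BlockFetcher::ReadBlockContents
"""


def parse_pprof_text(text):
    """Parse pprof --text rows into dicts; lines that do not fit are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) != 6:
            continue
        flat_mb, _flat_pct, _sum_pct, cum_mb, _cum_pct, function = parts
        try:
            rows.append({
                "function": function.strip(),
                "flat_mb": float(flat_mb),
                "cum_mb": float(cum_mb),
            })
        except ValueError:
            continue  # header or other non-data line
    return rows


rows = parse_pprof_text(PPROF_ROWS)
top = max(rows, key=lambda row: row["flat_mb"])
top_share = top["flat_mb"] / TOTAL_MB * 100
print(f"{top['function']}: {top_share:.1f}% of sampled heap (flat)")
```

The cumulative figure for ConfigMonitor::load_config (3078.1 MB) is close to its flat figure plus the std::string::_Rep::_S_create row (2702.7 + 375.5 MB), which suggests, though does not prove, that most of the string allocations are also made under load_config.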

Cluster state:

# ceph status
  cluster:
    id:     7a2137a1-a6c6-401c-ab31-37da3ba074b8
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum service01-qas,service02-qas,service03-qas,service04-qas (age 10d)
    mgr: service04-qas(active, since 5w), standbys: service02-qas, service03-qas, service01-qas
    mds: remote_files:1 {0=service04-qas=up:active} 3 up:standby
    osd: 4 osds: 4 up (since 4w), 4 in (since 4w); 32 remapped pgs

  task status:
    scrub status:
        mds.service04-qas: idle

  data:
    pools:   8 pools, 256 pgs
    objects: 195.79k objects, 35 GiB
    usage:   91 GiB used, 1.9 TiB / 2.0 TiB avail
    pgs:     224 active+clean
             32  active+clean+remapped

Configuration:

# ceph config show mon.service01-qas
NAME                       VALUE                                                       SOURCE    OVERRIDES  IGNORES
auth_client_required       cephx                                                       file                        
auth_cluster_required      cephx                                                       file                        
auth_service_required      cephx                                                       file                        
daemonize                  false                                                       override                    
keyring                    $mon_data/keyring                                           default                     
leveldb_block_size         65536                                                       default                     
leveldb_cache_size         536870912                                                   default                     
leveldb_compression        false                                                       default                     
leveldb_log                                                                            default                     
leveldb_write_buffer_size  33554432                                                    default                     
mon_host                   10.189.44.81,10.189.44.82,10.189.44.83,10.189.44.84         file                        
mon_initial_members        service01-qas, service02-qas, service03-qas, service04-qas  file                        
no_config_file             false                                                       override                    
public_network             10.189.44.0/24                                              file                        
rbd_default_features       61                                                          default                     
setgroup                   ceph                                                        cmdline                     
setuser                    ceph                                                        cmdline


Related issues 2 (0 open, 2 closed)

Copied to Ceph - Backport #49914: octopus: Memory leak ceph-mon in ConfigMonitor::load_config (Resolved, Kefu Chai)
Copied to Ceph - Backport #49915: nautilus: Memory leak ceph-mon in ConfigMonitor::load_config (Resolved, Kefu Chai)