Project

General

Profile

Bug #45695

librados: significant memory consumption

Added by David Disseldorp 8 months ago. Updated 8 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

I did some valgrind massif heap profiling with the following simple librados (octopus 15.2.1) program:

int main(void)
{
        int ret;
        rados_t cluster_parent = NULL;
        char buf[4096];
        char *this = NULL;

        ret = rados_create(&cluster_parent, NULL);
        if (ret < 0)
                exit(1);

        ret = rados_conf_read_file(cluster_parent, NULL);
        if (ret < 0)
                exit(1);

        ret = rados_conf_parse_env(cluster_parent, NULL);
        if (ret < 0)
                exit(1);

        ret = rados_connect(cluster_parent);
        if (ret < 0)
                exit(1);

        ret = rados_pool_list(cluster_parent, buf, sizeof(buf));
        if (ret >= sizeof(buf))
                exit(1);

        for (this = buf; this < buf + ret && *this != '\0';
                                                this += strlen(this) + 1) {
                printf("pool: %s\n", this);
        }

        sleep(4);

        rados_shutdown(cluster_parent);
}

Heap memory consumption for this program is significant, peaking at ~13MB:


    MB
13.19^                                                                     #  
     |                                                        @:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                                       @:@::@::::::::@::@:::@:::@::@:#: 
     |                         :   ::@::::@::@:@::@::::::::@::@:::@:::@::@:#: 
     |                       @::@::::@::: @::@:@::@::::::::@::@:::@:::@::@:#::
   0 +----------------------------------------------------------------------->Mi
     0                                                                   42.68

At its peak, most of the heap consumption corresponds to the logging backend, but excluding that there's still a good 3MB consumed elsewhere:


--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 84     43,255,122       13,821,824       13,626,932       194,892            0
 85     43,271,478       13,832,648       13,636,951       195,697            0
98.59% (13,636,951B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->78.65% (10,880,000B) 0x5ACBF28: ceph::logging::Log::Log(ceph::logging::SubsystemMap const*) (in /usr/lib64/ceph/libceph-common.so.2)
| ->78.65% (10,880,000B) 0x589B48D: ceph::common::CephContext::CephContext(unsigned int, code_environment_t, int) (in /usr/lib64/ceph/libceph-common.so.2)
|   ->78.65% (10,880,000B) 0x58C1413: common_preinit(CephInitParameters const&, code_environment_t, int) (in /usr/lib64/ceph/libceph-common.so.2)
|     ->78.65% (10,880,000B) 0x4E89529: ??? (in /usr/lib64/librados.so.2.0.0)
|       ->78.65% (10,880,000B) 0x4E89668: rados_create (in /usr/lib64/librados.so.2.0.0)
|         ->78.65% (10,880,000B) 0x400932: main (in /home/david/zram/work/ceph-heap-profile/a.out)
|           
->08.43% (1,165,911B) in 412 places, all below massif's threshold (1.00%)
| 
->06.34% (876,960B) 0x58E217C: void std::vector<Option, std::allocator<Option> >::_M_realloc_insert<Option>(__gnu_cxx::__normal_iterator<Option*, std::vector<Option, std::allocator<Option> > >, Option&&) (in /usr/lib64/ceph/libceph-common.so.2)
| ->06.34% (876,960B) 0x57B7BBD: ??? (in /usr/lib64/ceph/libceph-common.so.2)
|   ->06.34% (876,960B) 0x57BEFD5: ??? (in /usr/lib64/ceph/libceph-common.so.2)
|     ->06.34% (876,960B) 0x400FAB9: call_init.part.0 (in /lib64/ld-2.26.so)
|       ->06.34% (876,960B) 0x400FBC5: _dl_init (in /lib64/ld-2.26.so)
|         ->06.34% (876,960B) 0x4000ED9: ??? (in /lib64/ld-2.26.so)
...

History

#1 Updated by David Disseldorp 8 months ago

I should have mentioned that my client ceph.conf is minimal, with only the mon host and keyring options set.

#2 Updated by David Disseldorp 8 months ago

I've tested with in-memory logging disabled via the client ceph.conf:

[global]
        mon host =  [v2:X:40858,v1:X:40859]
        debug default = 0/0
        debug lockdep = 0/0
        debug context = 0/0
        debug crush = 0/0
        debug mds = 0/0
        debug mds balancer = 0/0
        debug mds locker = 0/0
        debug mds log = 0/0
        debug mds log expire = 0/0
        debug mds migrator = 0/0
        debug buffer = 0/0
        debug timer = 0/0
        debug filer = 0/0
        debug objecter = 0/0
        debug rados = 0/0
        debug rbd = 0/0
        debug journaler = 0/0
        debug objectcacher = 0/0
        debug client = 0/0
        debug osd = 0/0
        debug optracker = 0/0
        debug objclass = 0/0
        debug filestore = 0/0
        debug journal = 0/0
        debug ms = 0/0
        debug mon = 0/0
        debug monc = 0/0
        debug paxos = 0/0
        debug tp = 0/0
        debug auth = 0/0
        debug finisher = 0/0
        debug heartbeatmap = 0/0
        debug perfcounter = 0/0
        debug rgw = 0/0
        debug javaclient = 0/0
        debug asok = 0/0
        debug throttle = 0/0

For some reason the logging backend still consumes ~10M of heap memory:


    MB
13.06^                                                                     #  
     |                                                         @:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                                       :::@:::::::::@::::@:::@::@::::#: 
     |                         @  :@::::::::::::@:::::::::@::::@:::@::@::::#: 
     |                       @@@::@@:::: :::::::@:::::::::@::::@:::@::@::::#::
   0 +----------------------------------------------------------------------->Mi
     0                                                                   43.55

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 69     44,178,164       13,565,720       13,378,007       187,713            0
 70     44,337,179       13,690,856       13,494,072       196,784            0
98.56% (13,494,072B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->79.47% (10,880,000B) 0x5ACBF28: ceph::logging::Log::Log(ceph::logging::SubsystemMap const*) (in /usr/lib64/ceph/libceph-common.so.2)
| ->79.47% (10,880,000B) 0x589B48D: ceph::common::CephContext::CephContext(unsigned int, code_environment_t, int) (in /usr/lib64/ceph/libceph-common.so.2)
|   ->79.47% (10,880,000B) 0x58C1413: common_preinit(CephInitParameters const&, code_environment_t, int) (in /usr/lib64/ceph/libceph-common.so.2)
|     ->79.47% (10,880,000B) 0x4E89529: ??? (in /usr/lib64/librados.so.2.0.0)
|       ->79.47% (10,880,000B) 0x4E89668: rados_create (in /usr/lib64/librados.so.2.0.0)
|         ->79.47% (10,880,000B) 0x400932: main (rados-connect.c:13)
|           
->08.74% (1,197,112B) in 408 places, all below massif's threshold (1.00%)
| 
->06.41% (876,960B) 0x58E217C: void std::vector<Option, std::allocator<Option> >::_M_realloc_insert<Option>(__gnu_cxx::__normal_iterator<Option*, std::vector<Option, std::all
ocator<Option> > >, Option&&) (in /usr/lib64/ceph/libceph-common.so.2)
| ->06.41% (876,960B) 0x57B7BBD: ??? (in /usr/lib64/ceph/libceph-common.so.2)
|   ->06.41% (876,960B) 0x57BEFD5: ??? (in /usr/lib64/ceph/libceph-common.so.2)
|     ->06.41% (876,960B) 0x400FAB9: call_init.part.0 (in /lib64/ld-2.26.so)
|       ->06.41% (876,960B) 0x400FBC5: _dl_init (in /lib64/ld-2.26.so)
|         ->06.41% (876,960B) 0x4000ED9: ??? (in /lib64/ld-2.26.so)
|           
->02.63% (360,000B) 0x5A92D1D: std::vector<EventCenter::FileEvent, std::allocator<EventCenter::FileEvent> >::_M_default_append(unsigned long) (in /usr/lib64/ceph/libceph-comm
on.so.2)
| ->02.63% (360,000B) 0x5A90A9E: EventCenter::init(int, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib64/ceph/libceph-common.so.2)
|   ->02.63% (360,000B) 0x5A96EAC: NetworkStack::NetworkStack(ceph::common::CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib64/ceph/libceph-common.so.2)
|     ->02.63% (360,000B) 0x5A94498: PosixNetworkStack::PosixNetworkStack(ceph::common::CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib64/ceph/libceph-common.so.2)
|       ->02.63% (360,000B) 0x5A95BC0: NetworkStack::create(ceph::common::CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /usr/lib64/ceph/libceph-common.so.2)
|         ->02.63% (360,000B) 0x5A42D1F: AsyncMessenger::AsyncMessenger(ceph::common::CephContext*, entity_name_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long) (in /usr/lib64/ceph/libceph-common.so.2)
|           ->02.63% (360,000B) 0x5A2B8B6: Messenger::create(ceph::common::CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, entity_name_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, unsigned long) (in /usr/lib64/ceph/libceph-common.so.2)
|             ->02.63% (360,000B) 0x5A2BDA0: Messenger::create_client_messenger(ceph::common::CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /usr/lib64/ceph/libceph-common.so.2)
|               ->02.63% (360,000B) 0x5AF3925: MonClient::get_monmap_and_config() (in /usr/lib64/ceph/libceph-common.so.2)
|                 ->02.63% (360,000B) 0x4ED37DB: ??? (in /usr/lib64/librados.so.2.0.0)
|                   ->02.63% (360,000B) 0x4E68BFE: rados_connect (in /usr/lib64/librados.so.2.0.0)
|                     ->02.63% (360,000B) 0x400999: main (rados-connect.c:25)

#3 Updated by Brad Hubbard 8 months ago

  • Project changed from Ceph to RADOS
  • Category deleted (librados)

#4 Updated by David Disseldorp 8 months ago

David Disseldorp wrote:

I've tested with in-memory logging disabled via the client ceph.conf:

[...]

For some reason the logging backend still consumes ~10M of heap memory:
[...]

It looks like Patrick did some work a while ago on reducing the heap consumption of the logging backend via a747aeac13daf3dba43343120659e802cb569f6b .

Also available in: Atom PDF