Bug #56424

bluestore_cache_other mempool entry leak

Added by alexandre derumier about 2 months ago. Updated 5 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
quincy, pacific, octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

I have an octopus cluster (15.2.16), installed directly on octopus (no
upgrade from a previous ceph version), where all OSDs have their
bluestore_cache_other slowly growing over time, using all OSD memory and
reducing the other pools (bluestore_cache_onode, bluestore_cache_data,
bluestore_cache_meta, ...) to almost zero. This has a performance and
latency impact.

I have attached a grafana screenshot with graphs of the pools over the
last 2 months.

Usage of this cluster is 100% rbd, with qemu VMs using octopus librbd.
Scrubbing is scheduled only at night (and the leak also seems to occur
during the day, so I don't think it's related).

I'm also backing up this cluster each night through rbd
snap|export-diff|import|trim to another ceph cluster.

Does anybody know how to debug this, or how to get more info about the
content of the bluestore_cache_other mempool?

I'm currently using the bitmap allocator

bluestore_allocator = bitmap
bluefs_allocator = bitmap

bluefs buffered io is true

bluefs_buffered_io = true

The server has a lot of free memory (no swap):

#free -m
               total        used        free      shared  buff/cache   available
Mem:          128845       49523        1145        1349       78176       76849
Swap:              0           0           0

OSDs are using around 2GB of memory each:

#ps -aux
ceph      621071 28.1  1.6 5547460 2141148 ?     Ssl  May17 17605:21 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph
ceph     1386069 27.7  1.6 5474260 2111700 ?     Ssl  May04 22668:44 /usr/bin/ceph-osd -f --cluster ceph --id 7 --setuser ceph --setgroup ceph
ceph     2220877 27.1  1.6 5630312 2192092 ?     Ssl  Apr21 27339:40 /usr/bin/ceph-osd -f --cluster ceph --id 14 --setuser ceph --setgroup ceph
ceph     2220886 26.4  1.6 5512988 2184240 ?     Ssl  Apr21 26690:21 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
ceph     2220887 30.5  1.6 5599240 2166672 ?     Ssl  Apr21 30788:21 /usr/bin/ceph-osd -f --cluster ceph --id 18 --setuser ceph --setgroup ceph
ceph     2220892 26.1  1.5 5463992 2107960 ?     Ssl  Apr21 26341:42 /usr/bin/ceph-osd -f --cluster ceph --id 16 --setuser ceph --setgroup ceph
ceph     2220976 26.4  1.6 5580952 2152004 ?     Ssl  Apr21 26698:44 /usr/bin/ceph-osd -f --cluster ceph --id 15 --setuser ceph --setgroup ceph
ceph     2220994 30.0  1.6 5604840 2149032 ?     Ssl  Apr21 30271:50 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
ceph     2221015 28.5  1.6 5613948 2169252 ?     Ssl  Apr21 28783:38 /usr/bin/ceph-osd -f --cluster ceph --id 12 --setuser ceph --setgroup ceph
ceph     2221080 28.0  1.5 5644560 2086976 ?     Ssl  Apr21 28299:06 /usr/bin/ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph
ceph     2221120 30.2  1.6 5605180 2181240 ?     Ssl  Apr21 30509:06 /usr/bin/ceph-osd -f --cluster ceph --id 17 --setuser ceph --setgroup ceph
ceph     2221156 28.6  1.5 5613664 2109236 ?     Ssl  Apr21 28891:45 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
ceph     2221189 28.8  1.6 5665824 2188980 ?     Ssl  Apr21 29070:01 /usr/bin/ceph-osd -f --cluster ceph --id 19 --setuser ceph --setgroup ceph
ceph     2221276 26.8  1.5 5555660 2091880 ?     Ssl  Apr21 27093:56 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
ceph     2221277 27.7  1.5 5609368 2074836 ?     Ssl  Apr21 27987:11 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
ceph     2221278 28.4  1.6 5596020 2147776 ?     Ssl  Apr21 28714:16 /usr/bin/ceph-osd -f --cluster ceph --id 9 --setuser ceph --setgroup ceph
ceph     2221564 27.0  1.5 5569916 2103536 ?     Ssl  Apr21 27291:15 /usr/bin/ceph-osd -f --cluster ceph --id 13 --setuser ceph --setgroup ceph
ceph     2221655 32.2  1.6 5680616 2146472 ?     Ssl  Apr21 32443:03 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph

Here are the stats of 1 OSD (other OSDs show the same behaviour):


#ceph config set osd.6 mempool_debug true

#ceph daemon osd.5 dump_mempools
{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0,
                "by_type": {
                    "unsigned char": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "bluestore_alloc": {
                "items": 13024860,
                "bytes": 104198880,
                "by_type": {
                    "range_seg_t": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "bluestore_cache_data": {
                "items": 259,
                "bytes": 5077606
            },
            "bluestore_cache_onode": {
                "items": 95,
                "bytes": 58520,
                "by_type": {
                    "BlueStore::Onode": {
                        "items": 95,
                        "bytes": 58520
                    }
                }
            },
            "bluestore_cache_meta": {
                "items": 2972660,
                "bytes": 23797771,
                "by_type": {
                    "BlueStore::ExtentMap::Shard": {
                        "items": 797,
                        "bytes": 12752
                    },
                    "char": {
                        "items": 5818,
                        "bytes": 5818
                    },
                    "std::_Rb_tree_node<std::pair<int const, boost::intrusive_ptr<BlueStore::Blob> > >": {
                        "items": 68,
                        "bytes": 3264
                    },
                    "std::_Rb_tree_node<std::pair<unsigned int const, std::unique_ptr<BlueStore::Buffer, std::default_delete<BlueStore::Buffer> > > >": {
                        "items": 122,
                        "bytes": 5856
                    }
                }
            },
            "bluestore_cache_other": {
                "items": 80377013,
                "bytes": 2794170136,
                "by_type": {
                    "bluestore_pextent_t": {
                        "items": 4734,
                        "bytes": 75744
                    },
                    "bluestore_shared_blob_t": {
                        "items": 1,
                        "bytes": 72
                    },
                    "std::_Rb_tree_node<std::pair<unsigned long const, bluestore_extent_ref_map_t::record_t> >": {
                        "items": 4,
                        "bytes": 192
                    }
                }
            },
            "bluestore_Buffer": {
                "items": 122,
                "bytes": 11712,
                "by_type": {
                    "BlueStore::Buffer": {
                        "items": 122,
                        "bytes": 11712
                    }
                }
            },
            "bluestore_Extent": {
                "items": 965,
                "bytes": 46320,
                "by_type": {
                    "BlueStore::Extent": {
                        "items": 965,
                        "bytes": 46320
                    }
                }
            },
            "bluestore_Blob": {
                "items": 942,
                "bytes": 97968,
                "by_type": {
                    "BlueStore::Blob": {
                        "items": 942,
                        "bytes": 97968
                    }
                }
            },
            "bluestore_SharedBlob": {
                "items": 942,
                "bytes": 105504,
                "by_type": {
                    "BlueStore::SharedBlob": {
                        "items": 942,
                        "bytes": 105504
                    }
                }
            },
            "bluestore_inline_bl": {
                "items": 2,
                "bytes": 1990
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 11,
                "bytes": 8184,
                "by_type": {
                    "BlueStore::TransContext": {
                        "items": 11,
                        "bytes": 8184
                    }
                }
            },
            "bluestore_writing_deferred": {
                "items": 15,
                "bytes": 77996
            },
            "bluestore_writing": {
                "items": 52,
                "bytes": 333908
            },
            "bluefs": {
                "items": 3449,
                "bytes": 59984,
                "by_type": {
                    "BlueFS::Dir": {
                        "items": 3,
                        "bytes": 264
                    },
                    "BlueFS::File": {
                        "items": 70,
                        "bytes": 14560
                    },
                    "BlueFS::FileLock": {
                        "items": 1,
                        "bytes": 8
                    }
                }
            },
            "bluefs_file_reader": {
                "items": 124,
                "bytes": 818688,
                "by_type": {
                    "BlueFS::FileReader": {
                        "items": 60,
                        "bytes": 7680
                    },
                    "BlueFS::FileReaderBuffer": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "bluefs_file_writer": {
                "items": 4,
                "bytes": 896,
                "by_type": {
                    "BlueFS::FileWriter": {
                        "items": 4,
                        "bytes": 896
                    }
                }
            },
            "buffer_anon": {
                "items": 8420,
                "bytes": 4427920
            },
            "buffer_meta": {
                "items": 715,
                "bytes": 62920,
                "by_type": {
                    "ceph::buffer::v15_2_0::raw_char": {
                        "items": 0,
                        "bytes": 0
                    },
                    "ceph::buffer::v15_2_0::raw_claimed_char": {
                        "items": 0,
                        "bytes": 0
                    },
                    "ceph::buffer::v15_2_0::raw_malloc": {
                        "items": 0,
                        "bytes": 0
                    },
                    "ceph::buffer::v15_2_0::raw_posix_aligned": {
                        "items": 715,
                        "bytes": 62920
                    },
                    "ceph::buffer::v15_2_0::raw_static": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "osd": {
                "items": 101,
                "bytes": 1306536,
                "by_type": {
                    "PGPeeringEvent": {
                        "items": 0,
                        "bytes": 0
                    },
                    "PrimaryLogPG": {
                        "items": 101,
                        "bytes": 1306536
                    }
                }
            },
            "osd_mapbl": {
                "items": 0,
                "bytes": 0
            },
            "osd_pglog": {
                "items": 403016,
                "bytes": 167032496,
                "by_type": {
                    "std::_Rb_tree_node<std::pair<unsigned int const, int> >": {
                        "items": 0,
                        "bytes": 0
                    },
                    "std::pair<osd_reqid_t, unsigned long>": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "osdmap": {
                "items": 34875,
                "bytes": 1428352,
                "by_type": {
                    "OSDMap": {
                        "items": 51,
                        "bytes": 61608
                    },
                    "OSDMap::Incremental": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0,
                "by_type": {
                    "PGMap": {
                        "items": 0,
                        "bytes": 0
                    },
                    "PGMap::Incremental": {
                        "items": 0,
                        "bytes": 0
                    },
                    "PGMapDigest": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 96828642,
            "bytes": 3103124287
        }
    }
}
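For reference, the dump above can be post-processed to see which pool dominates. A minimal sketch, not part of the tracker (the pool values below are copied from the osd.5 dump above; in practice you would `json.load()` the saved `dump_mempools` output):

```python
# Summarize a `ceph daemon osd.N dump_mempools` dump: rank pools by bytes
# and report each pool's share of the total.

def top_pools(dump, n=5):
    """Return the n largest mempools by bytes, with their share of the total."""
    pools = dump["mempool"]["by_pool"]
    total = dump["mempool"]["total"]["bytes"]
    ranked = sorted(pools.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    return [(name, p["bytes"], round(100.0 * p["bytes"] / total, 1))
            for name, p in ranked[:n]]

# Two of the pools from the osd.5 dump above:
dump = {"mempool": {"by_pool": {
            "bluestore_cache_other": {"items": 80377013, "bytes": 2794170136},
            "osd_pglog": {"items": 403016, "bytes": 167032496}},
        "total": {"items": 96828642, "bytes": 3103124287}}}
for name, nbytes, pct in top_pools(dump):
    print(f"{name}: {nbytes} bytes ({pct}% of total)")
# bluestore_cache_other: 2794170136 bytes (90.0% of total)
# osd_pglog: 167032496 bytes (5.4% of total)
```

This makes the imbalance obvious: bluestore_cache_other accounts for roughly 90% of the mempool total on this OSD.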


#ceph tell osd.5 heap dump

osd.5 dumping heap profile now.
------------------------------------------------
MALLOC:     1935971080 ( 1846.3 MiB) Bytes in use by application
MALLOC: +       131072 (    0.1 MiB) Bytes in page heap freelist
MALLOC: +    146011104 (  139.2 MiB) Bytes in central cache freelist
MALLOC: +      8280576 (    7.9 MiB) Bytes in transfer cache freelist
MALLOC: +    147054360 (  140.2 MiB) Bytes in thread cache freelists
MALLOC: +     23986176 (   22.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   2261434368 ( 2156.7 MiB) Actual memory used (physical + swap)
MALLOC: +   2414272512 ( 2302.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   4675706880 ( 4459.1 MiB) Virtual address space used
MALLOC:
MALLOC:         142742              Spans in use
MALLOC:             86              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------

Maybe related:

https://tracker.ceph.com/issues/55761

osd-bluestore-memorypool.png View (162 KB) alexandre derumier, 06/30/2022 05:38 AM

bluestore_cache_other bytes-data-2022-06-30 08_18_22.csv View (45 KB) alexandre derumier, 06/30/2022 06:21 AM

bluestore_cache_other items-data-2022-06-30 08_17_23.csv View (42.2 KB) alexandre derumier, 06/30/2022 06:21 AM

osd-other-cache-items.png View (162 KB) alexandre derumier, 06/30/2022 06:21 AM

osd-other-cache-bytes.png View (162 KB) alexandre derumier, 06/30/2022 06:21 AM

osd-other-cache-bytes.png View (47.9 KB) alexandre derumier, 06/30/2022 06:30 AM

osd-other-cache-items.png View (51.6 KB) alexandre derumier, 06/30/2022 06:30 AM

prunedtail.log View (23.5 KB) alexandre derumier, 06/30/2022 11:56 AM

a.diff View - patch for v15.2.16 to fix blob's AUs accounting (6.06 KB) Igor Fedotov, 06/30/2022 07:40 PM

a2.diff View - fixed patch (6.08 KB) Igor Fedotov, 07/01/2022 01:43 PM

cache_other_items.png View (76.3 KB) alexandre derumier, 07/04/2022 06:27 AM

allpools.png View (95.4 KB) alexandre derumier, 07/04/2022 06:27 AM


Related issues

Copied to bluestore - Backport #56598: quincy: octopus : bluestore_cache_other pool memory leak ? In Progress
Copied to bluestore - Backport #56599: pacific: octopus : bluestore_cache_other pool memory leak ? Resolved
Copied to bluestore - Backport #56600: octopus: octopus : bluestore_cache_other pool memory leak ? New

History

#1 Updated by alexandre derumier about 2 months ago

Some detailed cache_other stats for osd.5 over the last 24h:

items:
80017467 -> 80716079

size:
2781599188 bytes -> 2806319020 bytes

I have attached detailed graphs and CSV metrics, sampled every minute over the last 24h.
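From those two samples, the drift rate over the 24h window is easy to compute (a quick sanity check on the numbers above, not from the tracker itself):

```python
# Drift of the bluestore_cache_other mempool over the 24h window,
# using the two osd.5 samples quoted above.
items_delta = 80716079 - 80017467
bytes_delta = 2806319020 - 2781599188
print(items_delta)                    # items gained per day: 698612
print(round(bytes_delta / 2**20, 1))  # MiB gained per day: 23.6
```

So the counters grow by roughly 700k items and ~24 MiB per day on this OSD, steady enough to eventually squeeze the other caches.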

#2 Updated by alexandre derumier about 2 months ago

Sorry, the screenshot of the last 24h in my last post was the wrong one.

Here are the correct graphs.

#3 Updated by alexandre derumier about 2 months ago

Here is a 1-minute log of osd.5 with debug 20/20

(no scrub, no snap trim during this time)

https://mutulin1.odiso.net/osd5.debug.log.gz

#4 Updated by Igor Fedotov about 2 months ago

Alexandre,
could you please set debug_bluestore to 20 for 5-10 mins (be careful as the log will grow drastically) and grep for "pruned tail" in the resulting log.

#5 Updated by alexandre derumier about 2 months ago

Here is the grep on "pruned tail" with debug_bluestore 20/20.

#7 Updated by Igor Fedotov about 2 months ago

Highly likely this fix https://github.com/ceph/ceph/pull/46911 is relevant. Not sure it fixes everything though..
It's under my local QA atm but when it's done it would be interesting to apply to the cluster-in-question if possible.
@Alexandre would you be able to try that?

#8 Updated by alexandre derumier about 2 months ago

Igor Fedotov wrote:

Highly likely this fix https://github.com/ceph/ceph/pull/46911 is relevant. Not sure it fixes everything though..
It's under my local QA atm but when it's done it would be interesting to apply to the cluster-in-question if possible.
@Alexandre would you be able to try that?

Yes, sure, I can try if you can provide me an octopus backport patch.
(Or better, if you can build a deb for me; it's a Debian 11 server. If not, I can build it myself, no problem.)

The patch's diff has been attached. Sorry, I'm not able to build a deb at the moment.

#9 Updated by alexandre derumier about 2 months ago

BTW, could you give me a brief explanation of what the current problem is?

I have 4 other clusters with the same config, and I also see the same behaviour on some OSDs (not all of them).

On this specific cluster, I'm seeing it on all OSDs.

Is it related to the workload?

#10 Updated by Igor Fedotov about 2 months ago

  • File a.diff added

alexandre derumier wrote:

BTW, could you give me a brief explanation of what the current problem is?

Well, the problem I can see relates to improper mempool stats math when tracking allocation units within a blob.
BlueStore allocates an array of entries for the allocation units (aka pextents) belonging to a blob while keeping the latter in memory. These entries are attached to the bluestore_cache_other mempool. But in some cases, e.g. when doing a blob split or tail pruning, some entries can be "leaked": the relevant mempool counters are updated improperly and hence keep growing over time. This doesn't cause a real memory leak though, just improper mempool stats. Nevertheless, the latter might in turn cause improper BlueStore caching (e.g. redundant trimming).

I'm still not 100% sure the issue I found is the only bug though.

I have 4 other clusters with the same config, and I also see the same behaviour on some OSDs (not all of them).

On this specific cluster, I'm seeing it on all OSDs.

Is it related to the workload?

Well, this is apparently relevant to partial object overwrites, hence rbd (and perhaps CephFS) are more exposed to the issue than RGW, which primarily performs full object writes. Evidently, this is irrelevant to object-reading and metadata-access scenarios too.

#11 Updated by Igor Fedotov about 2 months ago

  • File deleted (a.diff)

#12 Updated by Igor Fedotov about 2 months ago

#13 Updated by Igor Fedotov about 2 months ago

@Alexandre, I'd recommend trying the patch on a single OSD only, just to avoid any unexpected OSD misbehavior; this has had pretty limited QA coverage so far...

#14 Updated by alexandre derumier about 2 months ago

I'm still not 100% sure the issue I found is the only bug though.

Thanks for the explanation. (So, if I understand correctly, the bluestore autotuner may be lowering the meta/onode caches because it sees a lot of entries in cache_other, but there is no real memory leak, as the OSD process only uses 2.5GB of memory.)

Well, this is apparently relevant to partial object overwrites, hence rbd (and perhaps CephFS) are more exposed to the issue than RGW, which primarily performs full object writes. Evidently, this is irrelevant to object-reading and metadata-access scenarios too.

No cephfs here, only rbd (only qemu clients with librbd).

Also, I always have 1 snapshot for each rbd, as I'm backing up with rbd export-diff|import.

Not sure it could be related, but the qemu clients use discard, and I'm also using rbd_skip_partial_discard=true.

#15 Updated by alexandre derumier about 2 months ago

Igor Fedotov wrote:

@Alexandre, I'd recommend trying the patch on a single OSD only, just to avoid any unexpected OSD misbehavior; this has had pretty limited QA coverage so far...

Can I simply build the ceph-osd binary with this patch, or does it need another lib? I would like to know if I could simply use this patched ceph-osd binary in a systemd service.

#16 Updated by Igor Fedotov about 2 months ago

alexandre derumier wrote:

Igor Fedotov wrote:

@Alexandre, I'd recommend trying the patch on a single OSD only, just to avoid any unexpected OSD misbehavior; this has had pretty limited QA coverage so far...

Can I simply build the ceph-osd binary with this patch, or does it need another lib? I would like to know if I could simply use this patched ceph-osd binary in a systemd service.

I can't say for sure, but this could work; I don't see any other shared libraries affected by this patch.

#17 Updated by alexandre derumier about 2 months ago

Igor Fedotov wrote:

alexandre derumier wrote:

Igor Fedotov wrote:

@Alexandre, I'd recommend trying the patch on a single OSD only, just to avoid any unexpected OSD misbehavior; this has had pretty limited QA coverage so far...

Can I simply build the ceph-osd binary with this patch, or does it need another lib? I would like to know if I could simply use this patched ceph-osd binary in a systemd service.

I can't say for sure, but this could work; I don't see any other shared libraries affected by this patch.

OK, I just finished compiling the patched version,
and I'm running 1 OSD of the cluster with the patch (seems to work for now).

I'll send stats Monday.

#18 Updated by alexandre derumier about 2 months ago

Ouch, this seems buggy.

I'm getting crazy values:

"bluestore_cache_other": {
"items": 154949541051559,
"bytes": 57261864
},

and after 2-3 seconds:

"bluestore_cache_other": {
"items": 161868733531354,
"bytes": 58916516
},

#19 Updated by Igor Fedotov about 2 months ago

my bad... fixing...

#20 Updated by Igor Fedotov about 2 months ago

This one should work properly. Please try again

#21 Updated by alexandre derumier about 2 months ago

Igor Fedotov wrote:

This one should work properly. Please try again

OK, no more crazy values, thanks!

So I'll keep it running for the weekend.

I'll send stats Monday.

#22 Updated by Vikhyat Umrao about 2 months ago

  • Status changed from New to In Progress
  • Assignee set to Igor Fedotov
  • Pull request ID set to 46911

#23 Updated by alexandre derumier about 1 month ago

Hi,

I think it's fixing the problem.

Looking at the stats, I still see a small increase of cache_other over time,
but I'm also seeing items released.
So over the last 2 days, memory seems to have stabilized around 450-500MB.

I compared with another non-patched OSD restarted at the same time; it's already at around 1GB of memory usage.

I'll keep it running, and I'll send new stats next week to be sure.

root@mindceph1-1:~# ceph daemon osd.5 dump_mempools
{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0,
                "by_type": {
                    "unsigned char": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "bluestore_alloc": {
                "items": 13024860,
                "bytes": 104198880,
                "by_type": {
                    "range_seg_t": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "bluestore_cache_data": {
                "items": 16199,
                "bytes": 177795204
            },
            "bluestore_cache_onode": {
                "items": 66994,
                "bytes": 41268304,
                "by_type": {
                    "BlueStore::Onode": {
                        "items": 66994,
                        "bytes": 41268304
                    }
                }
            },
            "bluestore_cache_meta": {
                "items": 8007848,
                "bytes": 72404895,
                "by_type": {
                    "BlueStore::ExtentMap::Shard": {
                        "items": 611,
                        "bytes": 9776
                    },
                    "char": {
                        "items": 4361,
                        "bytes": 4361
                    },
                    "std::_Rb_tree_node<std::pair<int const, boost::intrusive_ptr<BlueStore::Blob> > >": {
                        "items": 58,
                        "bytes": 2784
                    },
                    "std::_Rb_tree_node<std::pair<unsigned int const, std::unique_ptr<BlueStore::Buffer, std::default_delete<BlueStore::Buffer> > > >": {
                        "items": 104,
                        "bytes": 4992
                    }
                }
            },
            "bluestore_cache_other": {
                "items": 48419484,
                "bytes": 457378724,
                "by_type": {
                    "bluestore_pextent_t": {
                        "items": 3521,
                        "bytes": 56336
                    },
                    "bluestore_shared_blob_t": {
                        "items": 21778,
                        "bytes": 1568016
                    },
                    "std::_Rb_tree_node<std::pair<unsigned long const, bluestore_extent_ref_map_t::record_t> >": {
                        "items": 11,
                        "bytes": 528
                    }
                }
            },
            "bluestore_Buffer": {
                "items": 15845,
                "bytes": 1521120,
                "by_type": {
                    "BlueStore::Buffer": {
                        "items": 15845,
                        "bytes": 1521120
                    }
                }
            },
            "bluestore_Extent": {
                "items": 2286242,
                "bytes": 109739616,
                "by_type": {
                    "BlueStore::Extent": {
                        "items": 2286242,
                        "bytes": 109739616
                    }
                }
            },
            "bluestore_Blob": {
                "items": 2256943,
                "bytes": 252777616,
                "by_type": {
                    "BlueStore::Blob": {
                        "items": 2256943,
                        "bytes": 252777616
                    }
                }
            },
            "bluestore_SharedBlob": {
                "items": 2230874,
                "bytes": 249857888,
                "by_type": {
                    "BlueStore::SharedBlob": {
                        "items": 2230874,
                        "bytes": 249857888
                    }
                }
            },
            "bluestore_inline_bl": {
                "items": 1759,
                "bytes": 771920
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 11,
                "bytes": 8184,
                "by_type": {
                    "BlueStore::TransContext": {
                        "items": 11,
                        "bytes": 8184
                    }
                }
            },
            "bluestore_writing_deferred": {
                "items": 99,
                "bytes": 429735
            },
            "bluestore_writing": {
                "items": 50,
                "bytes": 204800
            },
            "bluefs": {
                "items": 3071,
                "bytes": 55984,
                "by_type": {
                    "BlueFS::Dir": {
                        "items": 3,
                        "bytes": 264
                    },
                    "BlueFS::File": {
                        "items": 72,
                        "bytes": 14976
                    },
                    "BlueFS::FileLock": {
                        "items": 1,
                        "bytes": 8
                    }
                }
            },
            "bluefs_file_reader": {
                "items": 124,
                "bytes": 1191680,
                "by_type": {
                    "BlueFS::FileReader": {
                        "items": 62,
                        "bytes": 7936
                    },
                    "BlueFS::FileReaderBuffer": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "bluefs_file_writer": {
                "items": 4,
                "bytes": 896,
                "by_type": {
                    "BlueFS::FileWriter": {
                        "items": 4,
                        "bytes": 896
                    }
                }
            },
            "buffer_anon": {
                "items": 90540,
                "bytes": 21508828
            },
            "buffer_meta": {
                "items": 77096,
                "bytes": 6784448,
                "by_type": {
                    "ceph::buffer::v15_2_0::raw_char": {
                        "items": 0,
                        "bytes": 0
                    },
                    "ceph::buffer::v15_2_0::raw_claimed_char": {
                        "items": 0,
                        "bytes": 0
                    },
                    "ceph::buffer::v15_2_0::raw_malloc": {
                        "items": 0,
                        "bytes": 0
                    },
                    "ceph::buffer::v15_2_0::raw_posix_aligned": {
                        "items": 77096,
                        "bytes": 6784448
                    },
                    "ceph::buffer::v15_2_0::raw_static": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "osd": {
                "items": 102,
                "bytes": 1319472,
                "by_type": {
                    "PGPeeringEvent": {
                        "items": 0,
                        "bytes": 0
                    },
                    "PrimaryLogPG": {
                        "items": 102,
                        "bytes": 1319472
                    }
                }
            },
            "osd_mapbl": {
                "items": 0,
                "bytes": 0
            },
            "osd_pglog": {
                "items": 404997,
                "bytes": 168455376,
                "by_type": {
                    "std::_Rb_tree_node<std::pair<unsigned int const, int> >": {
                        "items": 0,
                        "bytes": 0
                    },
                    "std::pair<osd_reqid_t, unsigned long>": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "osdmap": {
                "items": 35425,
                "bytes": 1454296,
                "by_type": {
                    "OSDMap": {
                        "items": 51,
                        "bytes": 61608
                    },
                    "OSDMap::Incremental": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0,
                "by_type": {
                    "PGMap": {
                        "items": 0,
                        "bytes": 0
                    },
                    "PGMap::Incremental": {
                        "items": 0,
                        "bytes": 0
                    },
                    "PGMapDigest": {
                        "items": 0,
                        "bytes": 0
                    }
                }
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 76938567,
            "bytes": 1669127866
        }
    }
}

#ceph tell osd.5 heap dump
osd.5 dumping heap profile now.
------------------------------------------------
MALLOC:     3775733096 ( 3600.8 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +    342231208 (  326.4 MiB) Bytes in central cache freelist
MALLOC: +     11891200 (   11.3 MiB) Bytes in transfer cache freelist
MALLOC: +    139913200 (  133.4 MiB) Bytes in thread cache freelists
MALLOC: +     24117248 (   23.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   4293885952 ( 4095.0 MiB) Actual memory used (physical + swap)
MALLOC: +    247840768 (  236.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   4541726720 ( 4331.3 MiB) Virtual address space used
MALLOC:
MALLOC:         346862              Spans in use
MALLOC:             85              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
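For comparison, the patched dump can be set against the first one (a quick calculation from the two dumps above, not from the tracker itself):

```python
# Share of bluestore_cache_other in the mempool "total" bytes:
# first (unpatched) dump vs. the patched dump above.
before = 2794170136 / 3103124287
after = 457378724 / 1669127866
print(round(100 * before, 1))  # 90.0 (% of total, unpatched)
print(round(100 * after, 1))   # 27.4 (% of total, patched)
```

With the patch applied, cache_other drops from ~90% of the mempool total to ~27%, and the onode/data/meta caches regain space (compare the onode items: 95 before vs. 66994 after).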

#24 Updated by Igor Fedotov about 1 month ago

  • Backport set to quincy, pacific, octopus

#25 Updated by Igor Fedotov about 1 month ago

  • Status changed from In Progress to Fix Under Review

#26 Updated by Igor Fedotov 30 days ago

alexandre derumier wrote:

Hi,

I think it's fixing the problem.

I'll keep it running, and I'll send new stats next week to be sure.

[...]

@Alexandre - any updates?

#27 Updated by Igor Fedotov 30 days ago

  • Status changed from Fix Under Review to Pending Backport

#28 Updated by Backport Bot 30 days ago

  • Copied to Backport #56598: quincy: octopus : bluestore_cache_other pool memory leak ? added

#29 Updated by Backport Bot 30 days ago

  • Copied to Backport #56599: pacific: octopus : bluestore_cache_other pool memory leak ? added

#30 Updated by Backport Bot 30 days ago

  • Copied to Backport #56600: octopus: octopus : bluestore_cache_other pool memory leak ? added

#31 Updated by Igor Fedotov 19 days ago

  • Subject changed from octopus : bluestore_cache_other pool memory leak ? to bluestore_cache_other mempool entry leak

#32 Updated by Backport Bot 8 days ago

  • Tags set to backport_processed

#33 Updated by Igor Fedotov 5 days ago

  • Status changed from Pending Backport to Resolved
