Bug #55761
ceph osd bluestore memory leak
Status: Closed
Description
Hi everyone, we have a Ceph cluster where we only use RGW with an EC pool. The OSD memory keeps growing, now to more than 16 GB.
- ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
- osd config
ceph daemon osd.374 config get osd_memory_target
{ "osd_memory_target": "4294967296" }
osd memory used
    PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
2158658 ceph  20   0 17.023G 0.016T 10700 S  5.6 12.9  97:37.84 ceph-osd
2167500 ceph  20   0 16.503G 0.015T 17392 S  1.7 12.1 103:05.32 ceph-osd
2164715 ceph  20   0 14.230G 0.013T 19904 S  1.3 10.6 101:39.31 ceph-osd
2163249 ceph  20   0 13.249G 0.012T 20068 S  1.3  9.9  94:41.75 ceph-osd
2160621 ceph  20   0 12.728G 0.012T  9256 S  2.3  9.4 103:20.76 ceph-osd
2158042 ceph  20   0 12.748G 0.011T 22344 S  1.0  9.3  88:23.16 ceph-osd
2155989 ceph  20   0 11.843G 0.010T 22656 S  4.3  8.5  91:13.34 ceph-osd
2168721 ceph  20   0 10.896G 0.010T 20204 S  4.6  8.0  93:46.26 ceph-osd
2166929 ceph  20   0 10.458G 9.538G 11004 S  1.3  7.6  91:26.08 ceph-osd
2161262 ceph  20   0 9011516 7.734G  9600 S  1.7  6.2  98:06.58 ceph-osd
osd heap dump info
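For reference, a dump like the one below can typically be obtained with the tcmalloc heap commands; this is only a sketch, with osd.374 taken from the report above:

# start the tcmalloc heap profiler on the OSD (sketch, assuming osd.374)
ceph tell osd.374 heap start_profiler
# dump the heap profile and allocator statistics
ceph tell osd.374 heap dump
# stop the profiler again once done
ceph tell osd.374 heap stop_profiler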
osd.374 dumping heap profile now.
------------------------------------------------
MALLOC:    16957882712 (16172.3 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +    241870520 (  230.7 MiB) Bytes in central cache freelist
MALLOC: +     11673568 (   11.1 MiB) Bytes in transfer cache freelist
MALLOC: +     81639440 (   77.9 MiB) Bytes in thread cache freelists
MALLOC: +     75378880 (   71.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  17368445120 (16563.8 MiB) Actual memory used (physical + swap)
MALLOC: +      3547136 (    3.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  17371992256 (16567.2 MiB) Virtual address space used
MALLOC:
MALLOC:        1110696 Spans in use
MALLOC:             47 Thread heaps in use
MALLOC:           8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
The OSD is using about 16 GB of memory.
osd dump_mempools
{ "bloom_filter": { "items": 0, "bytes": 0 }, "bluestore_alloc": { "items": 15718400, "bytes": 15718400 }, "bluestore_cache_data": { "items": 0, "bytes": 0 }, "bluestore_cache_onode": { "items": 2747609, "bytes": 1846393248 }, "bluestore_cache_other": { "items": 682869756, "bytes": 9725848257 }, "bluestore_fsck": { "items": 0, "bytes": 0 }, "bluestore_txc": { "items": 60, "bytes": 44160 }, "bluestore_writing_deferred": { "items": 641, "bytes": 2639419 }, "bluestore_writing": { "items": 1585, "bytes": 6707995 }, "bluefs": { "items": 5971, "bytes": 227232 }, "buffer_anon": { "items": 16814, "bytes": 15956676 }, "buffer_meta": { "items": 115, "bytes": 10120 }, "osd": { "items": 117, "bytes": 1480752 }, "osd_mapbl": { "items": 0, "bytes": 0 }, "osd_pglog": { "items": 525725, "bytes": 103983300 }, "osdmap": { "items": 137722, "bytes": 3183664 }, "osdmap_mapping": { "items": 0, "bytes": 0 }, "pgmap": { "items": 0, "bytes": 0 }, "mds_co": { "items": 0, "bytes": 0 }, "unittest_1": { "items": 0, "bytes": 0 }, "unittest_2": { "items": 0, "bytes": 0 }, "total": { "items": 702024515, "bytes": 11722193223 } }
The bluestore_cache_other pool is very large: about 9.7 GB, and together with the ~1.8 GB in bluestore_cache_onode the mempools hold roughly 11.7 GB in total, far above the 4 GB osd_memory_target.
What is happening?
Updated by dovefi Z almost 2 years ago
I set mempool_debug to true and dumped the mempools again.
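Roughly the commands used for this (a sketch only; osd.374 is assumed, as in the earlier output):

# enable per-type accounting in the mempools at runtime (sketch, assuming osd.374)
ceph daemon osd.374 config set mempool_debug true
# dump the mempool statistics again, now with a by_type breakdown
ceph daemon osd.374 dump_mempools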
{ "bloom_filter": { "items": 0, "bytes": 0, "by_type": { "unsigned char": { "items": 0, "bytes": 0 } } }, "bluestore_alloc": { "items": 15718400, "bytes": 15718400 }, "bluestore_cache_data": { "items": 0, "bytes": 0 }, "bluestore_cache_onode": { "items": 2747611, "bytes": 1846394592, "by_type": { "BlueStore::Onode": { "items": 2747611, "bytes": 1846394592 } } }, "bluestore_cache_other": { "items": 682869511, "bytes": 9725841859, "by_type": { "BlueStore::Blob": { "items": 5420397, "bytes": 563721288 }, "BlueStore::Buffer": { "items": 160, "bytes": 24320 }, "BlueStore::Extent": { "items": 5419151, "bytes": 260119248 }, "BlueStore::ExtentMap::Shard": { "items": 114, "bytes": 1824 }, "BlueStore::SharedBlob": { "items": 5420397, "bytes": 607084464 }, "bluestore_pextent_t": { "items": 163800, "bytes": 2620800 }, "char": { "items": 19153107, "bytes": 19153107 }, "std::_Rb_tree_node<std::pair<int const, boost::intrusive_ptr<BlueStore::Blob> > >": { "items": 38, "bytes": 1824 }, "std::_Rb_tree_node<std::pair<unsigned int const, std::unique_ptr<BlueStore::Buffer, std::default_delete<BlueStore::Buffer> > > >": { "items": 0, "bytes": 0 }, "std::__detail::_Hash_node_base*": { "items": 126698, "bytes": 1013584 }, "std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >": { "items": 0, "bytes": 0 } } }, "bluestore_fsck": { "items": 0, "bytes": 0 }, "bluestore_txc": { "items": 62, "bytes": 45632, "by_type": { "BlueStore::TransContext": { "items": 62, "bytes": 45632 } } }, "bluestore_writing_deferred": { "items": 665, "bytes": 2735063 }, "bluestore_writing": { "items": 1647, "bytes": 6967279 }, "bluefs": { "items": 6243, "bytes": 233760, "by_type": { "BlueFS::Dir": { "items": 3, "bytes": 264 }, "BlueFS::File": { "items": 334, "bytes": 53440 }, "BlueFS::FileLock": { "items": 1, "bytes": 8 }, "BlueFS::FileReader": { "items": 323, "bytes": 41344 }, "BlueFS::FileReaderBuffer": { "items": 0, "bytes": 0 }, "BlueFS::FileWriter": { "items": 6, "bytes": 1920 } } }, "buffer_anon": { "items": 16833, "bytes": 15695833 }, "buffer_meta": { "items": 117, "bytes": 10296, "by_type": { "ceph::buffer::raw_char": { "items": 0, "bytes": 0 }, "ceph::buffer::raw_claimed_char": { "items": 0, "bytes": 0 }, "ceph::buffer::raw_malloc": { "items": 0, "bytes": 0 }, "ceph::buffer::raw_mmap_pages": { "items": 0, "bytes": 0 }, "ceph::buffer::raw_pipe": { "items": 0, "bytes": 0 }, "ceph::buffer::raw_posix_aligned": { "items": 117, "bytes": 10296 }, "ceph::buffer::raw_static": { "items": 0, "bytes": 0 }, "ceph::buffer::raw_unshareable": { "items": 0, "bytes": 0 } } }, "osd": { "items": 117, "bytes": 1480752, "by_type": { "PG::CephPeeringEvt": { "items": 0, "bytes": 0 }, "PrimaryLogPG": { "items": 117, "bytes": 1480752 } } }, "osd_mapbl": { "items": 0, "bytes": 0 }, "osd_pglog": { "items": 525725, "bytes": 103983300 }, "osdmap": { "items": 135706, "bytes": 3151408, "by_type": { "OSDMap": { "items": 51, "bytes": 43248 }, "OSDMap::Incremental": { "items": 0, "bytes": 0 } } }, "osdmap_mapping": { "items": 0, "bytes": 0 }, "pgmap": { "items": 0, "bytes": 0 }, "mds_co": { "items": 0, "bytes": 0 }, "unittest_1": { "items": 0, "bytes": 0 }, "unittest_2": { "items": 0, "bytes": 0 }, "total": { "items": 702022637, "bytes": 11722258174 } }
The bluestore_cache_other total is not equal to the sum of its by_type values (the by_type entries add up to only ~1.45 GB, while the pool reports ~9.7 GB).
Updated by dovefi Z almost 2 years ago
ceph daemon osd.375 config get bluestore_cache_type
{ "bluestore_cache_type": "2q" }
The cache type is 2q.
Updated by dovefi Z almost 2 years ago
We can also see the memory use in the BlueStore log:
2022-05-26 17:43:35.953782 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_used: 134179922 meta_alloc: 0 meta_used: 11572236451 data_alloc: 0 data_used: 0
2022-05-26 17:43:36.154367 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _tune_cache_size target: 4294967296 heap: 17317584896 unmapped: 1228800 mapped: 17316356096 old cache_size: 134217728 new cache size: 134217728
2022-05-26 17:43:40.964708 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_used: 134179922 meta_alloc: 0 meta_used: 11572236451 data_alloc: 0 data_used: 0
2022-05-26 17:43:41.165183 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _tune_cache_size target: 4294967296 heap: 17317584896 unmapped: 1228800 mapped: 17316356096 old cache_size: 134217728 new cache size: 134217728
2022-05-26 17:43:45.975669 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_used: 134179922 meta_alloc: 0 meta_used: 11572236451 data_alloc: 0 data_used: 0
2022-05-26 17:43:46.176221 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _tune_cache_size target: 4294967296 heap: 17317584896 unmapped: 1228800 mapped: 17316356096 old cache_size: 134217728 new cache size: 134217728
2022-05-26 17:43:50.986562 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_used: 134179922 meta_alloc: 0 meta_used: 11572236451 data_alloc: 0 data_used: 0
2022-05-26 17:43:51.186998 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _tune_cache_size target: 4294967296 heap: 17317584896 unmapped: 1228800 mapped: 17316356096 old cache_size: 134217728 new cache size: 134217728
2022-05-26 17:43:55.997303 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_used: 134179922 meta_alloc: 0 meta_used: 11572236451 data_alloc: 0 data_used: 0
2022-05-26 17:43:56.197714 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _tune_cache_size target: 4294967296 heap: 17317584896 unmapped: 1228800 mapped: 17316356096 old cache_size: 134217728 new cache size: 134217728
2022-05-26 17:44:01.007777 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_used: 134179922 meta_alloc: 0 meta_used: 11572236451 data_alloc: 0 data_used: 0
2022-05-26 17:44:01.208322 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _tune_cache_size target: 4294967296 heap: 17317584896 unmapped: 1228800 mapped: 17316356096 old cache_size: 134217728 new cache size: 134217728
2022-05-26 17:44:06.018235 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _trim_shards cache_size: 134217728 kv_alloc: 134217728 kv_used: 134179922 meta_alloc: 0 meta_used: 11572236451 data_alloc: 0 data_used: 0
2022-05-26 17:44:06.218651 7feea4f20700 5 bluestore.MempoolThread(0x55e2c5293bf0) _tune_cache_size target: 4294967296 heap: 17317584896 unmapped: 1228800 mapped: 17316356096 old cache_size: 134217728 new cache size: 134217728
Updated by Igor Fedotov almost 2 years ago
Hard to say without additional debugging, but your issue reminds me of another case with a bug in onode cache trimming: https://tracker.ceph.com/issues/48729
And the Luminous release on your cluster is definitely VERY outdated; it has been out of support for some years already.
Not to mention that 12.2.12 is not even the latest Luminous(!) minor release...
Given that, I doubt you can get much support with it.
Updated by dovefi Z almost 2 years ago
Igor Fedotov wrote:
Hard to say without additional debugging, but your issue reminds me of another case with a bug in onode cache trimming: https://tracker.ceph.com/issues/48729
And the Luminous release on your cluster is definitely VERY outdated; it has been out of support for some years already.
Not to mention that 12.2.12 is not even the latest Luminous(!) minor release...
Given that, I doubt you can get much support with it.
Yes, very similar to that bug: the OSD memory keeps growing during scrub and deep scrub.
Updated by dovefi Z almost 2 years ago
dovefi Z wrote:
Igor Fedotov wrote:
Hard to say without additional debugging, but your issue reminds me of another case with a bug in onode cache trimming: https://tracker.ceph.com/issues/48729
And the Luminous release on your cluster is definitely VERY outdated; it has been out of support for some years already.
Not to mention that 12.2.12 is not even the latest Luminous(!) minor release...
Given that, I doubt you can get much support with it.
Can you give some debugging advice? It is hard for us to upgrade a Ceph cluster in production.
Updated by Igor Fedotov almost 2 years ago
You can try to temporarily raise the debug-bluestore level to 20, leave it running for a minute or two, and grep the OSD log for the "maximum skip pinned reached" pattern. If it is present, then you are highly likely facing the bug I shared above. You can try to work around that misbehavior by raising the bluestore_cache_trim_max_skip_pinned config parameter to a pretty high value (e.g. 100000) and see if it helps.
Do not forget to revert debug-bluestore back to the default once the log analysis is completed, as the log might grow pretty large.
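For reference, a sketch of those steps; osd.374 and the default log path are assumptions here, not part of this report:

# temporarily raise the BlueStore log level on one OSD (sketch, assuming osd.374)
ceph tell osd.374 injectargs '--debug_bluestore 20'
# after a minute or two, look for the trimming pattern in its log (default log path assumed)
grep "maximum skip pinned reached" /var/log/ceph/ceph-osd.374.log
# revert to the default level so the log does not keep growing
ceph tell osd.374 injectargs '--debug_bluestore 1/5'
# possible workaround: raise the skip-pinned limit to a very high value
ceph tell osd.374 injectargs '--bluestore_cache_trim_max_skip_pinned 100000'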
Updated by dovefi Z almost 2 years ago
Igor Fedotov wrote:
You can try to temporarily raise the debug-bluestore level to 20, leave it running for a minute or two, and grep the OSD log for the "maximum skip pinned reached" pattern. If it is present, then you are highly likely facing the bug I shared above. You can try to work around that misbehavior by raising the bluestore_cache_trim_max_skip_pinned config parameter to a pretty high value (e.g. 100000) and see if it helps. Do not forget to revert debug-bluestore back to the default once the log analysis is completed, as the log might grow pretty large.
Thank you for your advice.
Updated by dovefi Z almost 2 years ago
Igor Fedotov wrote:
Hard to say without additional debugging, but your issue reminds me of another case with a bug in onode cache trimming: https://tracker.ceph.com/issues/48729
And the Luminous release on your cluster is definitely VERY outdated; it has been out of support for some years already.
Not to mention that 12.2.12 is not even the latest Luminous(!) minor release...
Given that, I doubt you can get much support with it.
We have confirmed that this is the same bug as https://tracker.ceph.com/issues/48729; changing the bluestore_cache_trim_max_skip_pinned config or backporting the PR fixes the problem.
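For completeness, a sketch of one way to make the workaround persist across OSD restarts (an assumption: applying it via ceph.conf on the OSD hosts, since a runtime injectargs change alone is lost on restart):

# ceph.conf snippet (sketch) applied to all OSDs
[osd]
bluestore_cache_trim_max_skip_pinned = 100000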
Thank you very much!
Updated by Igor Fedotov almost 2 years ago
- Is duplicate of Bug #48729: Bluestore memory leak on scrub operations added
Updated by Igor Fedotov almost 2 years ago
- Status changed from New to Duplicate
Updated by Igor Fedotov almost 2 years ago
- Project changed from Ceph to bluestore
- Category deleted (OSD)