Support #51609
OSD refuses to start (OOMK) due to pg split
Description
After an upgrade from 15.2.4 to 15.2.13, my small home lab cluster ran into issues with OSDs failing on all four hosts. This might be unrelated to the upgrade, but the trigger appears to have been an autoscaling event in which the RBD pool was scaled from 128 PGs to 512 PGs. Unfortunately, I didn't notice that PGs were being split before I initiated a reboot of one of the hosts to load the latest Linux kernel.
Only some OSDs are affected, and during the OSD startup the following output can be observed:
2021-07-08T03:57:55.496+0200 7fc7303ff700 10 osd.17 146136 split_pgs splitting pg[5.25( v 146017'38948152 (146011'38947652,146017'38948152] local-lis/les=146012/146013 n=1168 ec=2338/46 lis/c=146012/145792 les/c/f=146013/145793/36878 sis=146019) [17,6] r=0 lpr=146019 pi=[145792,146019)/1 crt=146017'38948152 lcod 0'0 mlcod 0'0 unknown mbc={}] into 5.a5
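As an aside on the PG names in that log line: when pg_num is raised by a power of two, each PG keeps its seed and its children get seeds offset by multiples of the old pg_num. A simplified sketch of that seed arithmetic (not Ceph's actual C++ implementation) shows why 5.25 splits into 5.a5 here, assuming pool 5 went from 128 to 512 PGs:

```python
def split_children(seed: int, old_pg_num: int, new_pg_num: int) -> list[str]:
    """Child seeds t satisfy t % old_pg_num == seed and seed < t < new_pg_num.

    Simplified model of PG split naming for a power-of-two pg_num increase.
    """
    return [format(t, "x") for t in range(seed + old_pg_num, new_pg_num, old_pg_num)]

# PG 5.25 (seed 0x25) with the pool going from 128 to 512 PGs:
print(split_children(0x25, 128, 512))  # ['a5', '125', '1a5']
```

The first child, seed 0xa5, matches the `splitting pg[5.25 ... ] into 5.a5` line above.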
Exporting/removing the PGs belonging to pool 5 seems to resolve the OOMK, but naturally yields data loss. Pool 5 was the one being split.
There isn't a lot of activity in the log (20/20 logging), but everything seems to revolve around splitting PGs. The full OSD startup log is attached.
At this point I've exported all the troublesome PGs and gotten all the OSDs online. Trying to import the PGs again causes the OSD to OOMK on startup again.
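For reference, the export/remove/import round-trip described above can be done with ceph-objectstore-tool while the OSD is stopped (a sketch only; the data path, pgid, and file names below are illustrative):

```sh
systemctl stop ceph-osd@17

# Export the troublesome PG to a file, then remove it from the OSD:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
    --pgid 5.25 --op export --file /backup/pg5.25.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
    --pgid 5.25 --op remove --force

# Later, attempt to re-import it (this is the step that re-triggers the OOMK):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
    --op import --file /backup/pg5.25.export
```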
Attempting to start one of the troubled OSDs with the troubled PG results in all memory (80 GiB) being exhausted before the OOM killer steps in. Looking at the mempool dump, buffer_anon looks severely high. Memory leak?
{
"mempool": {
"by_pool": {
"bloom_filter": {
"items": 0,
"bytes": 0
},
"bluestore_alloc": {
"items": 2894923,
"bytes": 83560688
},
"bluestore_cache_data": {
"items": 228,
"bytes": 136914
},
"bluestore_cache_onode": {
"items": 214,
"bytes": 131824
},
"bluestore_cache_meta": {
"items": 8226,
"bytes": 48900
},
"bluestore_cache_other": {
"items": 571,
"bytes": 26252
},
"bluestore_Buffer": {
"items": 10,
"bytes": 960
},
"bluestore_Extent": {
"items": 13,
"bytes": 624
},
"bluestore_Blob": {
"items": 13,
"bytes": 1352
},
"bluestore_SharedBlob": {
"items": 13,
"bytes": 1456
},
"bluestore_inline_bl": {
"items": 0,
"bytes": 0
},
"bluestore_fsck": {
"items": 0,
"bytes": 0
},
"bluestore_txc": {
"items": 0,
"bytes": 0
},
"bluestore_writing_deferred": {
"items": 0,
"bytes": 0
},
"bluestore_writing": {
"items": 0,
"bytes": 0
},
"bluefs": {
"items": 421,
"bytes": 14728
},
"bluefs_file_reader": {
"items": 56,
"bytes": 5512704
},
"bluefs_file_writer": {
"items": 3,
"bytes": 672
},
"buffer_anon": {
"items": 16048567,
"bytes": 65862454806
},
"buffer_meta": {
"items": 1048,
"bytes": 92224
},
"osd": {
"items": 209,
"bytes": 2703624
},
"osd_mapbl": {
"items": 0,
"bytes": 0
},
"osd_pglog": {
"items": 24310250,
"bytes": 2547517232
},
"osdmap": {
"items": 1578,
"bytes": 92744
},
"osdmap_mapping": {
"items": 0,
"bytes": 0
},
"pgmap": {
"items": 0,
"bytes": 0
},
"mds_co": {
"items": 0,
"bytes": 0
},
"unittest_1": {
"items": 0,
"bytes": 0
},
"unittest_2": {
"items": 0,
"bytes": 0
}
},
"total": {
"items": 43266343,
"bytes": 68502297704
}
}
}
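For what it's worth, a quick way to see how dominant buffer_anon is in a dump like this (a small sketch; the numbers are copied from the dump above and only the three largest pools are included):

```python
# Rank the top mempools by bytes; "total_bytes" is "total.bytes" from the dump.
by_pool = {
    "buffer_anon":     {"items": 16048567, "bytes": 65862454806},
    "osd_pglog":       {"items": 24310250, "bytes": 2547517232},
    "bluestore_alloc": {"items": 2894923,  "bytes": 83560688},
}
total_bytes = 68502297704

for name, pool in sorted(by_pool.items(), key=lambda kv: -kv[1]["bytes"]):
    share = pool["bytes"] / total_bytes
    print(f"{name:>16}: {pool['bytes'] / 2**30:6.2f} GiB ({share:6.1%})")
```

buffer_anon alone is roughly 61 GiB, about 96% of the total, which is what makes it look like a leak rather than ordinary cache pressure.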
Any guidance on how to further troubleshoot this issue would be greatly appreciated.
Updated by Tor Martin Ølberg almost 3 years ago
Tor Martin Ølberg wrote:
[...]
Update: I've also tried compiling 16.2.5, master, and 15.2.4; all of them seem to have the same bug when an OSD tries to split PGs on boot.
Updated by Igor Dell over 2 years ago
Tor Martin Ølberg wrote:
[...]
Hi Martin,
I'm new to Ceph. Could you please explain how you triggered the split on boot?
A few weeks ago I had a similar issue with OOM: all of my OSD pods were killed by the OOM killer over the weekend in my Kubernetes lab.
I think my issue was that the OSD pod was given too little memory, which triggered the first OOM at the wrong time.
I am trying to reproduce the issue.
Updated by Igor Fedotov over 2 years ago
- Related to Bug #53729: ceph-osd takes all memory before oom on boot added
Updated by Neha Ojha over 2 years ago
Tor Martin Ølberg wrote:
[...]
Can you provide OSD logs with debug_osd=20 and debug_ms=1 from before the OSD crashes? The output of "ceph pg dump" and "ceph daemon osd.x dump_mempools" would also be very useful.
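One way to collect what's requested above (a sketch; osd.17 stands in for whichever OSD is crashing). Since the daemon OOMs during startup, the debug levels need to be persisted so they apply at boot rather than injected at runtime:

```sh
# Persist verbose logging so it takes effect on the next start:
ceph config set osd.17 debug_osd 20
ceph config set osd.17 debug_ms 1

# Restart the OSD, let it crash, then collect /var/log/ceph/ceph-osd.17.log
# along with the two dumps:
ceph pg dump > pg_dump.txt
ceph daemon osd.17 dump_mempools > mempools.json
```

Note that `ceph daemon ... dump_mempools` needs the daemon up long enough to answer on its admin socket, so it may only be capturable while memory is still climbing.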
Updated by Neha Ojha over 2 years ago
- Project changed from Ceph to RADOS
- Category deleted (OSD)