Support #51609 (open)

OSD refuses to start (OOMK) due to pg split

Added by Tor Martin Ølberg almost 3 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags: -
Reviewed: -
Affected Versions: -
Component(RADOS): -
Pull request ID: -

Description

After an upgrade from 15.2.4 to 15.2.13, my small home lab cluster ran into issues with OSDs failing on all four hosts. This might be unrelated to the upgrade, but the trigger appears to have been an autoscaling event in which the RBD pool was scaled from 128 PGs to 512 PGs. Unfortunately, I didn't notice that PGs were being split before I rebooted one of the hosts to load the latest Linux kernel.
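
For reference, the autoscaler state can be checked, and paused if needed, with something like the following; the pool name rbd is assumed here and may differ:

    # Check what the autoscaler has done / wants to do, and the current pg_num
    ceph osd pool autoscale-status
    ceph osd pool get rbd pg_num
    ceph osd pool get rbd pgp_num

    # Optionally stop the autoscaler from changing this pool while recovering
    ceph osd pool set rbd pg_autoscale_mode off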

Only some OSDs are affected, and during OSD startup the following output can be observed:

2021-07-08T03:57:55.496+0200 7fc7303ff700 10 osd.17 146136 split_pgs splitting pg[5.25( v 146017'38948152 (146011'38947652,146017'38948152] local-lis/les=146012/146013 n=1168 ec=2338/46 lis/c=146012/145792 les/c/f=146013/145793/36878 sis=146019) [17,6] r=0 lpr=146019 pi=[145792,146019)/1 crt=146017'38948152 lcod 0'0 mlcod 0'0 unknown mbc={}] into 5.a5

Exporting/removing the PGs belonging to pool 5 seems to resolve the OOM kill, but naturally results in data loss. Pool 5 was the one being split.
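
For completeness, the kind of ceph-objectstore-tool invocation this refers to is roughly the following; the exact commands are not reproduced here, and OSD 17 / PG 5.25 are just examples taken from the log line above:

    # Example only - OSD id and PG id are taken from the log line above
    systemctl stop ceph-osd@17
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
        --pgid 5.25 --op export --file /root/pg-5.25.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
        --pgid 5.25 --op remove --force

    # Re-importing later is what triggers the OOM kill again, as noted further down
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-17 \
        --op import --file /root/pg-5.25.export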

There isn't a lot of activity in the log (20/20 logging), but everything seems to revolve around splitting PGs. The full OSD startup log is attached.

At this point I've exported all the troublesome PGs and gotten all the OSDs online. Trying to import the PGs again causes the OSD to be OOM-killed on startup again.

Attempting to start one of the troubled OSDs with the troubled PG results in all memory (80 GiB) being exhausted before the OOM killer steps in. Looking at a dump of the mempools, buffer_anon looks severely high. Memory leak?

{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 2894923,
                "bytes": 83560688
            },
            "bluestore_cache_data": {
                "items": 228,
                "bytes": 136914
            },
            "bluestore_cache_onode": {
                "items": 214,
                "bytes": 131824
            },
            "bluestore_cache_meta": {
                "items": 8226,
                "bytes": 48900
            },
            "bluestore_cache_other": {
                "items": 571,
                "bytes": 26252
            },
            "bluestore_Buffer": {
                "items": 10,
                "bytes": 960
            },
            "bluestore_Extent": {
                "items": 13,
                "bytes": 624
            },
            "bluestore_Blob": {
                "items": 13,
                "bytes": 1352
            },
            "bluestore_SharedBlob": {
                "items": 13,
                "bytes": 1456
            },
            "bluestore_inline_bl": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_writing_deferred": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_writing": {
                "items": 0,
                "bytes": 0
            },
            "bluefs": {
                "items": 421,
                "bytes": 14728
            },
            "bluefs_file_reader": {
                "items": 56,
                "bytes": 5512704
            },
            "bluefs_file_writer": {
                "items": 3,
                "bytes": 672
            },
            "buffer_anon": {
                "items": 16048567,
                "bytes": 65862454806
            },
            "buffer_meta": {
                "items": 1048,
                "bytes": 92224
            },
            "osd": {
                "items": 209,
                "bytes": 2703624
            },
            "osd_mapbl": {
                "items": 0,
                "bytes": 0
            },
            "osd_pglog": {
                "items": 24310250,
                "bytes": 2547517232
            },
            "osdmap": {
                "items": 1578,
                "bytes": 92744
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 43266343,
            "bytes": 68502297704
        }
    }
}
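
For what it's worth, the biggest consumers can be pulled out of a dump like the one above with something along these lines (osd.17 and the file name are just examples, and jq is assumed to be available):

    # Rank mempools by bytes and show the top five
    ceph daemon osd.17 dump_mempools > mempools.json
    jq -r '.mempool.by_pool | to_entries
           | sort_by(.value.bytes) | reverse | .[:5][]
           | "\(.key)\t\(.value.bytes)"' mempools.json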

Any guidance on how to further troubleshoot this issue would be greatly appreciated.


Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #53729: ceph-osd takes all memory before oom on boot (Resolved, Nitzan Mordechai)

#1 - Updated by Tor Martin Ølberg almost 3 years ago

Tor Martin Ølberg wrote:

[...]

Update: I've also tried compiling 16.2.5, master, and 15.2.4; all of them seem to have the same bug when an OSD is trying to split PGs on boot.

#2 - Updated by Loïc Dachary over 2 years ago

  • Target version deleted (v15.2.14)
#3 - Updated by Igor Dell over 2 years ago

Tor Martin Ølberg wrote:

[...]

Hi Martin,
I'm new to Ceph. Could you please explain how you triggered the split on boot?
A few weeks ago I had a similar issue with OOM: all OSD pods were killed by the OOM killer over the weekend in my Kubernetes lab.
I think my issue was that the OSD pod wasn't given enough memory, which triggered the first OOM at the wrong time.
I'm trying to reconstruct the issue.

#4 - Updated by Igor Fedotov over 2 years ago

  • Related to Bug #53729: ceph-osd takes all memory before oom on boot added
#5 - Updated by Neha Ojha over 2 years ago

Tor Martin Ølberg wrote:

[...]

Can you provide OSD logs with debug_osd=20 and debug_ms=1 from before the OSD crashes? The output of "ceph pg dump" and "ceph daemon osd.x dump_mempools" would also be very useful.
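
For reference, one way to capture that would be roughly the following (osd.17 is just a placeholder for an affected OSD):

    # Example only - osd.17 stands in for whichever OSD is being restarted
    ceph config set osd.17 debug_osd 20
    ceph config set osd.17 debug_ms 1

    # Then restart the OSD, let it run until it is OOM-killed, and collect:
    ceph pg dump > pg_dump.txt
    ceph daemon osd.17 dump_mempools > dump_mempools.txt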

#6 - Updated by Neha Ojha over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)