Bug #42529

closed

memory bloat + OSD process crash

Added by Anonymous over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSD processes are using up to 30G of RAM. The OSDs run on 7.2k RPM 10TB HDDs. Multiple OSDs on multiple hosts are affected. (Related: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037283.html )

{
    "crash_id": "2019-10-29_12:49:25.934834Z_b135d2ee-bb46-4d36-accb-a754da960fbe",
    "timestamp": "2019-10-29 12:49:25.934834Z",
    "process_name": "ceph-osd",
    "entity_name": "osd.52",
    "ceph_version": "14.2.4",
    "utsname_hostname": "fsn-1-dc4-s3-1066329",
    "utsname_sysname": "Linux",
    "utsname_release": "5.0.0-20-generic",
    "utsname_version": "#21~18.04.1-Ubuntu SMP Thu Jun 27 04:04:37 UTC 2019",
    "utsname_machine": "x86_64",
    "os_name": "Ubuntu",
    "os_id": "ubuntu",
    "os_version_id": "18.04",
    "os_version": "18.04.3 LTS (Bionic Beaver)",
    "backtrace": [
        "(()+0x12890) [0x7fba1d66f890]",
        "(gsignal()+0xc7) [0x7fba1c321e97]",
        "(abort()+0x141) [0x7fba1c323801]",
        "(()+0x8c957) [0x7fba1cd16957]",
        "(()+0x92ab6) [0x7fba1cd1cab6]",
        "(()+0x92af1) [0x7fba1cd1caf1]",
        "(()+0x92d24) [0x7fba1cd1cd24]",
        "(ceph::buffer::v14_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)+0x229) [0x560fbf97a689]",
        "(ceph::buffer::v14_2_0::create_aligned(unsigned int, unsigned int)+0x22) [0x560fbf97a772]",
        "(ceph::buffer::v14_2_0::create_small_page_aligned(unsigned int)+0x55) [0x560fbf97b0b5]",
        "(ProtocolV1::read_message_data_prepare()+0x340) [0x560fbfb165d0]",
        "(ProtocolV1::read_message_middle()+0x128) [0x560fbfb16748]",
        "(ProtocolV1::handle_message_front(char*, int)+0x202) [0x560fbfb16f32]",
        "(()+0xf3db2d) [0x560fbfb10b2d]",
        "(AsyncConnection::process()+0x18c) [0x560fbfb0db6c]",
        "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa1d) [0x560fbf96625d]",
        "(()+0xd96e5b) [0x560fbf969e5b]",
        "(()+0xbd66f) [0x7fba1cd4766f]",
        "(()+0x76db) [0x7fba1d6646db]",
        "(clone()+0x3f) [0x7fba1c40488f]" 
    ]
}

This happens a lot. Crash dump directories from the past day:
drwx------ 2 ceph ceph 4.0K Oct 29 00:42 2019-10-28_23:42:37.533663Z_9a59660a-1662-4b7e-953c-fd784eb2ecf9
drwx------ 2 ceph ceph 4.0K Oct 29 01:10 2019-10-29_00:10:19.378797Z_82c05bb0-8eff-4321-83b6-5ee07f8b1f79
drwx------ 2 ceph ceph 4.0K Oct 29 01:21 2019-10-29_00:21:16.749525Z_732e106f-a70e-4193-a034-103a6856352c
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901839Z_46f8d907-86d7-46eb-aa7b-a936038d3d8c
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901843Z_bcae7658-8f9e-4c9b-90af-ae03a3226f89
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901882Z_010b47f7-285b-4ea9-a364-9cfa16ac0cf2
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901882Z_c6089117-6f70-4a1b-b411-9bee0fe394c0
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.909748Z_22ea7f49-f660-412c-824e-9e6867dcdeee
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:45.009118Z_dd752132-e666-4efc-b9ec-38a707a40d4a
drwx------ 2 ceph ceph 4.0K Oct 29 07:19 2019-10-29_06:19:58.498589Z_5d1f09dd-bfe2-41b9-be7d-93337b1bc085
drwx------ 2 ceph ceph 4.0K Oct 29 08:14 2019-10-29_07:14:10.337609Z_fa14fef1-cc60-48ee-a0ec-19c37468f4c6
drwx------ 2 ceph ceph 4.0K Oct 29 10:01 2019-10-29_09:01:27.743867Z_aae2aa39-bb6c-4050-a3c8-a80e896eadb3
drwx------ 2 ceph ceph 4.0K Oct 29 11:12 2019-10-29_10:12:03.466164Z_ba2b9d38-0269-4fc8-a5d9-808ef8504abb
drwx------ 2 ceph ceph 4.0K Oct 29 11:12 2019-10-29_10:12:03.504084Z_b7b07e19-9feb-4dc2-9ca3-281a4058a380
drwx------ 2 ceph ceph 4.0K Oct 29 12:25 2019-10-29_11:25:12.500891Z_8bf7d2d0-aafc-47cb-b445-011c43da93e3
drwx------ 2 ceph ceph 4.0K Oct 29 12:25 2019-10-29_11:25:12.501042Z_b4886cf1-d1b9-4205-a561-6ef79f002b37
drwx------ 2 ceph ceph 4.0K Oct 29 13:11 2019-10-29_12:11:36.695912Z_da228dc0-9705-4438-b7bd-efb053fb4f10
drwx------ 2 ceph ceph 4.0K Oct 29 13:13 2019-10-29_12:13:39.123536Z_57a4c800-b6d6-4c5d-ba9e-4e975614cd7d
drwx------ 2 ceph ceph 4.0K Oct 29 13:18 2019-10-29_12:18:13.181378Z_cbd2bc65-f3d6-4bc9-bee3-f654f0945782
drwx------ 2 ceph ceph 4.0K Oct 29 13:18 2019-10-29_12:18:13.226038Z_70fbc58d-eafa-4f3b-b6d1-91f71db12a94
drwx------ 2 ceph ceph 4.0K Oct 29 13:18 2019-10-29_12:18:13.289436Z_e58964ab-7ed8-4f5a-bfdf-ff1a3b180288
drwx------ 2 ceph ceph 4.0K Oct 29 13:49 2019-10-29_12:49:25.934834Z_b135d2ee-bb46-4d36-accb-a754da960fbe
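
To put a number on "a lot", the directory names in the listing above can be bucketed by hour. This is a minimal sketch, assuming the names follow the `<date>_<time>Z_<uuid>` layout seen in the listing; the helper name `crash_time` is made up for illustration and is not part of Ceph.

```python
from collections import Counter
from datetime import datetime

# Directory names copied verbatim from the listing above.
NAMES = """\
2019-10-28_23:42:37.533663Z_9a59660a-1662-4b7e-953c-fd784eb2ecf9
2019-10-29_00:10:19.378797Z_82c05bb0-8eff-4321-83b6-5ee07f8b1f79
2019-10-29_00:21:16.749525Z_732e106f-a70e-4193-a034-103a6856352c
2019-10-29_02:02:44.901839Z_46f8d907-86d7-46eb-aa7b-a936038d3d8c
2019-10-29_02:02:44.901843Z_bcae7658-8f9e-4c9b-90af-ae03a3226f89
2019-10-29_02:02:44.901882Z_010b47f7-285b-4ea9-a364-9cfa16ac0cf2
2019-10-29_02:02:44.901882Z_c6089117-6f70-4a1b-b411-9bee0fe394c0
2019-10-29_02:02:44.909748Z_22ea7f49-f660-412c-824e-9e6867dcdeee
2019-10-29_02:02:45.009118Z_dd752132-e666-4efc-b9ec-38a707a40d4a
2019-10-29_06:19:58.498589Z_5d1f09dd-bfe2-41b9-be7d-93337b1bc085
2019-10-29_07:14:10.337609Z_fa14fef1-cc60-48ee-a0ec-19c37468f4c6
2019-10-29_09:01:27.743867Z_aae2aa39-bb6c-4050-a3c8-a80e896eadb3
2019-10-29_10:12:03.466164Z_ba2b9d38-0269-4fc8-a5d9-808ef8504abb
2019-10-29_10:12:03.504084Z_b7b07e19-9feb-4dc2-9ca3-281a4058a380
2019-10-29_11:25:12.500891Z_8bf7d2d0-aafc-47cb-b445-011c43da93e3
2019-10-29_11:25:12.501042Z_b4886cf1-d1b9-4205-a561-6ef79f002b37
2019-10-29_12:11:36.695912Z_da228dc0-9705-4438-b7bd-efb053fb4f10
2019-10-29_12:13:39.123536Z_57a4c800-b6d6-4c5d-ba9e-4e975614cd7d
2019-10-29_12:18:13.181378Z_cbd2bc65-f3d6-4bc9-bee3-f654f0945782
2019-10-29_12:18:13.226038Z_70fbc58d-eafa-4f3b-b6d1-91f71db12a94
2019-10-29_12:18:13.289436Z_e58964ab-7ed8-4f5a-bfdf-ff1a3b180288
2019-10-29_12:49:25.934834Z_b135d2ee-bb46-4d36-accb-a754da960fbe
""".splitlines()


def crash_time(name: str) -> datetime:
    """Parse the UTC timestamp embedded in a crash directory name."""
    # The uuid contains no underscores, so the last "_" separates it
    # from the "<date>_<time>Z" prefix.
    stamp = name.rsplit("_", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S.%fZ")


times = [crash_time(n) for n in NAMES]
per_hour = Counter(t.strftime("%Y-%m-%d %H:00") for t in times)

print(f"{len(times)} crashes between {min(times)} and {max(times)} (UTC)")
for hour, count in sorted(per_hour.items()):
    print(f"{hour}  {count}")
```

The clusters of several crashes within the same second (02:02:44 and 12:18:13) suggest that multiple OSDs on the host went down together, which would fit memory exhaustion at the host level, though the listing alone does not prove that.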

Actions #1

Updated by Anonymous over 4 years ago

Mempool info from an affected OSD:

{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 2545349,
                "bytes": 20362792
            },
            "bluestore_cache_data": {
                "items": 28759,
                "bytes": 6972870656
            },
            "bluestore_cache_onode": {
                "items": 2885255,
                "bytes": 1892727280
            },
            "bluestore_cache_other": {
                "items": 202831651,
                "bytes": 5403585971
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 21,
                "bytes": 15792
            },
            "bluestore_writing_deferred": {
                "items": 77,
                "bytes": 7803168
            },
            "bluestore_writing": {
                "items": 4,
                "bytes": 5319827
            },
            "bluefs": {
                "items": 5242,
                "bytes": 175096
            },
            "buffer_anon": {
                "items": 726644,
                "bytes": 193214370
            },
            "buffer_meta": {
                "items": 754360,
                "bytes": 66383680
            },
            "osd": {
                "items": 29,
                "bytes": 377464
            },
            "osd_mapbl": {
                "items": 50,
                "bytes": 3492082
            },
            "osd_pglog": {
                "items": 99011,
                "bytes": 46170592
            },
            "osdmap": {
                "items": 48130,
                "bytes": 1151208
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 209924582,
            "bytes": 14613649978
        }
    }
}
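
A short sketch of how the dump above breaks down: it sums the non-zero pools, checks the result against the reported total, and prints the largest consumers. The pool names and byte counts are copied from the dump; the grouping of the three `bluestore_cache_*` pools into one "cache" figure is my own choice for illustration.

```python
# Non-zero pools from the dump above: pool name -> bytes.
POOLS = {
    "bluestore_alloc": 20362792,
    "bluestore_cache_data": 6972870656,
    "bluestore_cache_onode": 1892727280,
    "bluestore_cache_other": 5403585971,
    "bluestore_txc": 15792,
    "bluestore_writing_deferred": 7803168,
    "bluestore_writing": 5319827,
    "bluefs": 175096,
    "buffer_anon": 193214370,
    "buffer_meta": 66383680,
    "osd": 377464,
    "osd_mapbl": 3492082,
    "osd_pglog": 46170592,
    "osdmap": 1151208,
}
REPORTED_TOTAL = 14613649978  # "total" -> "bytes" in the dump

GIB = 1024 ** 3

total = sum(POOLS.values())
cache = sum(b for name, b in POOLS.items() if name.startswith("bluestore_cache_"))
cache_share = cache / total

# Largest pools first.
for name, nbytes in sorted(POOLS.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name:<24} {nbytes / GIB:6.2f} GiB")

print(f"total                    {total / GIB:6.2f} GiB")
print(f"bluestore caches         {cache_share:6.1%} of total")
```

The three BlueStore cache pools account for nearly all of the tracked memory, so the growth here looks like cache sizing rather than a leak in some other subsystem. Note that mempools only cover tracked allocations, so the process RSS can be higher than this total.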
Actions #2

Updated by Anonymous over 4 years ago

Please close.
Cause: the memory target setting was wrong.
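
For readers hitting the same symptom, a rough sanity check is to compare the mempool total against the configured target. The sketch below assumes the setting meant here is `osd_memory_target` and uses Ceph's documented default of 4 GiB for comparison; the ticket does not say what value was actually configured, and the helper name `times_target` is hypothetical.

```python
GIB = 1024 ** 3

# Assumption: compare against Ceph's default osd_memory_target (4 GiB).
# The value actually configured on this cluster is not given in the ticket.
DEFAULT_OSD_MEMORY_TARGET = 4 * GIB

# Mempool "total" bytes from the dump posted earlier in this ticket.
MEMPOOL_TOTAL_BYTES = 14613649978


def times_target(tracked_bytes: int, target_bytes: int) -> float:
    """Return tracked memory as a multiple of the target (1.0 = at target)."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    return tracked_bytes / target_bytes


ratio = times_target(MEMPOOL_TOTAL_BYTES, DEFAULT_OSD_MEMORY_TARGET)
print(f"mempools are {ratio:.2f}x the default target")
```

A ratio well above 1 against the default is what one would expect if the target had been set much higher than intended, since the BlueStore caches are sized to fit under the target.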

Actions #3

Updated by Greg Farnum over 4 years ago

  • Status changed from New to Closed