Bug #42529
closed
memory bloat + OSD process crash
Description
Seeing OSD processes using up to 30 GB of RAM. The disks are 7.2k RPM 10 TB HDDs. This affects multiple OSDs on multiple hosts. (Related: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037283.html )
{
    "crash_id": "2019-10-29_12:49:25.934834Z_b135d2ee-bb46-4d36-accb-a754da960fbe",
    "timestamp": "2019-10-29 12:49:25.934834Z",
    "process_name": "ceph-osd",
    "entity_name": "osd.52",
    "ceph_version": "14.2.4",
    "utsname_hostname": "fsn-1-dc4-s3-1066329",
    "utsname_sysname": "Linux",
    "utsname_release": "5.0.0-20-generic",
    "utsname_version": "#21~18.04.1-Ubuntu SMP Thu Jun 27 04:04:37 UTC 2019",
    "utsname_machine": "x86_64",
    "os_name": "Ubuntu",
    "os_id": "ubuntu",
    "os_version_id": "18.04",
    "os_version": "18.04.3 LTS (Bionic Beaver)",
    "backtrace": [
        "(()+0x12890) [0x7fba1d66f890]",
        "(gsignal()+0xc7) [0x7fba1c321e97]",
        "(abort()+0x141) [0x7fba1c323801]",
        "(()+0x8c957) [0x7fba1cd16957]",
        "(()+0x92ab6) [0x7fba1cd1cab6]",
        "(()+0x92af1) [0x7fba1cd1caf1]",
        "(()+0x92d24) [0x7fba1cd1cd24]",
        "(ceph::buffer::v14_2_0::create_aligned_in_mempool(unsigned int, unsigned int, int)+0x229) [0x560fbf97a689]",
        "(ceph::buffer::v14_2_0::create_aligned(unsigned int, unsigned int)+0x22) [0x560fbf97a772]",
        "(ceph::buffer::v14_2_0::create_small_page_aligned(unsigned int)+0x55) [0x560fbf97b0b5]",
        "(ProtocolV1::read_message_data_prepare()+0x340) [0x560fbfb165d0]",
        "(ProtocolV1::read_message_middle()+0x128) [0x560fbfb16748]",
        "(ProtocolV1::handle_message_front(char*, int)+0x202) [0x560fbfb16f32]",
        "(()+0xf3db2d) [0x560fbfb10b2d]",
        "(AsyncConnection::process()+0x18c) [0x560fbfb0db6c]",
        "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa1d) [0x560fbf96625d]",
        "(()+0xd96e5b) [0x560fbf969e5b]",
        "(()+0xbd66f) [0x7fba1cd4766f]",
        "(()+0x76db) [0x7fba1d6646db]",
        "(clone()+0x3f) [0x7fba1c40488f]"
    ]
}
Happens a lot:
drwx------ 2 ceph ceph 4.0K Oct 29 00:42 2019-10-28_23:42:37.533663Z_9a59660a-1662-4b7e-953c-fd784eb2ecf9
drwx------ 2 ceph ceph 4.0K Oct 29 01:10 2019-10-29_00:10:19.378797Z_82c05bb0-8eff-4321-83b6-5ee07f8b1f79
drwx------ 2 ceph ceph 4.0K Oct 29 01:21 2019-10-29_00:21:16.749525Z_732e106f-a70e-4193-a034-103a6856352c
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901839Z_46f8d907-86d7-46eb-aa7b-a936038d3d8c
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901843Z_bcae7658-8f9e-4c9b-90af-ae03a3226f89
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901882Z_010b47f7-285b-4ea9-a364-9cfa16ac0cf2
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.901882Z_c6089117-6f70-4a1b-b411-9bee0fe394c0
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:44.909748Z_22ea7f49-f660-412c-824e-9e6867dcdeee
drwx------ 2 ceph ceph 4.0K Oct 29 03:02 2019-10-29_02:02:45.009118Z_dd752132-e666-4efc-b9ec-38a707a40d4a
drwx------ 2 ceph ceph 4.0K Oct 29 07:19 2019-10-29_06:19:58.498589Z_5d1f09dd-bfe2-41b9-be7d-93337b1bc085
drwx------ 2 ceph ceph 4.0K Oct 29 08:14 2019-10-29_07:14:10.337609Z_fa14fef1-cc60-48ee-a0ec-19c37468f4c6
drwx------ 2 ceph ceph 4.0K Oct 29 10:01 2019-10-29_09:01:27.743867Z_aae2aa39-bb6c-4050-a3c8-a80e896eadb3
drwx------ 2 ceph ceph 4.0K Oct 29 11:12 2019-10-29_10:12:03.466164Z_ba2b9d38-0269-4fc8-a5d9-808ef8504abb
drwx------ 2 ceph ceph 4.0K Oct 29 11:12 2019-10-29_10:12:03.504084Z_b7b07e19-9feb-4dc2-9ca3-281a4058a380
drwx------ 2 ceph ceph 4.0K Oct 29 12:25 2019-10-29_11:25:12.500891Z_8bf7d2d0-aafc-47cb-b445-011c43da93e3
drwx------ 2 ceph ceph 4.0K Oct 29 12:25 2019-10-29_11:25:12.501042Z_b4886cf1-d1b9-4205-a561-6ef79f002b37
drwx------ 2 ceph ceph 4.0K Oct 29 13:11 2019-10-29_12:11:36.695912Z_da228dc0-9705-4438-b7bd-efb053fb4f10
drwx------ 2 ceph ceph 4.0K Oct 29 13:13 2019-10-29_12:13:39.123536Z_57a4c800-b6d6-4c5d-ba9e-4e975614cd7d
drwx------ 2 ceph ceph 4.0K Oct 29 13:18 2019-10-29_12:18:13.181378Z_cbd2bc65-f3d6-4bc9-bee3-f654f0945782
drwx------ 2 ceph ceph 4.0K Oct 29 13:18 2019-10-29_12:18:13.226038Z_70fbc58d-eafa-4f3b-b6d1-91f71db12a94
drwx------ 2 ceph ceph 4.0K Oct 29 13:18 2019-10-29_12:18:13.289436Z_e58964ab-7ed8-4f5a-bfdf-ff1a3b180288
drwx------ 2 ceph ceph 4.0K Oct 29 13:49 2019-10-29_12:49:25.934834Z_b135d2ee-bb46-4d36-accb-a754da960fbe
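The directory names in this listing follow the same pattern as the crash_id in the report: a UTC timestamp, an underscore, then a UUID. A minimal Python sketch, assuming every entry follows that pattern, that groups crashes by hour to show how they cluster. The function names are illustrative and not part of Ceph, and the sample holds only three of the names listed here.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_crash_id(crash_id):
    """Split '<date>_<time>Z_<uuid>' into (UTC datetime, uuid string)."""
    date_part, time_part, uuid_part = crash_id.split("_", 2)
    stamp = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%fZ")
    return stamp.replace(tzinfo=timezone.utc), uuid_part


def crashes_per_hour(crash_ids):
    """Count crashes per UTC hour, keyed by 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for crash_id in crash_ids:
        stamp, _ = parse_crash_id(crash_id)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Three directory names taken from the listing in this report.
sample = [
    "2019-10-29_02:02:44.901839Z_46f8d907-86d7-46eb-aa7b-a936038d3d8c",
    "2019-10-29_02:02:44.901843Z_bcae7658-8f9e-4c9b-90af-ae03a3226f89",
    "2019-10-29_12:49:25.934834Z_b135d2ee-bb46-4d36-accb-a754da960fbe",
]

for hour, count in sorted(crashes_per_hour(sample).items()):
    print(hour, count)
```

Several entries in the listing share the same second (for example 02:02:44), which suggests multiple OSDs crashing together; grouping by a finer key such as the minute would make that easier to see.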
Updated by Anonymous over 4 years ago
Some mempool info from an affected OSD:
{
    "mempool": {
        "by_pool": {
            "bloom_filter": { "items": 0, "bytes": 0 },
            "bluestore_alloc": { "items": 2545349, "bytes": 20362792 },
            "bluestore_cache_data": { "items": 28759, "bytes": 6972870656 },
            "bluestore_cache_onode": { "items": 2885255, "bytes": 1892727280 },
            "bluestore_cache_other": { "items": 202831651, "bytes": 5403585971 },
            "bluestore_fsck": { "items": 0, "bytes": 0 },
            "bluestore_txc": { "items": 21, "bytes": 15792 },
            "bluestore_writing_deferred": { "items": 77, "bytes": 7803168 },
            "bluestore_writing": { "items": 4, "bytes": 5319827 },
            "bluefs": { "items": 5242, "bytes": 175096 },
            "buffer_anon": { "items": 726644, "bytes": 193214370 },
            "buffer_meta": { "items": 754360, "bytes": 66383680 },
            "osd": { "items": 29, "bytes": 377464 },
            "osd_mapbl": { "items": 50, "bytes": 3492082 },
            "osd_pglog": { "items": 99011, "bytes": 46170592 },
            "osdmap": { "items": 48130, "bytes": 1151208 },
            "osdmap_mapping": { "items": 0, "bytes": 0 },
            "pgmap": { "items": 0, "bytes": 0 },
            "mds_co": { "items": 0, "bytes": 0 },
            "unittest_1": { "items": 0, "bytes": 0 },
            "unittest_2": { "items": 0, "bytes": 0 }
        },
        "total": { "items": 209924582, "bytes": 14613649978 }
    }
}
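To see which pools dominate a dump like this one, the pools can be ranked by their byte counts. A minimal Python sketch, assuming the JSON keeps the by_pool / total layout shown in this comment (presumably the output of the OSD admin socket's dump_mempools command, which the comment does not state). rank_mempools is a hypothetical helper name, and the sample is trimmed to four pools from the dump together with the reported total.

```python
def rank_mempools(dump, top=5):
    """Return (pool name, bytes, share of total) for the largest pools."""
    mempool = dump["mempool"]
    total = mempool["total"]["bytes"]
    pools = sorted(
        mempool["by_pool"].items(),
        key=lambda item: item[1]["bytes"],
        reverse=True,
    )
    return [
        (name, stats["bytes"], stats["bytes"] / total if total else 0.0)
        for name, stats in pools[:top]
    ]


# Trimmed sample: four pools and the total copied from the dump in this comment.
# A full dump saved to a file could be loaded with json.load() instead.
sample = {
    "mempool": {
        "by_pool": {
            "bluestore_cache_data": {"items": 28759, "bytes": 6972870656},
            "bluestore_cache_onode": {"items": 2885255, "bytes": 1892727280},
            "bluestore_cache_other": {"items": 202831651, "bytes": 5403585971},
            "buffer_anon": {"items": 726644, "bytes": 193214370},
        },
        "total": {"items": 209924582, "bytes": 14613649978},
    }
}

for name, size, share in rank_mempools(sample, top=3):
    print(f"{name:24s} {size / 2**30:6.2f} GiB  {share:6.1%}")
```

In this dump the three bluestore_cache_* pools account for nearly all of the roughly 14.6 GB tracked by mempools, so the ranking points at the BlueStore cache rather than at the pglog or osdmap pools.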