Bug #23462


Out of memory on Bluestore

Added by Alex Gorbachev about 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Routine operation under some load:

Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966156] tp_peering invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966160] tp_peering cpuset=/ mems_allowed=0-1
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966167] CPU: 18 PID: 57546 Comm: tp_peering Not tainted 4.14.14-041414-generic #201801201219
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966168] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966170] Call Trace:
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966178] dump_stack+0x5c/0x85
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966181] dump_header+0x94/0x229
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966184] ? do_try_to_free_pages+0x2a1/0x330
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966187] ? get_page_from_freelist+0xa3/0xb20
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966189] oom_kill_process+0x213/0x410
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966191] out_of_memory+0x2af/0x4d0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966193] __alloc_pages_slowpath+0xab2/0xe40
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966196] __alloc_pages_nodemask+0x261/0x280
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966200] filemap_fault+0x33f/0x6b0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966202] ? filemap_map_pages+0x37e/0x3a0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966206] ext4_filemap_fault+0x2c/0x40
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966209] __do_fault+0x19/0xe0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966211] __handle_mm_fault+0xcd6/0x1180
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966213] handle_mm_fault+0xaa/0x1f0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966217] __do_page_fault+0x25d/0x4e0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966221] ? page_fault+0x36/0x60
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966223] page_fault+0x4c/0x60
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966226] RIP: 0033:0x55afd699db80
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966227] RSP: 002b:00007f06dc9c0488 EFLAGS: 00010202
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966229] Mem-Info:
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966236] active_anon:30857651 inactive_anon:1302742 isolated_anon:0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966236] active_file:664 inactive_file:794 isolated_file:0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966236] unevictable:3547 dirty:0 writeback:0 unstable:0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966236] slab_reclaimable:75629 slab_unreclaimable:151539
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966236] mapped:14452 shmem:14713 pagetables:77721 bounce:0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966236] free:328404 free_pcp:0 free_cma:0
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966240] Node 0 active_anon:61407900kB inactive_anon:2591884kB active_file:1348kB inactive_file:2608kB unevictable:6852kB isolated(anon):0kB isolated(file):0kB mapped:8120kB dirty:0kB writeback:0kB shmem:600kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966243] Node 1 active_anon:62022704kB inactive_anon:2619084kB active_file:1308kB inactive_file:568kB unevictable:7336kB isolated(anon):0kB isolated(file):0kB mapped:49688kB dirty:0kB writeback:0kB shmem:58252kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966245] Node 0 DMA free:15896kB min:124kB low:152kB high:180kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966249] lowmem_reserve[]: 0 1889 64320 64320 64320
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966253] Node 0 DMA32 free:265272kB min:15732kB low:19664kB high:23596kB active_anon:1686676kB inactive_anon:10564kB active_file:356kB inactive_file:260kB unevictable:0kB writepending:0kB present:2046156kB managed:1980588kB mlocked:0kB kernel_stack:0kB pagetables:1624kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 26 10:06:29 roc04r-sc3a090 kernel: [820832.966258] lowmem_reserve[]: 0 0 62430 62430 6243

OSD log:

2018-03-26 09:51:15.372711 7f2d8d26a700 0 -- 10.80.4.90:6814/43323 >> 10.80.4.80:6808/24378 conn(0x56087a71d000 :6814 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 70 vs existing csq=69 existing_state=STATE_STANDBY
2018-03-26 10:06:51.791494 7f10af132e00 0 set uid:gid to 64045:64045 (ceph:ceph)
2018-03-26 10:06:51.791514 7f10af132e00 0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 1893155
2018-03-26 10:06:51.812003 7f10af132e00 0 pidfile_write: ignore empty --pid-file
2018-03-26 10:06:51.841927 7f10af132e00 0 load: jerasure load: lrc load: isa

So it looks like just a restart. Here are the bluestore config lines:

debug_bluestore = 1/5
bluestore_2q_cache_kin_ratio = 0.500000
bluestore_2q_cache_kout_ratio = 0.500000
bluestore_allocator = stupid
bluestore_bitmapallocator_blocks_per_zone = 1024
bluestore_bitmapallocator_span_size = 1024
bluestore_blobid_prealloc = 10240
bluestore_block_create = true
bluestore_block_db_create = false
bluestore_block_db_path =
bluestore_block_db_size = 0
bluestore_block_path =
bluestore_block_preallocate_file = false
bluestore_block_size = 10737418240
bluestore_block_wal_create = false
bluestore_block_wal_path =
bluestore_block_wal_size = 100663296
bluestore_bluefs = true
bluestore_bluefs_balance_interval = 1.000000
bluestore_bluefs_env_mirror = false
bluestore_bluefs_gift_ratio = 0.020000
bluestore_bluefs_max_ratio = 0.900000
bluestore_bluefs_min = 1073741824
bluestore_bluefs_min_free = 1073741824
bluestore_bluefs_min_ratio = 0.020000
bluestore_bluefs_reclaim_ratio = 0.200000
bluestore_cache_kv_max = 536870912
bluestore_cache_kv_ratio = 0.990000
bluestore_cache_meta_ratio = 0.010000
bluestore_cache_size = 0
bluestore_cache_size_hdd = 5368709120
bluestore_cache_size_ssd = 5368709120
bluestore_cache_trim_interval = 0.200000
bluestore_cache_trim_max_skip_pinned = 64
bluestore_cache_type = 2q
bluestore_clone_cow = true
bluestore_compression_algorithm = snappy
bluestore_compression_max_blob_size = 0
bluestore_compression_max_blob_size_hdd = 524288
bluestore_compression_max_blob_size_ssd = 65536
bluestore_compression_min_blob_size = 0
bluestore_compression_min_blob_size_hdd = 131072
bluestore_compression_min_blob_size_ssd = 8192
bluestore_compression_mode = none
bluestore_compression_required_ratio = 0.875000
bluestore_csum_max_block = 65536
bluestore_csum_min_block = 4096
bluestore_csum_type = crc32c
bluestore_debug_freelist = false
bluestore_debug_fsck_abort = false
bluestore_debug_inject_read_err = false
bluestore_debug_misc = false
bluestore_debug_no_reuse_blocks = false
bluestore_debug_omit_block_device_write = false
bluestore_debug_omit_kv_commit = false
bluestore_debug_permit_any_bdev_label = false
bluestore_debug_prefill = 0.000000
bluestore_debug_prefragment_max = 1048576
bluestore_debug_random_read_err = 0.000000
bluestore_debug_randomize_serial_transaction = 0
bluestore_debug_small_allocations = 0
bluestore_default_buffered_read = true
bluestore_default_buffered_write = false
bluestore_deferred_batch_ops = 0
bluestore_deferred_batch_ops_hdd = 64
bluestore_deferred_batch_ops_ssd = 16
bluestore_extent_map_inline_shard_prealloc_size = 256
bluestore_extent_map_shard_max_size = 1200
bluestore_extent_map_shard_min_size = 150
bluestore_extent_map_shard_target_size = 500
bluestore_extent_map_shard_target_size_slop = 0.200000
bluestore_freelist_blocks_per_key = 128
bluestore_fsck_on_mkfs = true
bluestore_fsck_on_mkfs_deep = false
bluestore_fsck_on_mount = false
bluestore_fsck_on_mount_deep = true
bluestore_fsck_on_umount = false
bluestore_fsck_on_umount_deep = true
bluestore_gc_enable_blob_threshold = 0
bluestore_gc_enable_total_threshold = 0
bluestore_kvbackend = rocksdb
bluestore_max_alloc_size = 0
bluestore_max_blob_size = 0
bluestore_max_blob_size_hdd = 524288
bluestore_max_blob_size_ssd = 65536
bluestore_max_deferred_txc = 32
bluestore_min_alloc_size = 0
bluestore_min_alloc_size_hdd = 65536
bluestore_min_alloc_size_ssd = 16384
bluestore_nid_prealloc = 1024
bluestore_prefer_deferred_size = 0
bluestore_prefer_deferred_size_hdd = 32768
bluestore_prefer_deferred_size_ssd = 0
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
bluestore_shard_finishers = false
bluestore_spdk_coremask = 0x3
bluestore_spdk_max_io_completion = 0
bluestore_spdk_mem = 512
bluestore_sync_submit_transaction = false
bluestore_throttle_bytes = 67108864
bluestore_throttle_cost_per_io = 0
bluestore_throttle_cost_per_io_hdd = 670000
bluestore_throttle_cost_per_io_ssd = 4000
bluestore_throttle_deferred_bytes = 134217728
mon_debug_no_require_bluestore_for_ec_overwrites = false
root@roc04r-sc3a090:/var/log/ceph#

Actions #1

Updated by Wido den Hollander about 6 years ago

How much memory in this machine (64GB?) and how many OSDs were/are running on this node?

Actions #2

Updated by Igor Fedotov about 6 years ago

Wido den Hollander wrote:

How much memory in this machine (64GB?) and how many OSDs were/are running on this node?

From Alex's message on ceph-users, there are 18 OSDs and 128 GB of RAM per host.
I advised reducing the cache size settings and checking the results.
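For scale: the dump above shows bluestore_cache_size_hdd = 5368709120 (5 GiB), so 18 OSDs can pin roughly 18 x 5 GiB = 90 GiB of BlueStore cache alone, before per-OSD overhead (RocksDB, pglog, peering state), on a 128 GB host. A minimal sketch of a lower cap, using an assumed value of 1 GiB rather than whatever was actually advised:

# ceph.conf on the OSD host -- illustrative value only
[osd]
# a non-zero bluestore_cache_size overrides the hdd/ssd-specific defaults;
# 1 GiB per OSD keeps 18 OSDs at roughly 18 GiB of cache plus overhead
bluestore_cache_size = 1073741824

Restart the OSDs for the new cache size to take effect.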

Actions #3

Updated by Alex Gorbachev about 6 years ago

No more issues after reducing the cache to account for the overall memory size on the OSD machine - thank you
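
One quick way to sanity-check the change on a running OSD (assuming the admin socket is available; osd.0 is just a placeholder ID) is to dump its memory pools and watch the bluestore_cache_* entries:

ceph daemon osd.0 dump_mempools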

Actions #4

Updated by Igor Fedotov about 6 years ago

  • Status changed from New to Closed