
Bug #44509

using ceph-bluestore-tool bluefs-bdev-new-db results in 128k or 64k leftover on slow device

Added by Stefan Priebe over 1 year ago. Updated over 1 year ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After using bluefs-bdev-new-db on 24 OSDs, all of those OSDs are in the following state, even after one week:

osd.0 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.1 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.2 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.3 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.4 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.5 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.6 spilled over 64 KiB metadata from 'db' device (13 GiB used of 185 GiB) to slow device
osd.7 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.8 spilled over 128 KiB metadata from 'db' device (13 GiB used of 185 GiB) to slow device
osd.9 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.10 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.11 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.12 spilled over 64 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.13 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.14 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.15 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.16 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.17 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.18 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.19 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.20 spilled over 128 KiB metadata from 'db' device (13 GiB used of 185 GiB) to slow device
osd.21 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.22 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device
osd.23 spilled over 128 KiB metadata from 'db' device (12 GiB used of 185 GiB) to slow device

I tried manual compaction and restarting the OSDs, but nothing helps.
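For reference, the manual compaction and restart attempts looked roughly like the following (OSD ids 0-23; the exact systemd unit names depend on the deployment, so treat this as a sketch rather than the exact commands used):

```
# Trigger an online RocksDB compaction on each OSD
for id in $(seq 0 23); do
    ceph tell osd.${id} compact
done

# Restart the OSD daemons (systemd-based deployment assumed)
for id in $(seq 0 23); do
    systemctl restart ceph-osd@${id}
done

# Re-check the spillover warnings afterwards
ceph health detail | grep 'spilled over'
```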

History

#1 Updated by Igor Fedotov over 1 year ago

Could you please provide the output of the following command (the OSD must be offline):
ceph-kvstore-tool bluestore-kv <path-to-osd> stats

#2 Updated by Stefan Priebe over 1 year ago

sure:
```
# ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-5 stats
    2020-04-21 19:51:16.635 7f439d4b7e00 1 bluestore(/var/lib/ceph/osd/ceph-5) _mount path /var/lib/ceph/osd/ceph-5
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev create path /var/lib/ceph/osd/ceph-5/block type kernel
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev(0x55930660e700 /var/lib/ceph/osd/ceph-5/block) open path /var/lib/ceph/osd/ceph-5/block
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev(0x55930660e700 /var/lib/ceph/osd/ceph-5/block) open size 11999064883200 (0xae9c0000000, 11 TiB) block_size 4096 (4 KiB) rotational discard not supported
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bluestore(/var/lib/ceph/osd/ceph-5) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev create path /var/lib/ceph/osd/ceph-5/block.db type kernel
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev(0x55930660ee00 /var/lib/ceph/osd/ceph-5/block.db) open path /var/lib/ceph/osd/ceph-5/block.db
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev(0x55930660ee00 /var/lib/ceph/osd/ceph-5/block.db) open size 198642237440 (0x2e40000000, 185 GiB) block_size 4096 (4 KiB) non-rotational discard supported
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-5/block.db size 185 GiB
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev create path /var/lib/ceph/osd/ceph-5/block type kernel
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev(0x55930660f180 /var/lib/ceph/osd/ceph-5/block) open path /var/lib/ceph/osd/ceph-5/block
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bdev(0x55930660f180 /var/lib/ceph/osd/ceph-5/block) open size 11999064883200 (0xae9c0000000, 11 TiB) block_size 4096 (4 KiB) rotational discard not supported
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-5/block size 11 TiB
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bluefs mount
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x2e40000000
    2020-04-21 19:51:16.651 7f439d4b7e00 1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0xae9c0000000
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option compaction_readahead_size = 2097152
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option compression = kNoCompression
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option max_background_compactions = 2
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option max_write_buffer_number = 4
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option min_write_buffer_number_to_merge = 1
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option recycle_log_file_num = 4
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option writable_file_max_buffer_size = 0
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option write_buffer_size = 268435456
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option compaction_readahead_size = 2097152
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option compression = kNoCompression
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option max_background_compactions = 2
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option max_write_buffer_number = 4
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option min_write_buffer_number_to_merge = 1
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option recycle_log_file_num = 4
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option writable_file_max_buffer_size = 0
    2020-04-21 19:51:16.679 7f439d4b7e00 0 set rocksdb option write_buffer_size = 268435456
    2020-04-21 19:51:16.735 7f439d4b7e00 1 rocksdb: do_open column families: [default]
    2020-04-21 19:51:18.271 7f439d4b7e00 1 bluestore(/var/lib/ceph/osd/ceph-5) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
    2020-04-21 19:51:18.287 7f439d4b7e00 1 freelist init
    2020-04-21 19:51:18.303 7f439d4b7e00 1 bluestore(/var/lib/ceph/osd/ceph-5) _open_alloc opening allocation metadata
    2020-04-21 19:51:19.495 7f439d4b7e00 1 bluestore(/var/lib/ceph/osd/ceph-5) _open_alloc loaded 8.0 TiB in 1134451 extents
    2020-04-21 19:51:19.499 7f439d4b7e00 1 bluefs umount
    2020-04-21 19:51:19.499 7f439d4b7e00 1 fbmap_alloc 0x55930598ea00 shutdown
    2020-04-21 19:51:19.499 7f439d4b7e00 1 fbmap_alloc 0x55930598eb00 shutdown
    2020-04-21 19:51:19.499 7f439d4b7e00 1 bdev(0x55930660ee00 /var/lib/ceph/osd/ceph-5/block.db) close
    2020-04-21 19:51:19.731 7f439d4b7e00 1 bdev(0x55930660f180 /var/lib/ceph/osd/ceph-5/block) close
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bdev create path /var/lib/ceph/osd/ceph-5/block.db type kernel
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bdev(0x55930660f180 /var/lib/ceph/osd/ceph-5/block.db) open path /var/lib/ceph/osd/ceph-5/block.db
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bdev(0x55930660f180 /var/lib/ceph/osd/ceph-5/block.db) open size 198642237440 (0x2e40000000, 185 GiB) block_size 4096 (4 KiB) non-rotational discard supported
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-5/block.db size 185 GiB
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bdev create path /var/lib/ceph/osd/ceph-5/block type kernel
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bdev(0x55930660ee00 /var/lib/ceph/osd/ceph-5/block) open path /var/lib/ceph/osd/ceph-5/block
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bdev(0x55930660ee00 /var/lib/ceph/osd/ceph-5/block) open size 11999064883200 (0xae9c0000000, 11 TiB) block_size 4096 (4 KiB) rotational discard not supported
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-5/block size 11 TiB
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bluefs mount
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bluefs _init_alloc id 1 alloc_size 0x100000 size 0x2e40000000
    2020-04-21 19:51:19.951 7f439d4b7e00 1 bluefs _init_alloc id 2 alloc_size 0x10000 size 0xae9c0000000
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option compaction_readahead_size = 2097152
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option compression = kNoCompression
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option max_background_compactions = 2
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option max_write_buffer_number = 4
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option min_write_buffer_number_to_merge = 1
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option recycle_log_file_num = 4
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option writable_file_max_buffer_size = 0
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option write_buffer_size = 268435456
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option compaction_readahead_size = 2097152
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option compression = kNoCompression
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option max_background_compactions = 2
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option max_write_buffer_number = 4
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option min_write_buffer_number_to_merge = 1
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option recycle_log_file_num = 4
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option writable_file_max_buffer_size = 0
    2020-04-21 19:51:19.971 7f439d4b7e00 0 set rocksdb option write_buffer_size = 268435456
    2020-04-21 19:51:20.031 7f439d4b7e00 1 rocksdb: do_open column families: [default]
    2020-04-21 19:51:21.599 7f439d4b7e00 1 bluestore(/var/lib/ceph/osd/ceph-5) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
    db_statistics {
    "rocksdb_compaction_statistics": "",
    "": "",
    "": "** Compaction Stats [default] ",
    "": "Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop",
    "": "----------------------------------------------------------------------------------------------------------------------------------------------------------------------------",
    "": " L0 1/0 5.06 MB 0.2 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 149.5 0.03 0.00 1 0.034 0 0",
    "": " L1 3/0 119.66 MB 0.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0",
    "": " L2 29/0 1.70 GB 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0",
    "": " L3 137/0 7.78 GB 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0",
    "": " Sum 170/0 9.60 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 149.5 0.03 0.00 1 0.034 0 0",
    "": " Int 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0",
    "": "",
    "": "
    Compaction Stats [default] ",
    "": "Priority Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop",
    "": "-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------",
    "": "User 0/0 0.00 KB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 149.5 0.03 0.00 1 0.034 0 0",
    "": "Uptime(secs): 1.6 total, 0.0 interval",
    "": "Flush(GB): cumulative 0.005, interval 0.000",
    "": "AddFile(GB): cumulative 0.000, interval 0.000",
    "": "AddFile(Total Files): cumulative 0, interval 0",
    "": "AddFile(L0 Files): cumulative 0, interval 0",
    "": "AddFile(Keys): cumulative 0, interval 0",
    "": "Cumulative compaction: 0.00 GB write, 3.23 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds",
    "": "Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds",
    "": "Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count",
    "": "",
    "": "
    File Read Latency Histogram By Level [default] ",
    "": "
    Level 0 read latency histogram (micros):",
    "": "Count: 5 Average: 2.8000 StdDev: 2.32",
    "": "Min: 0 Median: 1.5000 Max: 6",
    "": "Percentiles: P50: 1.50 P75: 4.75 P99: 5.95 P99.9: 6.00 P99.99: 6.00",
    "": "------------------------------------------------------",
    "": "[ 0, 1 ] 2 40.000% 40.000% ########",
    "": "( 1, 2 ] 1 20.000% 60.000% ####",
    "": "( 4, 6 ] 2 40.000% 100.000% ########",
    "": "",
    "": "** Level 1 read latency histogram (micros):",
    "": "Count: 9 Average: 0.8889 StdDev: 0.74",
    "": "Min: 0 Median: 0.6429 Max: 2",
    "": "Percentiles: P50: 0.64 P75: 0.96 P99: 1.96 P99.9: 2.00 P99.99: 2.00",
    "": "------------------------------------------------------",
    "": "[ 0, 1 ] 7 77.778% 77.778% ################",
    "": "( 1, 2 ] 2 22.222% 100.000% ####",
    "": "",
    "": "** Level 2 read latency histogram (micros):",
    "": "Count: 87 Average: 0.6667 StdDev: 0.83",
    "": "Min: 0 Median: 0.5724 Max: 4",
    "": "Percentiles: P50: 0.57 P75: 0.86 P99: 3.13 P99.9: 3.91 P99.99: 3.99",
    "": "------------------------------------------------------",
    "": "[ 0, 1 ] 76 87.356% 87.356% #################",
    "": "( 1, 2 ] 8 9.195% 96.552% ##",
    "": "( 2, 3 ] 2 2.299% 98.851% ",
    "": "( 3, 4 ] 1 1.149% 100.000% ",
    "": "",
    "": "** Level 3 read latency histogram (micros):",
    "": "Count: 411 Average: 0.5547 StdDev: 0.95",
    "": "Min: 0 Median: 0.5394 Max: 15",
    "": "Percentiles: P50: 0.54 P75: 0.81 P99: 1.89 P99.9: 12.95 P99.99: 14.79",
    "": "------------------------------------------------------",
    "": "[ 0, 1 ] 381 92.701% 92.701% ###################",
    "": "( 1, 2 ] 29 7.056% 99.757% #",
    "": "( 10, 15 ] 1 0.243% 100.000% ",
    "": "",
    "": "",
    "": "** DB Stats **",
    "": "Uptime(secs): 1.6 total, 0.0 interval",
    "": "Cumulative writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 GB, 0.00 MB/s",
    "": "Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s",
    "": "Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent",
    "": "Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s",
    "": "Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s",
    "": "Interval stall: 00:00:0.000 H:M:S, 0.0 percent"
    }

2020-04-21 19:51:21.599 7f439d4b7e00 1 bluestore(/var/lib/ceph/osd/ceph-5) umount
2020-04-21 19:51:21.603 7f439d4b7e00 1 bluefs umount
2020-04-21 19:51:21.603 7f439d4b7e00 1 fbmap_alloc 0x55930598ee00 shutdown
2020-04-21 19:51:21.603 7f439d4b7e00 1 fbmap_alloc 0x55930598ea00 shutdown
2020-04-21 19:51:21.603 7f439d4b7e00 1 bdev(0x55930660f180 /var/lib/ceph/osd/ceph-5/block.db) close
2020-04-21 19:51:21.931 7f439d4b7e00 1 bdev(0x55930660ee00 /var/lib/ceph/osd/ceph-5/block) close
2020-04-21 19:51:21.995 7f439d4b7e00 1 bdev(0x55930660e700 /var/lib/ceph/osd/ceph-5/block) close
```

#3 Updated by Stefan Priebe over 1 year ago

Igor, did you have a look at those values? Any chance to fix this manually?

#4 Updated by Igor Fedotov over 1 year ago

Stefan,
I presume you invoked the bluefs-bdev-new-db command only, didn't you?
In fact, this command allocates a new DB volume but generally performs no data migration. RocksDB might eventually do that for you, but it likely does not move some minor auxiliary file(s).
So you can try the bluefs-bdev-migrate command to move all BlueFS data from the main device to the new DB one. Here is an example of the command for a vstart cluster:
ceph-bluestore-tool --path dev/osd1/ --devs-source dev/osd1/block --dev-target dev/osd1/block.db bluefs-bdev-migrate

Please note that the source/target devices are accessed via the corresponding symlinks in the OSD folder.
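Adapted to the OSDs reported above (paths assumed from the log output; the OSD must be stopped first, and the systemd unit name is an assumption about the deployment), the command would look roughly like:

```
# Stop the OSD before running ceph-bluestore-tool against it
systemctl stop ceph-osd@5

# Move all BlueFS data from the main (slow) device to the DB device
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-5 \
    --devs-source /var/lib/ceph/osd/ceph-5/block \
    --dev-target /var/lib/ceph/osd/ceph-5/block.db \
    bluefs-bdev-migrate

systemctl start ceph-osd@5
```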

#5 Updated by Stefan Priebe over 1 year ago

Thanks a lot. That solved the problem. It would be good to have a note about this in the docs.

#6 Updated by Neha Ojha over 1 year ago

  • Status changed from New to Won't Fix
