Bug #48070: Wrong bluefs db usage value (doubled) returned by `perf dump` when option `bluestore_rocksdb_cf` is turned on. - bluestore - Ceph

open

Added by Kinga Karczewska over 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee:
Target version: Ceph - v16.0.0
% Done:
Source: Community (dev)
Tags: bluestore perf dump bluestore_rocksdb_cf
Backport:
Regression:
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During some tests we discovered that the OSD db usage reported by `ceph daemon osd.num perf dump` is twice the real value when the option `bluestore_rocksdb_cf` is turned on.
We ran various tests, but I will describe only one test case here.
Test steps:
1. We created a single-OSD Ceph cluster with a db (fast) disk of 10 GB and the default RocksDB level sizes and multiplier (no compression).
2. We ran 50 benchmarks (using the `rados bench write --write-omap` command): maximum bench time 100 s, each writing 16 omaps of 5 MB.
3. We invoked `ceph daemon osd.0 perf dump` every second to collect the current BlueStore usage (`bluefs:db_used_bytes`, `bluefs:slow_used_bytes`).
4. We collected Compaction Stats from osd.0.log (RocksDB level usage, collected every second).
5. We collected allocations from osd.0.log (grep for the `_allocate` keyword).
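
Step 3 can be sketched in Python as follows. This is only an illustration of the polling, not the script we actually used; the function names, the sample count and the default interval are assumptions. It relies on `perf dump` printing JSON with a `bluefs` section that holds `db_used_bytes` and `slow_used_bytes`.

```python
import json
import subprocess
import time


def extract_bluefs_usage(perf_dump: dict) -> dict:
    """Pull the two BlueFS usage counters out of a parsed `perf dump`."""
    bluefs = perf_dump["bluefs"]
    return {
        "db_used_bytes": int(bluefs["db_used_bytes"]),
        "slow_used_bytes": int(bluefs["slow_used_bytes"]),
    }


def read_perf_dump(osd_id: int = 0) -> dict:
    """Run `ceph daemon osd.<id> perf dump` and parse its JSON output."""
    result = subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def poll_bluefs_usage(osd_id: int = 0, interval_s: float = 1.0, samples: int = 10):
    """Yield (unix_timestamp, usage_dict) pairs, one per interval."""
    for _ in range(samples):
        yield time.time(), extract_bluefs_usage(read_perf_dump(osd_id))
        time.sleep(interval_s)
```

Iterating over `poll_bluefs_usage(0)` on the OSD host would give one sample per second that can be plotted against the values taken from the log.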

Expected result: the fast disk has around 4 GB used (50 * 16 * 5 MB). Compaction Stats and `db_used_bytes` should show the same results.
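
The expected figure is simple arithmetic, written out below for reference. Whether "MB" here means 10^6 or 2^20 bytes is an assumption either way, so the sketch shows both; the function name is made up for this example.

```python
def expected_omap_bytes(runs: int, omaps_per_run: int, omap_mb: int, mb: int = 10**6) -> int:
    """Total omap payload written across all benchmark runs, in bytes."""
    return runs * omaps_per_run * omap_mb * mb


# 50 benchmarks, 16 omaps each, 5 MB per omap.
decimal_total = expected_omap_bytes(50, 16, 5)
binary_total = expected_omap_bytes(50, 16, 5, mb=2**20)

print(decimal_total / 10**9)  # 4.0 (GB, if MB = 10^6 bytes)
print(binary_total / 2**30)   # 3.90625 (GiB, if MB = 2^20 bytes)
```

Either way the payload is roughly 4 GB, which is the value `db_used_bytes` was expected to approach.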

Obtained results:

1. bluestore_volume_selection_policy=rocksdb_original
1a. 14.10.2020_16-32-35 - When option `bluestore_rocksdb_cf` is turned off (there is only the `default` column family), everything works as expected. Results from compaction stats, allocated memory and perf dump are consistent.
1b. 15.10.2020_02-05-22 - When option `bluestore_rocksdb_cf` is turned on (multiple column families), the usage is twice the expected value. The perf dump results show usage twice as large as that shown by compaction stats. The allocated memory and our expectations suggest that the RocksDB Compaction Stats show the true value.

2. bluestore_volume_selection_policy=use_some_extra
2a. 15.10.2020_03-55-19 - When option `bluestore_rocksdb_cf` is turned off, everything works as expected. Results are consistent.
2b. 15.10.2020_01-23-07 - When option `bluestore_rocksdb_cf` is turned on, the usage is twice the expected value.
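
A minimal sketch of the comparison behind these results: divide the `db_used_bytes` value from perf dump by the total size reported in Compaction Stats and check whether the ratio is close to 2. The helper names and the 15% tolerance are assumptions chosen for illustration, not part of our test scripts.

```python
def usage_ratio(perf_dump_bytes: int, compaction_stats_bytes: int) -> float:
    """Ratio of the perf dump usage to the Compaction Stats usage."""
    if compaction_stats_bytes <= 0:
        raise ValueError("compaction stats usage must be positive")
    return perf_dump_bytes / compaction_stats_bytes


def looks_doubled(perf_dump_bytes: int, compaction_stats_bytes: int,
                  tolerance: float = 0.15) -> bool:
    """True when perf dump reports roughly twice the Compaction Stats value."""
    return abs(usage_ratio(perf_dump_bytes, compaction_stats_bytes) - 2.0) <= tolerance
```

With made-up numbers, `looks_doubled(8_000_000_000, 4_000_000_000)` is true (the 1b and 2b pattern), while `looks_doubled(4_100_000_000, 4_000_000_000)` is false (the 1a and 2a pattern).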

Please see the attached graphical visualization of the results.

Files

bluestore_rocksdb_cf comparison.pdf (703 KB)

Graphical test results.

Kinga Karczewska, 11/02/2020 02:18 PM

Updated by Kinga Karczewska over 3 years ago

As it turned out, this was caused by the small size of the WAL (`bluestore block wal size`) and the fact that I had not set `max_total_wal_size` properly. When I set both values to 1 GB, the problem disappeared. Nevertheless, I think that these values should not influence the performance of Ceph storage.
