Bug #48070

Wrong bluefs db usage value (doubled) returned by `perf dump` when option `bluestore_rocksdb_cf` is turned on.

Added by Kinga Karczewska 3 months ago. Updated 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
bluestore perf dump bluestore_rocksdb_cf
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Pull request ID:
Crash signature:

Description

During testing we discovered that the OSD db usage returned by the `ceph daemon osd.num perf dump` tool is twice the real value when the option `bluestore_rocksdb_cf` is turned on.
We ran various tests, but here I will describe only one test case.
Test steps:
1. We created a single-OSD Ceph cluster with a db (fast) disk of size 10GB and the default RocksDB level sizes and multiplier (no compression).
2. We ran 50 benchmarks (using the `rados bench write --write-omap` command): max bench time 100s, writing 16 omaps of 5MB.
3. We invoked the `ceph daemon osd.0 perf dump` command every 1 second to collect the current Bluestore usage (`bluefs:db_used_bytes`, `bluefs:slow_used_bytes`).
4. We collected Compaction Stats from osd.0.log (RocksDB levels usages collected every 1 second).
5. We collected allocations in osd.0.log (grep with '_allocate' keyword).
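
A minimal sketch of how step 3 could be scripted, assuming the perf dump output is JSON with a `bluefs` section that holds `db_used_bytes` and `slow_used_bytes`. The function names and the sample payload are hypothetical and only for illustration; `poll_bluefs_usage` needs a live OSD admin socket and is not called here.

```python
import json
import subprocess
import time


def extract_bluefs_usage(perf_dump_text):
    """Return (db_used_bytes, slow_used_bytes) from perf dump JSON text."""
    bluefs = json.loads(perf_dump_text)["bluefs"]
    return bluefs["db_used_bytes"], bluefs["slow_used_bytes"]


def poll_bluefs_usage(osd_id=0, samples=3, interval_s=1.0):
    """Run `ceph daemon osd.<id> perf dump` repeatedly and record the counters.

    Assumes the `ceph` CLI is on PATH and the OSD admin socket is reachable.
    """
    readings = []
    for _ in range(samples):
        out = subprocess.run(
            ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"],
            check=True, capture_output=True, text=True,
        ).stdout
        readings.append((time.time(), *extract_bluefs_usage(out)))
        time.sleep(interval_s)
    return readings


# Hypothetical, trimmed-down perf dump payload (not real test data).
SAMPLE = '{"bluefs": {"db_used_bytes": 4194304000, "slow_used_bytes": 0}}'
print(extract_bluefs_usage(SAMPLE))
```

Keeping the parsing separate from the polling makes it possible to re-run the extraction on saved perf dump output without a running cluster.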

Expected result: The fast disk has around 4GB used (50*16*5MB). Compaction Stats and db_used_bytes should show the same results.
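
The expected figure above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal megabytes (1MB = 10^6 bytes), ignores any RocksDB overhead, and the helper names and the 25% tolerance are hypothetical choices made for illustration, not part of the original test setup.

```python
MB = 10**6  # assumption: decimal megabytes


def expected_omap_bytes(benchmarks=50, omaps_per_bench=16, omap_mb=5):
    """Total omap payload written by the benchmark runs, in bytes."""
    return benchmarks * omaps_per_bench * omap_mb * MB


def usage_ratio(reported_bytes, expected_bytes):
    """How many times larger the reported usage is than the expected one."""
    return reported_bytes / expected_bytes


def looks_doubled(ratio, tolerance=0.25):
    """True when the ratio is close to 2, i.e. usage appears doubled."""
    return abs(ratio - 2.0) <= tolerance


expected = expected_omap_bytes()
print(expected)  # 4000000000 bytes, i.e. about 4GB

# Hypothetical reported value, chosen to illustrate the doubled case.
hypothetical_reported = 2 * expected
print(looks_doubled(usage_ratio(hypothetical_reported, expected)))
```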

Obtained results:

1. bluestore_volume_selection_policy=rocksdb_original
1a. 14.10.2020_16-32-35 - When option 'bluestore_rocksdb_cf' is turned off (there is only the 'default' column family) everything works as expected. Results from compaction stats, allocated memory and perf dump are consistent.
1b. 15.10.2020_02-05-22 - When option 'bluestore_rocksdb_cf' is turned on (multiple column families) the usage is twice the expected value. Results from perf dump show usage twice as large as that shown by compaction stats. The allocated memory and our expectations suggest that the RocksDB Compaction Stats hold the true value.

2. bluestore_volume_selection_policy=use_some_extra
2a. 15.10.2020_03-55-19 - When option 'bluestore_rocksdb_cf' is turned off - everything works as expected. Results are consistent.
2b. 15.10.2020_01-23-07 - When option 'bluestore_rocksdb_cf' is turned on - the usage is twice the expected value.

Please see the attached graphical visualization of the results.

bluestore_rocksdb_cf comparison.pdf - Graphical test results. (703 KB) Kinga Karczewska, 11/02/2020 02:18 PM

History

#1 Updated by Kinga Karczewska 2 months ago

As it turned out, this was caused by the small size of the WAL (`bluestore block wal size`) and the fact that I did not set `max_total_wal_size` properly. When I set both values to 1GB, the problem disappeared. Nevertheless, I think that these values should not influence the performance of Ceph storage.

#2 Updated by Kinga Karczewska 2 months ago

`max_total_wal_size` is discussed here: https://github.com/ceph/ceph/pull/35277
