Wrong bluefs db usage value (doubled) returned by `perf dump` when option `bluestore_rocksdb_cf` is turned on.
During testing we discovered that the OSD db usage reported by the `ceph daemon osd.<num> perf dump` command is twice the real value when the option `bluestore_rocksdb_cf` is turned on.
We ran various tests, but here I will describe only one test case.
1. We created a single-OSD Ceph cluster with a 10 GB db (fast) disk and the default RocksDB level sizes and multiplier (no compression).
2. We ran 50 benchmarks (using the `rados bench write --write-omap` command): max bench time 100 s, writing 16 omaps of 5 MB each.
3. We invoked the `ceph daemon osd.0 perf dump` command every second to collect the current BlueFS usage (`bluefs:db_used_bytes`, `bluefs:slow_used_bytes`).
4. We collected RocksDB Compaction Stats from osd.0.log (per-level usage, sampled every second).
5. We collected allocations from osd.0.log (grep for the `_allocate` keyword). A sketch of the collection procedure follows this list.
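For reference, a minimal sketch of the collection loop described above. The commands and the `_allocate`/Compaction Stats greps follow the report; the pool name, output file names, `jq` post-processing and log path are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Rough sketch of the reproduction/collection steps above.
# Pool name, output files and log path are illustrative.

# Step 3: poll BlueFS usage counters from the admin socket every second (background).
( while true; do
      ceph daemon osd.0 perf dump \
          | jq '.bluefs | {db_used_bytes, slow_used_bytes}' >> bluefs_usage.log
      sleep 1
  done ) &
POLL_PID=$!

# Step 2: run the omap write benchmarks: 50 runs, 100 s each, 16 concurrent 5 MB omap writes.
for i in $(seq 1 50); do
    rados -p bench_pool bench 100 write --write-omap -t 16 -b $((5 * 1024 * 1024))
done

kill "$POLL_PID"

# Steps 4-5: pull RocksDB compaction stats and BlueFS allocations out of the OSD log.
grep 'Compaction Stats' osd.0.log > compaction_stats.log
grep '_allocate'        osd.0.log > allocations.log
```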
Expected result: the fast disk has around 4 GB used (50 × 16 × 5 MB = 4000 MB). Compaction Stats and `db_used_bytes` should show the same results.
1a. 14.10.2020_16-32-35 - When option `bluestore_rocksdb_cf` is turned off (there is only the `default` column family) everything works as expected. Results from compaction stats, allocated memory and perf dump are consistent.
1b. 15.10.2020_02-05-22 - When option `bluestore_rocksdb_cf` is turned on (multiple column families) the reported usage is twice the expected value. perf dump shows usage twice as high as that reported by the compaction stats. The allocated memory and our expectations suggest that the RocksDB Compaction Stats hold the true value.
2a. 15.10.2020_03-55-19 - When option `bluestore_rocksdb_cf` is turned off, everything works as expected. Results are consistent.
2b. 15.10.2020_01-23-07 - When option `bluestore_rocksdb_cf` is turned on, the reported usage is twice the expected value.
Please see the attached graphical visualization of the results.
#1 Updated by Kinga Karczewska 2 months ago
As it turned out, this was caused by the small WAL size (`bluestore block wal size`) and the fact that I had not set `max_total_wal_size` properly. When I set both values to 1 GB, the problem disappeared. Nevertheless, I think these values should not influence the performance of Ceph storage.
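For completeness, a minimal sketch of the workaround described above, assuming `max_total_wal_size` is handed to RocksDB through `bluestore_rocksdb_options` and applied with `ceph config set`; the report only states that both values were set to 1 GB, so the exact mechanism shown here is an assumption.

```bash
# Workaround sketch; 1 GiB values as in the comment above, option placement is an assumption.

# BlueStore WAL device/partition size of 1 GiB (used when the OSD is deployed/created).
ceph config set osd bluestore_block_wal_size 1073741824

# Pass max_total_wal_size=1GiB to RocksDB. bluestore_rocksdb_options is a single
# comma-separated string; in practice the existing default options should be kept
# and max_total_wal_size appended to them (only the new option is shown here).
ceph config set osd bluestore_rocksdb_options "max_total_wal_size=1073741824"
```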