Backport #22264

luminous: bluestore: db.slow used when db is not full

Added by Sage Weil over 1 year ago. Updated 11 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
Release:

Description

From: shasha lu <lushasha08@gmail.com>
To: ceph-devel@vger.kernel.org
Cc: Mark Nelson <mark.a.nelson@gmail.com>
Subject: metadata spill back onto block.slow before block.db filled up

Hi, Mark
We are testing bluestore with 12.2.1.
There are two hosts in our rgw cluster, and each host contains 2 osds. The
rgw pool size is 2. We use a 5GB partition for db.wal and a 50GB SSD
partition for block.db.

# ceph --admin-daemon ceph-osd.1.asok config get rocksdb_db_paths
{
    "rocksdb_db_paths": "db,51002736640 db.slow,284999998054" 
}

After writing about 4 million (400W) 4k rgw objects, we used
ceph-bluestore-tool to export the rocksdb files.

# ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/osd1
--out-dir /tmp/osd1
# cd /tmp/osd1
# ls
db  db.slow  db.wal
# du -sh *
2.8G    db
809M    db.slow
439M    db.wal

The block.db partition has 50GB of space, but it only contains ~3GB of
files; the rest of the metadata rolls over onto db.slow.
It seems that only L0-L2 files are located in block.db (L0 256M; L1 256M;
L2 2.5GB), while L3 and higher-level files are located in db.slow.

According to the ceph docs, metadata should roll over onto db.slow only
when block.db has filled up. But in our env the block.db partition is
far from full.
Did I make any mistakes? Are there any additional options that should
be set for rocksdb?
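The layout described above matches RocksDB's default level sizing. A quick back-of-the-envelope sketch (assuming max_bytes_for_level_base = 256 MB and max_bytes_for_level_multiplier = 10, which the reported L1/L2 sizes suggest) shows that even L3's target size should fit comfortably in the 50 GB block.db partition:

```python
# Back-of-the-envelope check of the reported layout. Assumes the RocksDB
# settings implied by the report: max_bytes_for_level_base = 256 MB,
# max_bytes_for_level_multiplier = 10.
MB = 1024 ** 2
GB = 1024 ** 3

max_bytes_for_level_base = 256 * MB
max_bytes_for_level_multiplier = 10
db_capacity = 50 * GB  # block.db partition size from the report

def level_target(level):
    """Target size of level N (N >= 1): base * multiplier ** (N - 1)."""
    return max_bytes_for_level_base * max_bytes_for_level_multiplier ** (level - 1)

cumulative = 0
for level in range(1, 4):
    cumulative += level_target(level)
    print(f"L{level}: target {level_target(level) / GB:.2f} GB, "
          f"cumulative {cumulative / GB:.2f} GB, "
          f"fits in block.db: {cumulative <= db_capacity}")
```

L1 through L3 together need only ~28 GB, well under the 50 GB partition, which supports the reporter's point that db.slow should not have been used yet.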

compaction_picker_test.diff View (3.81 KB) Igor Fedotov, 11/29/2017 10:06 AM

History

#1 Updated by Igor Fedotov over 1 year ago

Looks like a RocksDB bug fixed by
https://github.com/facebook/rocksdb/commit/65a9cd616876c7a1204e1a50990400e4e1f61d7e

Will verify this hypothesis tomorrow.

#2 Updated by tangwenjun tang over 1 year ago

max_bytes_for_level_base and max_bytes_for_level_multiplier
control where the files of each level are located

#3 Updated by Igor Fedotov over 1 year ago

tangwenjun tang wrote:

max_bytes_for_level_base and max_bytes_for_level_multiplier
control where the files of each level are located

Yeah, but it looks like there was a bug in RocksDB in how they are used.

#4 Updated by Igor Fedotov over 1 year ago

Here is a UT that simulates the reported issue and verifies the rocksdb::GetPathId implementation from both Ceph's master and v12.2.1. v12.2.1 indeed had a bug when selecting the path; it has already been fixed in the rocksdb branch we have in Ceph's master.

Hence the question is: how, if at all, should we apply this rocksdb fix to v12.2.1?
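For illustration, here is a simplified sketch (hypothetical Python, not the actual rocksdb::GetPathId code) of the placement idea the fix restores: each level's target size is charged against the fastest path first, and output falls through to the next (slower) path only when capacity runs out.

```python
# Simplified sketch of the path-selection idea behind rocksdb's
# GetPathId (hypothetical code, NOT the actual implementation):
# compaction output for a level lands on the first configured path with
# enough remaining capacity, falling through to slower paths only once
# the faster ones are exhausted.
GB = 1024 ** 3

def pick_path(paths, level_target, level):
    """paths: list of (name, capacity), fastest first.
    level_target(n): target size of level n.
    Returns the name of the path where output for `level` is placed."""
    idx = 0
    remaining = paths[0][1]
    for lvl in range(1, level + 1):
        need = level_target(lvl)
        # fall through to the next path while this one cannot hold the level
        while need > remaining and idx + 1 < len(paths):
            idx += 1
            remaining = paths[idx][1]
        remaining -= need
    return paths[idx][0]

# Capacities from the report: 50 GB db, ~285 GB db.slow.
paths = [("db", 50 * GB), ("db.slow", 285 * GB)]
target = lambda n: 256 * 1024 ** 2 * 10 ** (n - 1)  # 256 MB base, x10 per level

print(pick_path(paths, target, 3))  # with correct accounting, L3 stays on "db"
```

With this accounting, L3 (~25 GB target) stays on the 50 GB db path, and only a level whose target size no longer fits would spill to db.slow; the buggy v12.2.1 selection spilled levels much earlier.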

#5 Updated by Sage Weil over 1 year ago

  • Project changed from RADOS to bluestore
  • Category deleted (Performance/Resource Usage)

#7 Updated by Sage Weil over 1 year ago

  • Assignee set to Igor Fedotov

#8 Updated by Sage Weil over 1 year ago

  • Status changed from Verified to In Progress

#9 Updated by Igor Fedotov over 1 year ago

  • Status changed from In Progress to Need Review

#10 Updated by Sage Weil about 1 year ago

luminous cherry-pick is merged.

#11 Updated by Igor Fedotov about 1 year ago

  • Status changed from Need Review to Resolved

#12 Updated by Nathan Cutler 11 months ago

  • Tracker changed from Bug to Backport
  • Target version set to v12.2.3

#13 Updated by Nathan Cutler 11 months ago

Sage Weil wrote:

luminous cherry-pick is merged.

Just to clarify, the luminous commit is not a cherry-pick. The fix in master was to advance the rocksdb submodule to a more recent upstream version including the fix for this bug, whereas the luminous version appears to bring only this specific fix into the submodule.

Unfortunately, I don't know which master PR advanced the rocksdb submodule.

#14 Updated by Nathan Cutler 11 months ago

  • Subject changed from bluestore: db.slow used when db is not full to luminous: bluestore: db.slow used when db is not full
