Bug #46780

BlueFS Spillover without db being full

Added by Seena Fallah about 1 year ago. Updated about 2 months ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi. My OSDs are reporting BlueFS spillover even though the DB device is far from full: `ceph_bluefs_db_used_bytes` in Prometheus shows only 33GB used, and my DB size is 191GB. The spillover is only 65MB, and a manual compaction resolves it, although sometimes it takes two manual compactions to clear.
From the RocksDB logs, the spillover appears when a flush event is triggered.
I have attached the OSD log from when the spillover occurred.
Some time after a manual compaction, the spillover returns with the same 65MB value.
Why can't the DB use the free space it has?
I'm using Nautilus 14.2.9 on Ubuntu 18.04.4.
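
The condition described above (data on the slow device while the DB device still has room) can be checked from an OSD's perf counters. The following is only a minimal sketch: it assumes the `bluefs` section of `ceph daemon osd.<id> perf dump` has already been loaded into a dict and that it exposes `db_total_bytes`, `db_used_bytes` and `slow_used_bytes`. The function name and the sample numbers are illustrative, not taken from the attached log.

```python
def bluefs_spillover(perf_dump):
    """Summarize BlueFS DB usage and spillover from a parsed perf dump.

    `perf_dump` is assumed to be the JSON output of
    `ceph daemon osd.<id> perf dump`, already parsed into a dict.
    """
    bluefs = perf_dump["bluefs"]
    db_total = bluefs["db_total_bytes"]
    db_used = bluefs["db_used_bytes"]
    slow_used = bluefs["slow_used_bytes"]
    return {
        # Fraction of the DB device in use (0.0 if no DB device is reported).
        "db_used_fraction": db_used / db_total if db_total else 0.0,
        "slow_used_bytes": slow_used,
        # Any BlueFS data on the slow (main) device counts as spillover.
        "spillover": slow_used > 0,
    }


# Illustrative values only, roughly matching the numbers in this report.
sample = {
    "bluefs": {
        "db_total_bytes": 191 * 10**9,
        "db_used_bytes": 27 * 10**9,
        "slow_used_bytes": 65 * 10**6,
    }
}
report = bluefs_spillover(sample)
```

With numbers like these, `report["spillover"]` is true even though the DB device is well under half full, which is the situation being reported.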

ceph-osd.116.log View (46.7 KB) Seena Fallah, 07/30/2020 11:09 PM

Screenshot from 2020-07-31 03-48-42.png View (26.3 KB) Seena Fallah, 07/30/2020 11:19 PM

Screenshot from 2020-07-31 03-48-30.png View (26.4 KB) Seena Fallah, 07/30/2020 11:19 PM

History

#1 Updated by Seena Fallah about 1 year ago

Here is also the DB usage for that OSD.

#2 Updated by Seena Fallah about 1 year ago

I made a mistake in the description: `ceph_bluefs_db_used_bytes` is 27GB, as you can see in the screenshots. Also, after a while a compacted OSD can use 40GB of DB without any spillover, yet this OSD with only 27GB of DB used has spillover!

#3 Updated by Neha Ojha 12 months ago

  • Project changed from RADOS to bluestore

#4 Updated by Josh Durgin 12 months ago

Igor confirmed this was an issue fixed by https://github.com/ceph/ceph/pull/33889 - L4 will go to the main device without this, which is why the spillover occurred.
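
To illustrate why L4 would not fit, here is a rough arithmetic sketch. It assumes the stock RocksDB level sizing (`max_bytes_for_level_base` of 256 MiB and a level multiplier of 10), which this ticket does not state, and it uses a simplified "whole levels must fit" model rather than BlueFS's actual placement logic. The helper names are hypothetical.

```python
MIB = 2**20
GIB = 2**30


def level_targets(base_bytes, multiplier, num_levels):
    """Target sizes for L1..Ln under geometric level sizing."""
    return [base_bytes * multiplier**i for i in range(num_levels)]


def levels_fitting(db_bytes, targets):
    """Return (levels that fit entirely on the DB device, bytes they need).

    Simplified model: levels are placed in order, and the first level
    that does not fit in the remaining space goes to the slow device.
    """
    used = 0
    fitting = []
    for level, size in enumerate(targets, start=1):
        if used + size > db_bytes:
            break
        used += size
        fitting.append(level)
    return fitting, used


# Assumed defaults: 256 MiB base, multiplier 10 (not confirmed in this ticket).
targets = level_targets(256 * MIB, 10, 4)
fitting, used = levels_fitting(191 * GIB, targets)
# Under these assumptions: fitting == [1, 2, 3] and used == 28416 MiB (~27.75 GiB)
```

Under those assumptions L1 through L3 need about 27.75 GiB, while L4 alone would target 250 GiB, more than the 191GB DB device, so it would be placed on the main device regardless of how much DB space is still free.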

#5 Updated by Seena Fallah 12 months ago

As I see in the PR, to use this feature I need to change `bluestore_volume_selection_policy` to `use_some_extra`. This option is at `LEVEL_DEV`, and according to https://docs.ceph.com/docs/master/dev/config/#levels options at that level are not recommended for production environments.
How can I use this PR to solve my problem?

#6 Updated by Konstantin Shalygin 6 months ago

  • Status changed from New to Triaged
  • Backport deleted (nautilus, octopus)
  • Affected Versions v14.2.2, v14.2.3, v14.2.4, v14.2.5, v14.2.6, v14.2.7, v14.2.8 added

Seena, this is fixed in 14.2.11 and is the default in 14.2.12.

#7 Updated by Igor Fedotov about 2 months ago

  • Status changed from Triaged to Closed

Fixed with the new BlueFS space tracking framework (see #39185), starting with v14.2.12.
