Bug #56488

closed

BlueStore doesn't defer small writes for pre-pacific hdd osds

Added by Dan van der Ster almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
% Done: 100%
Source: Community (dev)
Backport: quincy, pacific
Regression: Yes
Severity: 2 - major

Description

We're upgrading clusters from v15.2.16 to v16.2.9, and our simple "rados bench -p test 10 write -b 4096 -t 1" latency probe showed that something is seriously wrong with deferred writes in pacific.
I attached a plot from an example cluster that was upgraded today.

The OSDs are 12TB HDDs, formatted in nautilus with the default bluestore_min_alloc_size_hdd = 64kB, and each has a large flash block.db.

I found that the performance regression occurs because, with the default config in pacific, 4kB writes to these pre-pacific HDD OSDs are no longer deferred to flash.
Here are example bench writes from both releases: https://pastebin.com/raw/m0yL1H9Z

I worked out that the issue is fixed if I set bluestore_prefer_deferred_size_hdd = 128k (up from the pacific default of 64k; note that the default was 32k in octopus).

I think this is related to the fixes in #52089, which landed in 16.2.6: _do_alloc_write now compares the prealloc size (0x10000) with bluestore_prefer_deferred_size_hdd (also 0x10000), and because the condition is "strictly less than", deferred writes never happen.
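To make the comparison concrete, here is a minimal Python model of the deferral decision described above. It is a simplified sketch, not BlueStore's actual C++ code. It assumes that a small write is preallocated in units of min_alloc_size (so a 4 KiB write on a 64 KiB min_alloc_size OSD gives prealloc = 0x10000, matching the value quoted above) and that the write is deferred only when prealloc is strictly less than the prefer-deferred threshold. The helper names `round_up` and `is_deferred` are hypothetical.

```python
# Simplified model of the deferral check discussed in this report.
# ASSUMPTIONS (hypothetical, not BlueStore's real implementation):
#   - prealloc size = write length rounded up to min_alloc_size;
#   - the write is deferred only if prealloc < prefer_deferred_size
#     (the "strictly less than" condition in _do_alloc_write).

KiB = 1024


def round_up(length: int, unit: int) -> int:
    """Round length up to the next multiple of unit."""
    return -(-length // unit) * unit


def is_deferred(write_len: int, min_alloc_size: int, prefer_deferred_size: int) -> bool:
    """Return True if this model would defer the write to the flash device."""
    prealloc = round_up(write_len, min_alloc_size)
    return prealloc < prefer_deferred_size  # strict comparison


if __name__ == "__main__":
    write_len = 4 * KiB        # rados bench -b 4096
    min_alloc_hdd = 64 * KiB   # HDD OSD formatted in nautilus
    for prefer in (64 * KiB, 128 * KiB):  # pacific default vs. workaround value
        deferred = is_deferred(write_len, min_alloc_hdd, prefer)
        print(f"prefer_deferred_size={prefer // KiB}k -> deferred={deferred}")
```

Under this model, the 64k default yields 0x10000 < 0x10000, which is false, so the 4 KiB write is not deferred. Raising the threshold to 128k makes the check 0x10000 < 0x20000 and deferral returns, consistent with the workaround reported above.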

So I think this affects anyone upgrading clusters with mixed HDD/SSD OSDs that were created before pacific.

Should we increase the default bluestore_prefer_deferred_size_hdd to 128kB, or is there actually a bug here?


Files

image (1).png (126 KB): latency increase after upgrade to pacific. Dan van der Ster, 07/07/2022 08:27 AM

Related issues: 2 (0 open, 2 closed)

Copied to bluestore - Backport #58102: pacific: BlueStore doesn't defer small writes for pre-pacific hdd osds (Resolved, Adam Kupczyk)
Copied to bluestore - Backport #58103: quincy: BlueStore doesn't defer small writes for pre-pacific hdd osds (Resolved, Adam Kupczyk)