Feature #38816

open

Deferred writes do not work for random writes

Added by Марк Коренберг about 5 years ago. Updated almost 5 years ago.

Status: In Progress
Priority: Normal
Assignee: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

How to reproduce:

osd.11 is a BlueStore OSD with its RocksDB on an SSD and its main data on an HDD.

ceph osd pool create qwe 8 8
ceph osd pool set qwe size 1
for i in `seq 0 7`; do ceph osd pg-upmap 110.$i osd.11; done
rbd create -p qwe -s 10G test

(110 is the ID of pool qwe on this cluster; the loop maps all 8 of its PGs to osd.11.)

fio --ioengine=rbd --name=test --bs=4k --sync=1 --iodepth=1 --pool=qwe --rbdname=test --rw=randwrite
gives about 170-200 IOPS.

The same fio command with --rw=write (sequential writes)
gives about 1300 IOPS.

I suppose that it works as follows:
1. Small writes are deferred.
2. Once some criterion is met, BlueStore starts flushing the deferred writes to the HDD.
3. Since random I/O on an HDD is really SLOW, some small buffer associated with deferred writes fills up, and the write speed drops to the HDD's speed.
4. Sequential writes are much faster on an HDD, so small deferred writes in sequential mode are coalesced into big ones, and the HDD manages to write them before the buffer fills up.
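The mechanism in the list above can be illustrated with a toy queueing model: a bounded queue that fills at the SSD commit rate and drains at the HDD rate. Everything in it is hypothetical: the function name `simulate`, the 750 µs SSD commit time, the 5.5 ms per HDD operation, the 64-slot queue, and merging up to 32 queued writes per HDD operation are assumptions picked only so the results land near the IOPS reported above. It is a sketch of the hypothesis, not of BlueStore's actual deferred-write code.

```python
# Toy model of the hypothesis above; NOT BlueStore's implementation.
# All parameters are hypothetical, chosen only to land near the reported IOPS.

def simulate(duration_us, ssd_commit_us, hdd_op_us, capacity, merge=1):
    """Return client IOPS for a single synchronous writer (iodepth=1).

    Each write is acknowledged after an SSD commit and then waits in a bounded
    deferred queue until the HDD writes it. One HDD operation takes hdd_op_us
    and writes up to `merge` queued writes (merge > 1 assumes the queued writes
    are adjacent and can be coalesced). If the queue is full, the client stalls.
    """
    queued = 0                      # writes in the deferred queue (incl. in flight)
    in_flight = 0                   # writes covered by the current HDD operation
    hdd_done_at = None              # completion time of the current HDD op, if any
    client_done_at = ssd_commit_us  # first write is submitted at t = 0
    acked = 0
    while True:
        # Jump to the next event; with capacity >= 1 one of them is always pending.
        now = min(t for t in (hdd_done_at, client_done_at) if t is not None)
        if now > duration_us:
            break
        if hdd_done_at == now:      # HDD op finished: free its queue slots
            queued -= in_flight
            in_flight = 0
            hdd_done_at = None
        if client_done_at == now:   # SSD commit finished: ack and enqueue
            acked += 1
            queued += 1
            client_done_at = None
        if client_done_at is None and queued < capacity:
            client_done_at = now + ssd_commit_us   # submit the next write
        if hdd_done_at is None and queued > 0:
            in_flight = min(merge, queued)         # coalesce up to `merge` writes
            hdd_done_at = now + hdd_op_us
    return acked * 1_000_000 / duration_us


if __name__ == "__main__":
    ten_s = 10_000_000
    # Random writes: no coalescing, so the queue fills and IOPS fall to the HDD op rate.
    randwrite = simulate(ten_s, 750, 5_500, capacity=64, merge=1)
    # Sequential writes: coalescing lets the HDD keep up, so IOPS stay at the SSD rate.
    seqwrite = simulate(ten_s, 750, 5_500, capacity=64, merge=32)
    print(f"randwrite: {randwrite:.0f} IOPS, write: {seqwrite:.0f} IOPS")
```

In this model a larger queue only postpones the moment it fills: with `capacity=1024` and a 600 s run, random-write throughput still ends up close to the HDD operation rate. That would be consistent with raising the deferred-write limits not helping on a long benchmark.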

So I think the buffer is very small. In the random-write scenario, BlueStore with RocksDB on SSD is MUCH SLOWER than FileStore with its journal on SSD.

I tried increasing bluestore_max_deferred_txc and bluestore_throttle_deferred_bytes; this did not help.


Files

results.tar.xz (807 KB), Марк Коренберг, 04/07/2019 07:32 PM
