Feature #8580
Decrease disk thread's IO priority and/or make it configurable
Status: Closed
Description
PG scrubbing (and other "background" activities) should not consume IOPS if there are client IOs to be performed. The cfq elevator allows setting IO priorities via the ioprio_set syscall.
To make scrubbing more transparent, we should give the disk thread a lower priority, e.g. the best-effort IO priority class with subclass 7. Or, to make it fully transparent, we could use the idle priority class. Ideally the class/subclass would be configurable.
For an example of what needs to be done, see this related patch for btrfs-progs: http://www.spinics.net/lists/linux-btrfs/msg14909.html
Updated by Dan van der Ster almost 10 years ago
I've done some fio testing to motivate this further. The fio config below should simulate the case where the journal is co-located on a spinning disk with the filestore.
[global]
ioengine=libaio
invalidate=1
runtime=20
#
[deepscrub]
direct=0
iodepth=8
bs=512k
rw=read
size=4g
prioclass=2 # to be varied
prio=4 # to be varied
#
[writeobj]
direct=1
sync=1 # not sure if an OSD journal is O_SYNC?
bs=4k
rw=write
rate_iops=10
size=1g
numjobs=2
prioclass=2 # to be varied
prio=4 # to be varied
This configures one scrubbing thread that reads in 512k chunks, and two direct-IO writers that each write at 10 IOPS. The goal is then to minimise the latency of those 10 IOPS in the write threads.
Using cfq (from RHEL6.5) and default io priorities (be/4), I get:
lat (usec): min=205 , max=304311 , avg=51303.22, stdev=46714.35
Using cfq with scrub prio=be/7 and writer prio be/1, I get:
lat (usec): min=245 , max=287430 , avg=52490.84, stdev=47354.12
Using cfq with scrub prio=idle and writer prio=be/4, I get:
lat (usec): min=273 , max=225333 , avg=6272.16, stdev=29405.21
and finally with cfq and scrub prio=be/4 and writer prio=realtime, I get:
lat (usec): min=281 , max=215453 , avg=6802.68, stdev=29901.80
We see that when the scrubber and the writers are in the same prioclass (best effort), the average write latency is ~50 ms regardless of the priorities. Either moving the scrubber into the idle class, or moving the writers into the realtime class, improves the average write latency to 6-7 ms.
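To quantify the improvement, a quick calculation from the average latencies reported by the fio runs above:

```python
# Average write latencies (usec) taken from the fio output above.
baseline_be4 = 51303.22  # scrubber be/4, writers be/4
scrub_idle = 6272.16     # scrubber idle, writers be/4
writers_rt = 6802.68     # scrubber be/4, writers realtime

print(f"idle scrubber:    {baseline_be4 / scrub_idle:.1f}x lower avg latency")
print(f"realtime writers: {baseline_be4 / writers_rt:.1f}x lower avg latency")
```

So either change buys roughly an 8x reduction in average write latency in this test.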
I would therefore propose to make the prioclass and prio configurable for each of the op and disk thread pools.
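As an illustration, such a configurable might look like this in ceph.conf (the option names here follow what was later merged for the disk thread, e.g. osd_disk_thread_ioprio_class; verify the exact names against your Ceph release):

```ini
[osd]
# Run the disk (scrub) thread in the idle scheduling class under cfq.
# The priority value only matters within the best-effort class (0-7).
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
```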
Updated by Brian Andrus almost 10 years ago
- Source changed from other to Support
Updated by Sage Weil almost 10 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
- Backport changed from dumpling to firefly,dumpling
Updated by Sage Weil almost 10 years ago
- Status changed from In Progress to Resolved
- Backport changed from firefly,dumpling to firefly
Would rather not backport the ioprio stuff to dumpling; the sleep is there.
Updated by Sage Weil almost 10 years ago
- Backport changed from firefly to firefly,dumpling
oh, we did backport the io priority
Updated by Dan van der Ster over 9 years ago
Hi,
The backport to dumpling is missing the commit which provides the new configurable: https://github.com/ceph/ceph/commit/987ad133415aa988061c95259f9412b05ce8ac7e
And the backport to firefly is missing completely.
Cheers, Dan