Feature #8580

Decrease disk thread's IO priority and/or make it configurable

Added by Dan van der Ster almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Support
Tags:
Backport: firefly,dumpling
Reviewed:
Affected Versions:
Pull request ID:

Description

PG scrubbing (and other "background" activities) should not consume IOPS if there are client IOs to be performed. The cfq elevator allows setting IO priorities via the ioprio_set syscall.

To make scrubbing more transparent, we should give the disk thread a lower IO priority, e.g. the best-effort priority class with subclass 7. To make it fully transparent we could use the idle priority class instead. Ideally the class and subclass would be configurable.

For an example of what needs to be done, see this related patch for btrfs-progs: http://www.spinics.net/lists/linux-btrfs/msg14909.html
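As a rough, self-contained illustration of the syscall usage (a sketch, not any particular Ceph patch; the constants mirror linux/ioprio.h, since glibc provides no wrapper):

/* Sketch: lower the calling thread's IO priority via ioprio_set. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_CLASS_BE     2    /* best-effort */
#define IOPRIO_CLASS_IDLE   3    /* idle */
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

int main(void)
{
    /* best-effort class, lowest subclass (7); for a fully transparent
     * scrubber use IOPRIO_CLASS_IDLE with data 0 instead */
    int prio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7);

    /* who == 0 with IOPRIO_WHO_PROCESS means "the calling thread" */
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0, prio) == -1) {
        perror("ioprio_set");
        return 1;
    }
    return 0;
}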

Actions #1

Updated by Dan van der Ster almost 10 years ago

I've done some fio testing to motivate this further. The fio config below should simulate the case where the journal is co-located on a spinning disk with the filestore.

[global]
ioengine=libaio
invalidate=1
runtime=20

[deepscrub]
direct=0
iodepth=8
bs=512k
rw=read
size=4g
prioclass=2   # to be varied
prio=4        # to be varied

[writeobj]
direct=1
sync=1        # not sure if an OSD journal is O_SYNC?
bs=4k
rw=write
rate_iops=10
size=1g
numjobs=2
prioclass=2   # to be varied
prio=4        # to be varied

This configures one scrubbing thread that reads in 512k chunks and two direct-IO writers that each write at 10 IOPS. The goal is then to minimise the latency of those 10 IOPS in the writer threads.
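For the runs reported below, only the prioclass/prio lines marked "to be varied" change. fio's prioclass follows the ionice numbering (1 = realtime, 2 = best-effort, 3 = idle), so, for example, the idle-class scrub run amounts to setting in the [deepscrub] section:

prioclass=3   # idle class; the prio value is not used in this class

and the realtime-writer run amounts to prioclass=1 in the [writeobj] section.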

Using cfq (from RHEL6.5) and default io priorities (be/4), I get:

lat (usec): min=205 , max=304311 , avg=51303.22, stdev=46714.35

Using cfq with scrub prio=be/7 and writer prio be/1, I get:

lat (usec): min=245 , max=287430 , avg=52490.84, stdev=47354.12

Using cfq with scrub prio=idle and writer prio=be/4, I get:

lat (usec): min=273 , max=225333 , avg=6272.16, stdev=29405.21

and finally with cfq and scrub prio=be/4 and writer prio=realtime, I get:

lat (usec): min=281 , max=215453 , avg=6802.68, stdev=29901.80

We see that when the scrubber and the writers are in the same priority class (best-effort), the average write latency is ~50ms regardless of the relative priorities. Putting the scrubber into the idle class, or the writers into the realtime class, improves the average write latency to 6-7ms.

I would therefore propose to make the prioclass and prio configurable for each of the op and disk thread pools.
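As a minimal sketch of what that configurability could look like (the struct and names here are hypothetical, not the actual Ceph change; only the ioprio_set call itself is real): each worker thread applies its pool's configured class and priority on startup, which works because ioprio_set with IOPRIO_WHO_PROCESS and who == 0 affects only the calling thread.

/* Hypothetical sketch: per-pool IO priority applied by each worker
 * thread at startup. */
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

struct pool_ioprio {
    int klass;  /* 1 = realtime, 2 = best-effort, 3 = idle */
    int prio;   /* 0 (highest) .. 7 (lowest); ignored by the idle class */
};

/* Called at the top of each worker thread's entry function. With
 * IOPRIO_WHO_PROCESS and who == 0 the kernel changes the IO priority
 * of the calling thread only, so the op and disk thread pools can
 * carry different settings. */
int apply_pool_ioprio(const struct pool_ioprio *p)
{
    return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                   IOPRIO_PRIO_VALUE(p->klass, p->prio));
}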

Actions #2

Updated by Brian Andrus almost 10 years ago

  • Source changed from other to Support
Actions #3

Updated by Sage Weil almost 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
  • Backport changed from dumpling to firefly,dumpling
Actions #4

Updated by Sage Weil almost 10 years ago

  • Status changed from In Progress to Resolved
  • Backport changed from firefly,dumpling to firefly

Would rather not backport the ioprio stuff to dumpling; the sleep is there.

Actions #5

Updated by Sage Weil almost 10 years ago

  • Backport changed from firefly to firefly,dumpling

Oh, we did backport the IO priority.

Actions #6

Updated by Dan van der Ster over 9 years ago

Hi,

The backport to dumpling is missing the commit that provides the new config option: https://github.com/ceph/ceph/commit/987ad133415aa988061c95259f9412b05ce8ac7e

And the backport to firefly is missing completely.

Cheers, Dan
