Feature #8580

Decrease disk thread's IO priority and/or make it configurable

Added by Dan van der Ster almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Support
Tags:
Backport: firefly,dumpling
Reviewed:
Affected Versions:
Pull request ID:

Description

PG scrubbing (and other "background" activities) should not consume IOPS if there are client IOs to be performed. The cfq elevator allows setting IO priorities via the ioprio_set syscall.

To make scrubbing more transparent, we should give the disk thread a lower IO priority, e.g. the best-effort priority class with subclass 7. To make it fully transparent we could use the idle priority class instead. Ideally the class and subclass would be configurable.

For an example of what needs to be done, see this related patch for btrfs-progs: http://www.spinics.net/lists/linux-btrfs/msg14909.html
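As a rough, self-contained illustration of the syscall usage (a sketch, not any particular Ceph patch; the constants mirror linux/ioprio.h, since glibc provides no wrapper):

/* Sketch: lower the calling thread's IO priority via ioprio_set. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_CLASS_BE     2    /* best-effort */
#define IOPRIO_CLASS_IDLE   3    /* idle */
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

int main(void)
{
    /* best-effort class, lowest subclass (7); for a fully transparent
     * scrubber use IOPRIO_CLASS_IDLE with data 0 instead */
    int prio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7);

    /* who == 0 with IOPRIO_WHO_PROCESS means "the calling thread" */
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0, prio) == -1) {
        perror("ioprio_set");
        return 1;
    }
    return 0;
}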

Actions #1

Updated by Dan van der Ster almost 10 years ago

I've done some fio testing to motivate this further. The fio config below should simulate the case where the journal is co-located on a spinning disk with the filestore.

[global]
ioengine=libaio
invalidate=1
runtime=20

[deepscrub]
direct=0
iodepth=8
bs=512k
rw=read
size=4g
prioclass=2   # to be varied
prio=4        # to be varied

[writeobj]
direct=1
sync=1        # not sure if an OSD journal is O_SYNC?
bs=4k
rw=write
rate_iops=10
size=1g
numjobs=2
prioclass=2   # to be varied
prio=4        # to be varied

This configures one scrubbing thread that reads in 512k chunks and two direct-IO writers that each write at 10 IOPS. The goal is then to minimise the latency of those 10 IOPS in the writer threads.
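For the runs reported below, only the prioclass/prio lines marked "to be varied" change. fio's prioclass follows the ionice numbering (1 = realtime, 2 = best-effort, 3 = idle), so, for example, the idle-class scrub run amounts to setting in the [deepscrub] section:

prioclass=3   # idle class; the prio value is not used in this class

and the realtime-writer run amounts to prioclass=1 in the [writeobj] section.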

Using cfq (from RHEL6.5) and default io priorities (be/4), I get:

lat (usec): min=205 , max=304311 , avg=51303.22, stdev=46714.35

Using cfq with scrub prio=be/7 and writer prio be/1, I get:

lat (usec): min=245 , max=287430 , avg=52490.84, stdev=47354.12

Using cfq with scrub prio=idle and writer prio=be/4, I get:

lat (usec): min=273 , max=225333 , avg=6272.16, stdev=29405.21

and finally with cfq and scrub prio=be/4 and writer prio=realtime, I get:

lat (usec): min=281 , max=215453 , avg=6802.68, stdev=29901.80

We see that when the scrubber and the writers are in the same priority class (best-effort), the average write latency is ~50ms regardless of the relative priorities. Putting the scrubber into the idle class, or the writers into the realtime class, improves the average write latency to 6-7ms.

I would therefore propose to make the prioclass and prio configurable for each of the op and disk thread pools.
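As a minimal sketch of what that configurability could look like (the struct and names here are hypothetical, not the actual Ceph change; only the ioprio_set call itself is real): each worker thread applies its pool's configured class and priority on startup, which works because ioprio_set with IOPRIO_WHO_PROCESS and who == 0 affects only the calling thread.

/* Hypothetical sketch: per-pool IO priority applied by each worker
 * thread at startup. */
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_PRIO_VALUE(class, data) (((class) << IOPRIO_CLASS_SHIFT) | (data))

struct pool_ioprio {
    int klass;  /* 1 = realtime, 2 = best-effort, 3 = idle */
    int prio;   /* 0 (highest) .. 7 (lowest); ignored by the idle class */
};

/* Called at the top of each worker thread's entry function. With
 * IOPRIO_WHO_PROCESS and who == 0 the kernel changes the IO priority
 * of the calling thread only, so the op and disk thread pools can
 * carry different settings. */
int apply_pool_ioprio(const struct pool_ioprio *p)
{
    return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                   IOPRIO_PRIO_VALUE(p->klass, p->prio));
}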

Actions #2

Updated by Brian Andrus almost 10 years ago

  • Source changed from other to Support
Actions #3

Updated by Sage Weil almost 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
  • Backport changed from dumpling to firefly,dumpling
Actions #4

Updated by Sage Weil almost 10 years ago

  • Status changed from In Progress to Resolved
  • Backport changed from firefly,dumpling to firefly

Would rather not backport the ioprio stuff to dumpling; the sleep is there.

Actions #5

Updated by Sage Weil almost 10 years ago

  • Backport changed from firefly to firefly,dumpling

Oh, we did backport the IO priority.

Actions #6

Updated by Dan van der Ster over 9 years ago

Hi,

The backport to dumpling is missing the commit that provides the new config option: https://github.com/ceph/ceph/commit/987ad133415aa988061c95259f9412b05ce8ac7e

And the backport to firefly is missing completely.

Cheers, Dan
