Documentation #53161

document device write cache performance implications

Added by Dan van der Ster over 1 year ago. Updated about 1 year ago.

Status:
In Progress
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Tags:
performance
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Some HDDs and SSDs can suffer severe performance loss when their write cache is enabled. The causes are not fully understood, but we should try to document how operators can evaluate and configure their devices to recover that performance.

E.g. here at CERN a new delivery (14TB HDDs) had significantly higher commit_latency than the old batch (6TB HDDs): ~120ms on the new drives vs. ~30ms on the old.
This caused thousands of slow requests when we added the first rack of these machines to the cluster, even after backfilling completed.

Device IOPS were also visibly poor when measured with:

fio --name=/dev/sdy --ioengine=libaio --readwrite=randwrite --fsync=1 --direct=1 --blocksize=4K --runtime=10

Community members suggested disabling the devices' write cache:

hdparm -W 0 /dev/sdy

to work around known firmware quirks (possibly the drive flushes its entire cache on each fsync, which is disastrous for performance).

hdparm -W 0 did not have any effect (on CentOS 8.4), but this did:

for x in /sys/class/scsi_disk/*/cache_type; do echo 'write through' > $x; done

Setting 'write through' via /sys/class/scsi_disk persists the cache setting on the disk (i.e. it disables the write cache, and Linux now knows about this).
Following that change, our new servers' commit_latency dropped to ~10ms (under normal load), and iowait dropped from 20% to basically zero.
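Before changing anything it can help to see what the kernel currently reports for each disk. The following is a minimal sketch, not part of any Ceph tooling: the function name show_cache_types and its optional directory argument are made up here, and the argument exists only so the loop can be tried against a scratch directory instead of the real sysfs path.

```shell
#!/bin/sh
# Print the cache mode the kernel reports for each SCSI disk.
# The default is the sysfs location used in this ticket; accepting a
# different root is an assumption made for illustration and testing.
show_cache_types() {
    root="${1:-/sys/class/scsi_disk}"
    for f in "$root"/*/cache_type; do
        # Skip the unexpanded glob when no disks are present.
        [ -e "$f" ] || continue
        dev="${f%/cache_type}"
        printf '%s: %s\n' "${dev##*/}" "$(cat "$f")"
    done
}
```

Calling show_cache_types with no argument lists one line per disk, e.g. the SCSI address followed by 'write back' or 'write through', which makes it easy to confirm that the setting took effect on every device.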

SSDs can often benefit from similar tuning -- see https://yourcmc.ru/wiki/Ceph_performance#Drive_cache_is_slowing_you_down

For now, we should try to document the above (maybe after some advice from vendors?)

Eventually we may want to have the OSD do some auto-detection [1] and tuning, or just maintain a list of known quirky devices and set the cache mode at OSD startup (or in a udev rule).
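As a rough illustration of the "list of known quirky devices" idea, the sketch below matches each disk's model string against a list and sets write-through only on matches. Everything named here is hypothetical: QUIRKY_MODELS, is_quirky_model and apply_quirks are not existing Ceph or OSD features, and the single list entry is simply the model mentioned in the ceph-users thread linked from this ticket, not a vetted quirk list. It assumes the model is readable from the device/model attribute under each scsi_disk entry.

```shell
#!/bin/sh
# Hypothetical quirk list: space-separated model substrings that should
# run with the write cache disabled. Example entry only, not vetted.
QUIRKY_MODELS="MG07ACA14TE"

# Return 0 if the given model string contains any entry of QUIRKY_MODELS.
is_quirky_model() {
    model="$1"
    for q in $QUIRKY_MODELS; do
        case "$model" in
            *"$q"*) return 0 ;;
        esac
    done
    return 1
}

# Set 'write through' on every disk under the given root whose model
# matches the list. The root argument is an assumption for testing;
# the default is the real sysfs path.
apply_quirks() {
    root="${1:-/sys/class/scsi_disk}"
    for d in "$root"/*; do
        # Skip entries without a readable model (or an unexpanded glob).
        [ -r "$d/device/model" ] || continue
        if is_quirky_model "$(cat "$d/device/model")"; then
            echo 'write through' > "$d/cache_type"
        fi
    done
}
```

Substring matching is used because sysfs model strings are often padded or prefixed; a real implementation would probably want exact vendor and firmware-revision matching as well.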

[1] Spectrum Scale somehow detects devices with a volatile write cache. Any idea how that is done? https://www.ibm.com/docs/en/spectrum-scale-ece/5.0.4?topic=edition-volatile-write-cache-detection

History

#1 Updated by Dan van der Ster about 1 year ago

  • Status changed from New to In Progress
  • Pull request ID set to 43848

#2 Updated by Dan van der Ster about 1 year ago

  • Assignee set to Dan van der Ster

#3 Updated by Zac Dover about 1 year ago

See also

[ceph-users] High ceph_osd_commit_latency_ms on Toshiba MG07ACA14TE HDDs
