Feature #59584
Use sequential reads to speed up scrubs on HDD spinning disks
Status: open
Description
With many small objects (or files, for CephFS), scrubbing on spinning HDDs becomes incredibly slow because it is seek-bound.
`iostat` reports numbers as low as 8 MB/s, when current HDDs can do >= 100 MB/s sequential reads.
My setup:
- BlueStore
- Small objects (`ceph status` reports "402.32M objects, 38 TiB") due to small files (4 - 32 KiB) on CephFS.
See more details at:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZV6RXR5RR7O7WBAGTWNPHLWWXZUHFB3W/
Traditional RAID scrubbing does not have this problem. It reads entire block devices sequentially, and thus gets full disk speed.
According to the mailing list thread, Ceph scrubs object by object, and thus generates a lot of random IO (disk seeks) when there are many small objects.
In my case this makes scrubbing more than 10x slower than the sequential read speed of the disks would allow.
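To illustrate why small objects make the scrub seek-bound, here is a rough back-of-the-envelope model. It is only a sketch: the function name is made up for illustration, and the 4 ms positioning time per object is an assumed value, not something I measured. With 32 KiB objects and 100 MB/s sequential bandwidth it lands in the same range as the `iostat` numbers above.

```python
def effective_throughput(object_size: int, positioning_time: float,
                         sequential_bandwidth: float) -> float:
    """Bytes per second when every object read costs one head repositioning.

    object_size          -- bytes read per object
    positioning_time     -- seconds of seek + rotational delay per object (assumed)
    sequential_bandwidth -- bytes per second the disk delivers once positioned
    """
    transfer_time = object_size / sequential_bandwidth
    return object_size / (positioning_time + transfer_time)


# Assumed figures: 32 KiB objects, 4 ms per reposition, 100 MB/s sequential.
small = effective_throughput(32 * 1024, 0.004, 100e6)
# The same disk with 4 MiB objects, for comparison.
large = effective_throughput(4 * 1024 * 1024, 0.004, 100e6)

print(f"32 KiB objects: {small / 1e6:.1f} MB/s")
print(f"4 MiB objects:  {large / 1e6:.1f} MB/s")
```

In this model the positioning time dominates for small objects, so throughput scales almost linearly with object size until the transfer time becomes comparable to the seek time.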
ZFS had the same problem, see:
https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSNonlinearScrubs?showcomments#comments
OpenZFS solved it in 2017 (OpenZFS 0.8), using two-phase scrubs:
- https://github.com/openzfs/zfs/issues/3625
- https://github.com/openzfs/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd
It would be great if Ceph implemented the same approach.
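The general idea of a two-phase scrub can be sketched as follows. This is a minimal illustration only, not the OpenZFS code and not an existing Ceph or BlueStore API: `Extent`, `plan_sequential_scrub`, `total_seek_distance`, and the `max_gap` parameter are all hypothetical names. Phase 1 walks the metadata and collects where each object's data lives on disk; phase 2 reads those extents in on-disk offset order instead of object order.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Extent:
    object_id: str  # hypothetical object name
    offset: int     # byte offset on the block device
    length: int     # bytes to read and verify


def plan_sequential_scrub(extents: Iterable[Extent],
                          max_gap: int = 1 << 20) -> List[List[Extent]]:
    """Sort extents by disk offset and group them into near-contiguous runs.

    Extents whose gap to the previous one is at most max_gap bytes go into
    the same run, so each run can be issued as one mostly sequential read.
    """
    runs: List[List[Extent]] = []
    for ext in sorted(extents, key=lambda e: e.offset):
        if runs:
            last = runs[-1][-1]
            if ext.offset - (last.offset + last.length) <= max_gap:
                runs[-1].append(ext)
                continue
        runs.append([ext])
    return runs


def total_seek_distance(order: Iterable[Extent]) -> int:
    """Sum of head movements (in bytes) when reading extents in this order."""
    position = 0
    distance = 0
    for ext in order:
        distance += abs(ext.offset - position)
        position = ext.offset + ext.length
    return distance


# Phase 1 output in object order: neighbours by name are far apart on disk.
by_object = [
    Extent("a", 8_000_000, 4096),
    Extent("b", 0, 4096),
    Extent("c", 8_004_096, 4096),
    Extent("d", 4096, 4096),
]

# Phase 2: read in disk order, grouped into runs.
runs = plan_sequential_scrub(by_object)
by_offset = [ext for run in runs for ext in run]

print("runs:", [[e.object_id for e in run] for run in runs])
print("seek distance, object order:", total_seek_distance(by_object))
print("seek distance, disk order:  ", total_seek_distance(by_offset))
```

A real implementation would have to bound the memory used by the phase 1 list (for example by scrubbing in offset-sorted batches) and would still verify checksums per object after the data has been read.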