Documentation #49406
Closed
Exceeding OSD nearfull ratio causes write throttle.
Description
We noticed a 20x reduction in write performance on our CephFS cluster shortly after one of our OSDs exceeded the nearfull ratio.
Historically (pre-Nautilus), the "nearfull" warning never had any operational impact, so it took us a while to narrow down the actual cause.
It would be worth adding the kernel client's behavior to the user documentation ("If we are near [nearfull] ENOSPC, write synchronously."), along with its implications for client performance.
Appears to be related to the following commit:
https://github.com/ceph/ceph-client/commit/7614209736fbc4927584d4387faade4f31444fce
Kernel 4.19.154
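For anyone hitting the same slowdown, a minimal sketch of a check that flags OSDs at or above the nearfull ratio might look like the following. It works on already-parsed JSON and does not call the cluster itself. The field names `nodes`, `name`, and `utilization` (a percentage) are assumptions about the output of `ceph osd df --format json`, the function name `osds_over_nearfull` is hypothetical, and the 0.85 threshold is only the usual default; read the real value from your own cluster.

```python
from __future__ import annotations

from typing import Any


def osds_over_nearfull(osd_df: dict[str, Any], nearfull_ratio: float) -> list[tuple[str, float]]:
    """Return (name, used fraction) for every OSD at or above nearfull_ratio.

    osd_df is assumed to have the shape of `ceph osd df --format json`:
    a "nodes" list whose entries carry "name" and "utilization" (percent).
    """
    flagged = []
    for node in osd_df.get("nodes", []):
        used_fraction = node["utilization"] / 100.0  # percent -> fraction
        if used_fraction >= nearfull_ratio:
            flagged.append((node["name"], used_fraction))
    # Fullest OSD first, since that is the one to rebalance or expand first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sample data for illustration only, not output from a real cluster.
sample_osd_df = {
    "nodes": [
        {"name": "osd.0", "utilization": 62.5},
        {"name": "osd.1", "utilization": 86.0},
        {"name": "osd.2", "utilization": 84.9},
    ]
}

flagged = osds_over_nearfull(sample_osd_df, nearfull_ratio=0.85)
for name, fraction in flagged:
    print(f"{name} is {fraction:.1%} full: CephFS kernel clients may fall back to synchronous writes")
```

Running something like this from monitoring, with the threshold set a little below the cluster's actual nearfull ratio, would give some warning before the write throttle described above kicks in.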
Updated by Jeff Layton about 3 years ago
It's unfortunate that it caught you by surprise. Would you care to draft a patch to update the documentation? Where would it have been most helpful to read this?
Updated by Jan-Philipp Litza almost 3 years ago
I was caught by surprise, too. Maybe at least in "Kernel Mount Debugging", so that one can find the answer when things get slow. And/or in `man mount.ceph` and "Mount CephFS using kernel driver".
At least those are the pages I currently have open...
Updated by Jeff Layton over 2 years ago
- Tracker changed from Bug to Documentation
- Project changed from Linux kernel client to CephFS
- Category deleted (fs/ceph)
- Status changed from New to In Progress
- Pull request ID set to 42749
Ok, added a blurb to https://docs.ceph.com/en/latest/cephfs/troubleshooting/#kernel-mount-debugging
Updated by Patrick Donnelly over 2 years ago
- Status changed from In Progress to Resolved
Updated by Niklas Hambuechen almost 2 years ago
After wondering for a long time why my clusters get slow at some point, I finally found this as well.
It would be fantastic if `ceph status` not only pointed out when a device becomes NEARFULL, but also hinted at the massive performance impact that can have.