Visibility for snap trim queue length
We observed an unexplained, constant increase in disk space usage on a few of our prod clusters. At first we thought it was caused by customers abusing them, but that wasn't it. Then we thought that images were constantly being filled with data, but the space usage reported by Ceph wasn't consistent with what the filesystems reported. After further digging, we realized that the snap trim queues for some PGs had grown to around 250k elements. We increased the snap trimmer frequency and the number of parallel snap trim ops, and disk space usage finally started to drop.
Ceph needs a feature to efficiently and conveniently expose snap trim queue lengths so they can be used in monitoring, and a feature to warn Ceph cluster admins when snap trim queues grow long enough to require attention.
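To illustrate the kind of warning being requested, here is a minimal Python sketch. It assumes the per-PG queue lengths have already been collected into a dict by some means; the function name, the threshold value, and the sample PG ids are all hypothetical and are not part of Ceph.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold; this ticket does not specify a value.
WARN_THRESHOLD = 100_000


def long_snap_trim_queues(
    queue_lengths: Dict[str, int], threshold: int = WARN_THRESHOLD
) -> List[Tuple[str, int]]:
    """Return (pgid, length) pairs whose queue length exceeds threshold, longest first."""
    over = [(pgid, n) for pgid, n in queue_lengths.items() if n > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data standing in for whatever the monitoring source reports.
    sample = {"1.2a": 250_000, "1.2b": 12, "3.7": 130_000}
    for pgid, n in long_snap_trim_queues(sample):
        print(f"WARN: pg {pgid} snap trim queue length {n} exceeds {WARN_THRESHOLD}")
```

A real implementation would read the lengths from whatever interface Ceph ends up exposing and would probably make the threshold configurable.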
#5 Updated by Nathan Cutler about 1 year ago
- Backport set to jewel, luminous
@Piotr: It's OK to add e.g. "jewel, luminous" to the "Backport" field right from the beginning, though.
When the master PR is merged, the status of the ticket is changed to "Pending Backport", and an automated script then creates the backport tickets from the value of the "Backport" field.