Feature #62670
[RFE] cephfs should track and expose subvolume usage and quota
Status: Open
% Done: 0%
Description
Subvolumes may be queried independently, but at scale we need a way for subvolume usage and quota thresholds to drive alerts within Ceph as a health check, and/or via Prometheus as alerts and usage metrics.
Here are some ideas for the kinds of metrics that would be useful for the mgr/prometheus module to expose for alerting and usage tracking:
ceph_fs_subvolume_count{fs_id="1", data_pool="pool_a"} <n>
ceph_fs_subvolume_metadata{fs_id="1", data_pool="pool_a", name="subvol_1"} 1
ceph_fs_subvolume_usage_bytes_total{fs_id="1", name="subvol_1"} <n>
ceph_fs_subvolume_quota_bytes_total{fs_id="1", name="subvol_1"} <n>

With metrics like these, we could:
- raise alerts on quota near-full to avoid application outages
- use PromQL functions like predict_linear to forecast fill rates per subvolume
- understand overcommit of the filesystem (sum of quotas > fs capacity; see the sketch after this list)
- identify unused subvolumes so admins can follow up and delete them
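As a rough illustration only (the metric names above are proposals, not series that mgr/prometheus exposes today), here is a minimal Python sketch of the near-full and overcommit checks such metrics would enable, using made-up values standing in for scraped samples:

    # Hypothetical per-subvolume values, standing in for what the proposed
    # ceph_fs_subvolume_usage_bytes_total / ceph_fs_subvolume_quota_bytes_total
    # series would report after a scrape.
    usage_bytes = {"subvol_1": 900 * 2**30, "subvol_2": 10 * 2**30}
    quota_bytes = {"subvol_1": 1024 * 2**30, "subvol_2": 512 * 2**30}
    fs_capacity_bytes = 1024 * 2**30  # made-up filesystem capacity

    # Quota near-full: warn before applications start failing writes.
    for name, used in usage_bytes.items():
        pct = used / quota_bytes[name]
        if pct > 0.85:
            print(f"{name} is at {pct:.0%} of its quota")

    # Overcommit: the sum of all quotas exceeds what the filesystem can hold.
    if sum(quota_bytes.values()) > fs_capacity_bytes:
        print("subvolume quotas overcommit the filesystem capacity")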
Having these metrics would be key to managing capacity usage within native cephfs and ganesha/cephfs deployments.
Updated by Venky Shankar 8 months ago
- Assignee set to Kotresh Hiremath Ravishankar
Paul Cuzner wrote:
Subvolumes may be queried independently, but at scale we need a way for subvolume usage and quota thresholds to drive alerts within Ceph as a health check, and/or via Prometheus as alerts and usage metrics.
Some of these might be available as part of the `subvolume info` result JSON. If not, I think it's pretty straightforward to add them. Kotresh?
Updated by Kotresh Hiremath Ravishankar 6 months ago
Hi Paul/Venky,
The mgr/volumes module exposes two APIs to get the information asked for here.
1. All subvolume related information, including quota, is provided by the info command as below:
kotresh:build$ bin/ceph fs subvolume info a sub_0
{
    "atime": "2023-11-09 16:30:33",
    "bytes_pcent": "20.97",
    "bytes_quota": 5000000,
    "bytes_used": 1048576,
    "created_at": "2023-11-09 16:30:33",
    "ctime": "2023-11-09 16:36:58",
    "data_pool": "cephfs.a.data",
    "features": [
        "snapshot-clone",
        "snapshot-autoprotect",
        "snapshot-retention"
    ],
    "gid": 0,
    "mode": 16877,
    "mon_addrs": [
        "192.168.1.114:40799"
    ],
    "mtime": "2023-11-09 16:36:58",
    "path": "/volumes/_nogroup/sub_0/43912b23-e164-497e-bddf-264a58a5588f",
    "pool_namespace": "",
    "state": "complete",
    "type": "subvolume",
    "uid": 0
}
2. The subvolume count can be fetched with the below API:
kotresh:build$ bin/ceph fs subvolume ls a 2>/dev/null
[
    {
        "name": "sub_1"
    },
    {
        "name": "sub_0"
    }
]
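For reference, a rough sketch (not existing mgr/volumes or mgr/prometheus code) of how an exporter could turn the two commands above into the metric names proposed in the description. The fs_id label value and the glue script itself are assumptions for illustration; only the CLI commands and JSON fields shown above are taken from the thread:

    # Hypothetical glue script: polls the two CLI commands shown above and
    # prints the metric names proposed in the description in Prometheus
    # text exposition format.
    import json
    import subprocess

    FS_NAME = "a"   # volume name, as in the examples above
    FS_ID = "1"     # the fs_id label value is assumed, not read from the CLI

    def ceph_json(*args):
        """Run a ceph CLI command whose output is JSON and parse it."""
        return json.loads(subprocess.check_output(("ceph",) + args))

    subvolumes = ceph_json("fs", "subvolume", "ls", FS_NAME)
    print(f'ceph_fs_subvolume_count{{fs_id="{FS_ID}"}} {len(subvolumes)}')

    for entry in subvolumes:
        name = entry["name"]
        info = ceph_json("fs", "subvolume", "info", FS_NAME, name)
        labels = f'fs_id="{FS_ID}",name="{name}",data_pool="{info["data_pool"]}"'
        print(f'ceph_fs_subvolume_usage_bytes_total{{{labels}}} {info["bytes_used"]}')
        # bytes_quota may be the string "infinite" when no quota is set; skip it then
        if isinstance(info["bytes_quota"], int):
            print(f'ceph_fs_subvolume_quota_bytes_total{{{labels}}} {info["bytes_quota"]}')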
I didn't understand at what level the info is expected here. Could you be more specific w.r.t. the requirement from mgr/volumes?
Thanks,
Kotresh H R
Updated by Kotresh Hiremath Ravishankar 6 months ago
- Status changed from New to Need More Info