Bug #61849
Ceph balancer should probably run on unhealthy pool when it would make the pool healthy
Description
The balancer does not run when a pool is unhealthy, see https://docs.ceph.com/en/quincy/rados/operations/balancer/#throttling:
"When the cluster is healthy, the balancer will [run]"
This unfortunately means that a cluster can get stuck in an unhealthy state which the balancer would fix.
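You can confirm this from the balancer itself while the cluster is in HEALTH_WARN (the commands below are standard; exact output wording differs between releases, so treat it only as an illustration):

  # Is the balancer enabled, and what did its last optimization attempt decide?
  ceph balancer status
  # Score the current PG distribution (lower is better, 0 is perfect)
  ceph balancer eval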
For example, I upgraded a cluster from a 2-machine setup with an "osd" failure domain and "size = 2" replication to a "host" failure domain with "size = 3" replication.
Let's say the old machines are "node-1" and "node-2", and "node-3" was added.
The above change means it had to move a lot of PGs, including some from node-1 to node-2 (because the "osd" failure domain previously allowed a PG to be on 2 OSDs of the same host).
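For reference, that kind of change is normally done roughly as follows (the rule and pool names below are placeholders, not the ones from my cluster):

  # Replicated CRUSH rule with "host" as the failure domain
  ceph osd crush rule create-replicated replicated_host default host
  # Point the pool at the new rule and raise the replica count
  ceph osd pool set mypool crush_rule replicated_host
  ceph osd pool set mypool size 3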
It got stuck in this state, with "Low space hindering backfill", on my Ceph 16.2.7:
  cluster:
    id:     067f0bf9-893c-4b23-a27c-56e8685e9d5c
    health: HEALTH_WARN
            3 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 10 pgs backfill_toofull
            Degraded data redundancy: 1390436/239568771 objects degraded (0.580%), 10 pgs degraded, 10 pgs undersized
            1 pool(s) nearfull

  services:
    mon: 3 daemons, quorum node-2,node-3,node-1 (age 5d)
    mgr: node-1(active, since 5d), standbys: node-2, node-3
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 5d), 36 in (since 10d); 10 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 641 pgs
    objects: 79.87M objects, 114 TiB
    usage:   343 TiB used, 96 TiB / 439 TiB avail
    pgs:     1390436/239568771 objects degraded (0.580%)
             1265017/239568771 objects misplaced (0.528%)
             631 active+clean
             10  active+undersized+degraded+remapped+backfill_toofull
I believe the "10 pgs backfill_toofull" is because the due to the unbalancedness of the cluster, the PG moves from node-1 to node-2 were impossible.
That is even though with the newly added machine, the cluster has a lot of free space.
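The usual way to check this is with the standard commands below (output omitted here):

  # Which health checks fire, which OSDs are near full, which PGs are stuck
  ceph health detail
  ceph osd df tree
  ceph pg dump pgs_brief | grep backfill_toofull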
There's a vicious cycle:
backfill_toofull -> cluster is not healthy -> balancer does not run -> backfill_toofull
This has also been reported e.g. on: https://old.reddit.com/r/ceph/comments/ouk623/unbalanced_osds_backfilling_constantly_balancer/
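A cluster in this state can usually be unstuck by hand, but only with generic workarounds rather than a real fix (sketch only; the ratio below is just an example value and should be reverted afterwards):

  # Temporarily allow backfill onto fuller OSDs so the remapped PGs can move
  ceph osd set-backfillfull-ratio 0.92
  # and/or push data off the overfull OSDs
  ceph osd reweight-by-utilization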
Note: I found a possibly related/confounding issue in which PGs are marked as backfill_toofull even though OSDs are apparently not counted as full: https://tracker.ceph.com/issues/61839
Overall I think it would make sense for balancing not to be prevented when the cluster is unhealthy, since balancing can help (and in this case seems required) to make the cluster healthy.
Updated by Konstantin Shalygin 9 months ago
- Project changed from Ceph to mgr
- Category set to balancer module