Bug #61849
Ceph balancer should probably run on unhealthy pool when it would make the pool healthy
Description
The balancer does not run when a pool is unhealthy, see https://docs.ceph.com/en/quincy/rados/operations/balancer/#throttling:
per the docs, the balancer will only [run] when the cluster is healthy.
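A quick way to confirm that this throttling (rather than the balancer simply being off) is what is happening is the following; these are plain ceph commands, nothing specific to my setup, and the exact output fields vary by release:

    # confirm the balancer is enabled and which mode it is in
    # (recent releases also report the result of the last optimization attempt)
    ceph balancer status

    # list the health warnings keeping the cluster out of HEALTH_OK
    ceph health detail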
This unfortunately means that a cluster can get stuck in an unhealthy state which the balancer would fix.
For example, I upgraded a cluster from a 2-machine, "osd"-failure-domain, "size = 2" replication setup to a "host"-failure-domain, "size = 3" replication setup.
Let's say the old machines are "node-1" and "node-2", and "node-3" was the machine that was added.
The above change means a lot of PGs had to move, including some from node-1 to node-2 (because previously the "osd" failure domain allowed a PG to be on 2 OSDs on the same host).
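For reference, the kind of change I mean is roughly the following; this is only a sketch, and the rule name "replicated_host", the root "default", and the pool name are placeholders rather than the exact names/commands I used:

    # create a replicated CRUSH rule whose failure domain is "host"
    ceph osd crush rule create-replicated replicated_host default host

    # switch the pool to the new rule and raise the replica count to 3
    ceph osd pool set <pool> crush_rule replicated_host
    ceph osd pool set <pool> size 3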
It got stuck in this state, with "Low space hindering backfill", on my Ceph 16.2.7:
  cluster:
    id:     067f0bf9-893c-4b23-a27c-56e8685e9d5c
    health: HEALTH_WARN
            3 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 10 pgs backfill_toofull
            Degraded data redundancy: 1390436/239568771 objects degraded (0.580%), 10 pgs degraded, 10 pgs undersized
            1 pool(s) nearfull

  services:
    mon: 3 daemons, quorum node-2,node-3,node-1 (age 5d)
    mgr: node-1(active, since 5d), standbys: node-2, node-3
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 5d), 36 in (since 10d); 10 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 641 pgs
    objects: 79.87M objects, 114 TiB
    usage:   343 TiB used, 96 TiB / 439 TiB avail
    pgs:     1390436/239568771 objects degraded (0.580%)
             1265017/239568771 objects misplaced (0.528%)
             631 active+clean
             10  active+undersized+degraded+remapped+backfill_toofull
I believe the "10 pgs backfill_toofull" is because the due to the unbalancedness of the cluster, the PG moves from node-1 to node-2 were impossible.
That is even though with the newly added machine, the cluster has a lot of free space.
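The per-OSD utilization that triggers backfill_toofull (as opposed to the cluster-wide free space) can be checked with the usual commands:

    # per-OSD utilization laid out along the CRUSH tree; shows which OSDs are nearfull
    ceph osd df tree

    # the nearfull/backfillfull/full ratios currently in effect
    ceph osd dump | grep ratio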
There's a vicious cycle:
backfill_toofull -> cluster is not healthy -> balancer does not run -> backfill_toofull
This has also been reported e.g. on: https://old.reddit.com/r/ceph/comments/ouk623/unbalanced_osds_backfilling_constantly_balancer/
Note I found a possibly related/confounding issue in which PGs are marked as backfill_toofull even though OSDs are apparently not counted as full: https://tracker.ceph.com/issues/61839
Overall I think it would make sense for balancing not to be prevented when the cluster is unhealthy, as balancing can help (and in this case seems required) to make the cluster healthy.
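For what it's worth, as far as I can tell, breaking the cycle today means doing the balancer's job by hand via the offline upmap optimization described in the docs. Roughly (a sketch; the pool name is a placeholder, and the generated file should be reviewed before applying):

    # save the current osdmap
    ceph osd getmap -o osd.map

    # have osdmaptool compute upmap entries that even out the given pool;
    # the output file contains plain "ceph osd pg-upmap-items ..." commands
    osdmaptool osd.map --upmap upmap.sh --upmap-pool <pool>

    # review, then apply
    source upmap.sh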