Bug #61849

Ceph balancer should probably run on unhealthy pool when it would make the pool healthy

Added by Niklas Hambuechen 11 months ago. Updated 9 months ago.

Status: New
Priority: Normal
Assignee: -
Category: balancer module
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The balancer does not run when a pool is unhealthy; see https://docs.ceph.com/en/quincy/rados/operations/balancer/#throttling, which says:

When the cluster is healthy, the balancer will [run]

This unfortunately means that a cluster can get stuck in an unhealthy state which the balancer would fix.
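
The refusal can be observed with the regular balancer commands; as far as I can tell, the status output includes the reason why the last optimization attempt did not run:

  # Whether the balancer is enabled, its mode, and the result of the last optimization attempt
  ceph balancer status

  # The misplaced-ratio throttle that applies on top of the health check
  ceph config get mgr target_max_misplaced_ratio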

For example, I upgraded a cluster from a 2-machine, "osd"-failure-domain "size = 2" replication to a "host"-failure-domain "size = 3" replication.

Let's say the old machines are "node-1" and "node-2", and "node-3" was added.
The above change means the cluster had to move a lot of PGs, including some from node-1 to node-2 (because the "osd" failure domain previously allowed a PG to be on 2 OSDs of the same host).
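
(For reference, a change of this kind amounts to roughly the following; the rule and pool names here are made up for illustration.)

  # Create a replicated CRUSH rule with "host" as the failure domain
  ceph osd crush rule create-replicated replicated-by-host default host

  # Switch the pool to the new rule and raise the replica count
  ceph osd pool set mypool crush_rule replicated-by-host
  ceph osd pool set mypool size 3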

It got stuck in this state, with "Low space hindering backfill", on my Ceph 16.2.7:

  cluster:
    id:     067f0bf9-893c-4b23-a27c-56e8685e9d5c
    health: HEALTH_WARN
            3 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 10 pgs backfill_toofull
            Degraded data redundancy: 1390436/239568771 objects degraded (0.580%), 10 pgs degraded, 10 pgs undersized
            1 pool(s) nearfull

  services:
    mon: 3 daemons, quorum node-2,node-3,node-1 (age 5d)
    mgr: node-1(active, since 5d), standbys: node-2, node-3
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 5d), 36 in (since 10d); 10 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 641 pgs
    objects: 79.87M objects, 114 TiB
    usage:   343 TiB used, 96 TiB / 439 TiB avail
    pgs:     1390436/239568771 objects degraded (0.580%)
             1265017/239568771 objects misplaced (0.528%)
             631 active+clean
             10  active+undersized+degraded+remapped+backfill_toofull

I believe the "10 pgs backfill_toofull" is because the due to the unbalancedness of the cluster, the PG moves from node-1 to node-2 were impossible.
That is even though with the newly added machine, the cluster has a lot of free space.
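
The imbalance can be checked with the per-OSD utilization view:

  # Per-OSD and per-host utilization; shows which OSDs are nearfull and thus hindering the backfill
  ceph osd df tree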

There's a vicious cycle:

backfill_toofull  ->  cluster is not healthy  ->  balancer does not run  ->  backfill_toofull

This has also been reported e.g. on: https://old.reddit.com/r/ceph/comments/ouk623/unbalanced_osds_backfilling_constantly_balancer/

Note I found a possibly related/confounding issue in which PGs are marked as backfill_toofull even though OSDs are apparently not counted as full: https://tracker.ceph.com/issues/61839
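
As far as I can tell, the cycle can currently only be broken by hand, by computing the same kind of "upmap" moves the balancer would make and applying them manually (pg-upmap-items changes are accepted even while the cluster is degraded, as far as I know). A rough sketch, with a made-up pool name; the generated file should be reviewed before running it:

  # Grab the current OSD map and let osdmaptool compute upmap entries for the pool
  ceph osd getmap -o osd.map
  osdmaptool osd.map --upmap upmap-moves.sh --upmap-pool mypool --upmap-deviation 1

  # The file contains "ceph osd pg-upmap-items ..." commands; review, then apply
  bash upmap-moves.sh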

Overall I think it would make sense for balancing not to be prevented when the cluster is unhealthy, as balancing can help (and in this case, seems to be required) to make the cluster healthy.
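
To my understanding, the existing plan workflow would be a natural fit for this: generate a plan, check that it reduces the degraded/toofull PGs, and only then execute it. Today the optimize step itself seems to be refused while objects are degraded, which is exactly the limitation this ticket is about:

  ceph balancer eval                 # score the current distribution
  ceph balancer optimize myplan      # currently refused while the cluster is degraded (to my understanding)
  ceph balancer show myplan          # inspect the proposed moves
  ceph balancer execute myplan       # apply them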
