Bug #61849

Ceph balancer should probably run on unhealthy pool when it would make the pool healthy

Added by Niklas Hambuechen 10 months ago. Updated 9 months ago.

Status: New
Priority: Normal
Assignee: -
Category: balancer module
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The balancer does not run when a pool is unhealthy; see https://docs.ceph.com/en/quincy/rados/operations/balancer/#throttling

When the cluster is healthy, the balancer will [run]

This unfortunately means that a cluster can get stuck in an unhealthy state which the balancer would fix.
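
For reference, whether and why the balancer is declining to act can be checked with the standard module commands (a minimal sketch; exact output varies by release):

  ceph balancer status   # shows the mode, whether the module is active, and the last optimization result
  ceph balancer eval     # scores how balanced the current distribution is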

For example, I upgraded a cluster from a 2-machine setup with "osd" failure domain and "size = 2" replication to a "host" failure domain with "size = 3" replication.

Let's say the old machines are "node-1" and "node-2", and "node-3" was added.
The above change means Ceph had to move a lot of PGs, including some from node-1 to node-2 (because previously the "osd" failure domain allowed a PG to be on 2 OSDs of the same host).
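
Roughly the kind of change involved, sketched here with a placeholder pool name and rule name (not taken from the actual cluster):

  # Create a replicated rule with "host" as the failure domain and switch the pool to it.
  ceph osd crush rule create-replicated replicated_host default host
  ceph osd pool set <pool> crush_rule replicated_host
  # Raise the replication factor from 2 to 3.
  ceph osd pool set <pool> size 3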

It got stuck in this state, with "Low space hindering backfill", on my Ceph 16.2.7:

  cluster:
    id:     067f0bf9-893c-4b23-a27c-56e8685e9d5c
    health: HEALTH_WARN
            3 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 10 pgs backfill_toofull
            Degraded data redundancy: 1390436/239568771 objects degraded (0.580%), 10 pgs degraded, 10 pgs undersized
            1 pool(s) nearfull

  services:
    mon: 3 daemons, quorum node-2,node-3,node-1 (age 5d)
    mgr: node-1(active, since 5d), standbys: node-2, node-3
    mds: 1/1 daemons up, 2 standby
    osd: 36 osds: 36 up (since 5d), 36 in (since 10d); 10 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 641 pgs
    objects: 79.87M objects, 114 TiB
    usage:   343 TiB used, 96 TiB / 439 TiB avail
    pgs:     1390436/239568771 objects degraded (0.580%)
             1265017/239568771 objects misplaced (0.528%)
             631 active+clean
             10  active+undersized+degraded+remapped+backfill_toofull

I believe the "10 pgs backfill_toofull" occurs because, due to the imbalance of the cluster, the PG moves from node-1 to node-2 were impossible.
That is even though, with the newly added machine, the cluster has plenty of free space.
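
The imbalance and the stuck PGs can be confirmed with standard inspection commands, for example:

  ceph osd df tree                                  # per-OSD utilization, including the nearfull OSDs
  ceph health detail                                # names the nearfull OSDs and the backfill_toofull PGs
  ceph pg dump pgs_brief | grep backfill_toofull    # lists the affected PGs and their up/acting sets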

There's a vicious cycle:

backfill_toofull  ->  cluster is not healthy  ->  balancer does not run  ->  backfill_toofull
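
One way to break this cycle by hand (a sketch only; it assumes the cluster can use upmap, i.e. all clients are luminous or newer, and <pool> is a placeholder) is to compute upmap entries offline with osdmaptool and apply them manually, which sidesteps the balancer module's health check:

  ceph osd getmap -o osdmap.bin                 # export the current OSDMap
  osdmaptool osdmap.bin --upmap upmap.sh \
      --upmap-pool <pool> --upmap-deviation 1   # generate "ceph osd pg-upmap-items" commands
  bash upmap.sh                                 # apply the generated mappings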

This has also been reported e.g. on: https://old.reddit.com/r/ceph/comments/ouk623/unbalanced_osds_backfilling_constantly_balancer/

Note I found a possibly related/confounding issue in which PGs are marked as backfill_toofull even though OSDs are apparently not counted as full: https://tracker.ceph.com/issues/61839
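
As a temporary mitigation for the backfill_toofull state (not a fix, and it shrinks the safety margin, so it should be reverted afterwards), the backfill threshold can be raised slightly so that the stuck backfills can proceed:

  ceph osd set-backfillfull-ratio 0.92   # default is 0.90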

Overall I think it would make sense for balancing not to be prevented when the cluster is unhealthy, as balancing can help (and in this case seems required) to make the cluster healthy.
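
For comparison, the misplaced-object throttle described in the linked docs is already tunable, but as far as I can tell there is no corresponding override for the degraded-cluster check, which is essentially what this request asks for:

  ceph config set mgr target_max_misplaced_ratio 0.10   # default 0.05; caps the fraction of misplaced PGs the balancer allows at once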

#1

Updated by Konstantin Shalygin 9 months ago

  • Project changed from Ceph to mgr
  • Category set to balancer module