Bug #61839 (open)

PG_BACKFILL_FULL despite all OSDs being way below their ratios

Added by Niklas Hambuechen 10 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a PG that has PG_BACKFILL_FULL despite all OSDs being below their "backfillfull_ratio":

# ceph health detail
HEALTH_WARN Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull; Degraded data redundancy: 1069078/229532112 objects degraded (0.466%), 9 pgs degraded, 9 pgs undersized
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 1 pg backfill_toofull
    pg 2.96 is active+undersized+degraded+remapped+backfill_wait+backfill_toofull, acting [18,29]
[WRN] PG_DEGRADED: Degraded data redundancy: 1069078/229532112 objects degraded (0.466%), 9 pgs degraded, 9 pgs undersized
    pg 2.f is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [30,22]
    pg 2.2a is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [16,28]
    pg 2.77 is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [14,35]
    pg 2.96 is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfill_wait+backfill_toofull, last acting [18,29]
    pg 2.c4 is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [17,26]
    pg 2.e9 is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [20,26]
    pg 2.f0 is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [14,30]
    pg 2.137 is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [17,28]
    pg 2.1a6 is stuck undersized for 5d, current state active+undersized+degraded+remapped+backfilling, last acting [16,27]
# ceph osd df
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 2    hdd  14.61099   1.00000   15 TiB   11 TiB   11 TiB      0 B   22 GiB  3.6 TiB  75.50  1.01   51      up
 3    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   25 GiB  2.6 TiB  82.08  1.10   56      up
 4    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   23 GiB  3.0 TiB  79.21  1.06   54      up
 5    hdd  14.61099   1.00000   15 TiB   13 TiB   13 TiB      0 B   26 GiB  2.0 TiB  86.65  1.16   58      up
 6    hdd  14.61099   1.00000   15 TiB  9.9 TiB  9.8 TiB      0 B   20 GiB  4.8 TiB  67.47  0.90   46      up
 7    hdd  14.61099   1.00000   15 TiB  9.5 TiB  9.4 TiB      0 B   19 GiB  5.1 TiB  65.04  0.87   44      up
 8    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   24 GiB  2.6 TiB  82.13  1.10   56      up
 9    hdd  14.61099   1.00000   15 TiB   11 TiB   11 TiB      0 B   22 GiB  3.9 TiB  73.41  0.98   50      up
10    hdd  14.61099   1.00000   15 TiB   10 TiB   10 TiB      0 B   21 GiB  4.5 TiB  69.03  0.92   47      up
11    hdd  14.61099   1.00000   15 TiB  8.8 TiB  8.7 TiB      0 B   18 GiB  5.8 TiB  60.21  0.81   41      up
 0    ssd   0.16399   1.00000  168 GiB   19 GiB  1.1 GiB   17 GiB  728 MiB  148 GiB  11.45  0.15   57      up
 1    ssd   0.16399   1.00000  168 GiB   14 GiB  1.0 GiB   13 GiB  546 MiB  154 GiB   8.40  0.11   47      up
14    hdd  14.61099   1.00000   15 TiB   11 TiB   11 TiB      0 B   27 GiB  3.4 TiB  76.39  1.02   52      up
15    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   28 GiB  2.6 TiB  82.11  1.10   56      up
16    hdd  14.61099   1.00000   15 TiB  9.4 TiB  9.4 TiB      0 B   24 GiB  5.2 TiB  64.64  0.86   44      up
17    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   28 GiB  2.5 TiB  83.05  1.11   56      up
18    hdd  14.61099   1.00000   15 TiB   11 TiB   11 TiB      0 B   27 GiB  3.5 TiB  76.31  1.02   52      up
19    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   29 GiB  2.4 TiB  83.55  1.12   57      up
20    hdd  14.61099   1.00000   15 TiB   11 TiB   11 TiB  453 KiB   28 GiB  3.5 TiB  76.32  1.02   52      up
21    hdd  14.61099   1.00000   15 TiB  7.9 TiB  7.9 TiB      0 B   21 GiB  6.7 TiB  54.39  0.73   37      up
22    hdd  14.61099   1.00000   15 TiB   11 TiB   10 TiB      0 B   25 GiB  4.1 TiB  71.88  0.96   49      up
23    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   29 GiB  2.4 TiB  83.54  1.12   57      up
12    ssd   0.16399   1.00000  168 GiB   21 GiB  1.1 GiB   19 GiB  783 MiB  147 GiB  12.49  0.17   57      up
13    ssd   0.16399   1.00000  168 GiB   17 GiB  1.3 GiB   16 GiB  540 MiB  150 GiB  10.41  0.14   56      up
26    hdd  14.61099   1.00000   15 TiB   11 TiB   11 TiB      0 B   28 GiB  3.3 TiB  77.68  1.04   53      up
27    hdd  14.61099   1.00000   15 TiB   11 TiB   10 TiB      0 B   26 GiB  4.1 TiB  71.92  0.96   49      up
28    hdd  14.61099   1.00000   15 TiB   13 TiB   13 TiB      0 B   29 GiB  2.0 TiB  86.05  1.15   58      up
29    hdd  14.61099   1.00000   15 TiB   10 TiB   10 TiB      0 B   25 GiB  4.3 TiB  70.62  0.94   48      up
30    hdd  14.61099   1.00000   15 TiB  9.9 TiB  9.8 TiB      0 B   24 GiB  4.7 TiB  67.52  0.90   46      up
31    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   29 GiB  2.2 TiB  85.08  1.14   58      up
32    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   28 GiB  3.0 TiB  79.31  1.06   53      up
33    hdd  14.61099   1.00000   15 TiB   12 TiB   12 TiB      0 B   28 GiB  2.2 TiB  85.22  1.14   58      up
34    hdd  14.61099   1.00000   15 TiB  9.7 TiB  9.6 TiB      0 B   25 GiB  5.0 TiB  66.05  0.88   45      up
35    hdd  14.61099   1.00000   15 TiB  9.4 TiB  9.4 TiB      0 B   23 GiB  5.2 TiB  64.61  0.86   44      up
24    ssd   0.16399   1.00000  168 GiB   18 GiB  1.2 GiB   16 GiB  556 MiB  150 GiB  10.78  0.14   50      up
25    ssd   0.16399   1.00000  168 GiB   23 GiB  1.1 GiB   21 GiB  768 MiB  145 GiB  13.62  0.18   55      up
                        TOTAL  439 TiB  328 TiB  327 TiB  102 GiB  754 GiB  111 TiB  74.76                   
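
The highest utilization in the table above is osd.5 at 86.65%, still well under the 0.97 backfillfull_ratio shown further down. A quick cross-check of "max OSD utilization vs. backfillfull ratio" can be scripted along these lines (a sketch, assuming jq is installed and that the JSON field names nodes, utilization and backfillfull_ratio match this Ceph release):

# ceph osd df -f json | jq '[.nodes[].utilization] | max'
# ceph osd dump -f json | jq '.backfillfull_ratio'

Note that utilization here is a percentage while backfillfull_ratio is a fraction, so the comparison is 86.65/100 < 0.97.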

I can apparently set the ratios arbitrarily high, and still get the "PG_BACKFILL_FULL":

# ceph osd dump | grep ratio
full_ratio 0.97
backfillfull_ratio 0.97
nearfull_ratio 0.97
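
For reference, these were presumably raised from the defaults with the standard monitor commands, roughly as follows (the exact invocations are not part of the original report):

# ceph osd set-nearfull-ratio 0.97
# ceph osd set-backfillfull-ratio 0.97
# ceph osd set-full-ratio 0.97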

This is on:

ceph version 16.2.7

The docs on https://docs.ceph.com/en/quincy/rados/operations/health-checks/#pg-backfill-full say this happens

    because one or more OSDs are above the backfillfull threshold.

However, this condition is not met here.
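
One way to see which OSD is actually being treated as the too-full backfill target (a diagnostic sketch, not something run for this report) is to map and query the stuck PG:

# ceph pg map 2.96
# ceph pg 2.96 query

The acting set [18,29] sits at roughly 76% and 71% in the ceph osd df output above, and no OSD in the cluster is above 86.65%, so no candidate backfill target should be over the 0.97 threshold either.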

So either the docs are wrong, or there's a bug.
