Bug #13844 (closed): ceph df MAX AVAIL is incorrect for simple replicated pool

Added by Dan van der Ster over 8 years ago. Updated over 7 years ago.

Status: Rejected
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Regression: Yes
Severity: 3 - minor

Description

Since we upgraded from firefly to hammer, the MAX AVAIL column is quite wrong for most of our pools.

GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    3564T     2544T        1020T         28.61

Example pool:

POOLS:
    NAME                   ID     USED       %USED     MAX AVAIL     OBJECTS
    volumes                4        255T      7.17          272T     67556416

By my calculations MAX AVAIL should be 2184T/3 = 728T.

pool 4 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 296326 min_read_recency_for_promote 1 stripe_width 0

crush_ruleset 0:

    {
        "rule_id": 0,
        "rule_name": "data",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "0513-R-0050" 
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "rack" 
            },
            {
                "op": "emit" 
            }
        ]
    },

The osd tree:

ID   WEIGHT     TYPE NAME                            UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -92          0 root drain
 -18  218.39999 root os
 -22  218.39999     room 0513-R-0050-os
 -30   54.59999         rack RJ55-os
 -34   54.59999             host p05151113748698-os
 -44   54.59999         rack RJ47-os
 -26   54.59999             host p05151113613837-os
 -62   54.59999         rack RJ45-os
 -50   54.59999             host p05151113587529-os
 -64   54.59999         rack RJ41-os
 -63   54.59999             host p05151113561997-os
  -1 3349.22803 root default
  -2 2184.42822     room 0513-R-0050
  -3  163.80000         rack RJ35
 -15   54.59999             host p05151113471870
 -16   54.59999             host p05151113489275
 -17   54.59999             host p05151113479552
  -4  163.80000         rack RJ37
 -23   54.59999             host p05151113507373
 -24   54.59999             host p05151113508409
 -25   54.59999             host p05151113521447
  -5  163.80000         rack RJ39
 -19   54.59999             host p05151113538756
 -20   54.59999             host p05151113535271
 -21   54.59999             host p05151113534235
  -6  163.80000         rack RJ41
 -27   54.59999             host p05151113558761
 -28   54.59999             host p05151113544113
 -29   54.59999             host p05151113551146
  -7  163.79990         rack RJ43
 -31   54.59999             host p05151113573587
 -32   54.59999             host p05151113578124
 -33   54.59991             host p05151113568206
  -8  163.79990         rack RJ45
 -35   54.59999             host p05151113578807
 -41   54.59999             host p05151113585107
 -47   54.59991             host p05151113590997
  -9  163.80000         rack RJ47
 -38   54.59999             host p05151113599377
 -39   54.59999             host p05151113598352
 -40   54.59999             host p05151113619324
 -10  219.30989         rack RJ49
 -36   55.50992             host p05151113636272
 -43   54.59999             host p05151113640230
 -48   54.59999             host p05151113642826
 -49   54.59999             host p05151113633314
 -11  218.39981         rack RJ51
 -37   54.59991             host p05151113676458
 -42   54.59999             host p05151113674062
 -45   54.59991             host p05151113669359
 -46   54.59999             host p05151113654107
 -12  219.30989         rack RJ53
 -53   54.59999             host p05151113723693
 -54   54.59999             host p05151113706163
 -56   54.59999             host p05151113719408
 -58   55.50992             host p05151113677609
 -13  163.20972         rack RJ55
 -59   54.39999             host p05151113760120
 -60   54.40973             host p05151113725483
 -61   54.39999             host p05151113751590
 -14  217.59900         rack RJ57
 -51   54.39999             host p05151113781242
 -52   54.39999             host p05151113782262
 -55   54.39999             host p05151113778539
 -57   54.39999             host p05151113777233
 -65 1164.79980     room 0513-R-0060
 -71  582.39990         ipservice S513-A-IP37
 -70  291.19995             rack BA09
 -69   72.79999                 host p05798818a82857
 -73   72.79999                 host p05798818b00047
 -83   72.79999                 host p05798818b00174
 -86   72.79999                 host p05798818b04322
 -80  291.19995             rack BA10
 -79   72.79999                 host p05798818v47100
 -88   72.79999                 host p05798818v64334
 -90   72.79999                 host p05798818w03166
 -91   72.79999                 host p05798818v51559
 -76  582.39990         ipservice S513-A-IP62
 -75  291.19995             rack BA11
 -74   72.79999                 host p05798818s98313
 -84   72.79999                 host p05798818s63747
 -85   72.79999                 host p05798818s49204
 -89   72.79999                 host p05798818s40185
 -78  291.19995             rack BA12
 -77   72.79999                 host p05798818b12431
 -81   72.79999                 host p05798818b37327
 -82   72.79999                 host p05798818b78429
 -87   72.79999                 host p05798818b40951
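
For reference, the expectation stated above ("2184T/3 = 728T") can be reproduced from the numbers in this report: the CRUSH rule takes item -2 (room 0513-R-0050, weight ~2184T), so with 3-way replication an evenly filled cluster would show roughly a third of that as MAX AVAIL. A minimal sketch, assuming perfectly even placement (which, as the later comments discuss, is exactly what MAX AVAIL does not assume):

    # Naive expectation from the description above, assuming perfectly even
    # data placement under room 0513-R-0050 and 3 replicas.
    room_weight_tb = 2184.4   # CRUSH weight of item -2 in the osd tree above
    replicas = 3              # pool 'volumes' has size 3
    print(room_weight_tb / replicas)   # ~728T, vs. the 272T that 'ceph df' reports
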
#1 - Updated by Sage Weil over 8 years ago

  • Priority changed from Normal to Urgent
#2 - Updated by Sage Weil over 8 years ago

  • Status changed from New to Need More Info
  • Source changed from other to Community (dev)

This is the responsible code:

https://github.com/ceph/ceph/blob/master/src/mon/PGMonitor.cc#L1357

My guess is that this isn't a bug, but a single OSD with skewed placement, and the mon is correctly predicting that, after writing only ~300TB, that one OSD will fill up. Is that possible? (ceph osd df might help identify any outliers.)
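
To see the shape of that logic: the estimate is not raw free space divided by the replica count; it projects which OSD under the pool's CRUSH root will fill up first and scales from that. The following is a rough Python sketch of the idea, not the actual PGMonitor.cc code; the names and data layout are made up for illustration.

    # Rough sketch of how MAX AVAIL is estimated: the pool is bounded by
    # whichever OSD under its CRUSH root is projected to fill up first.
    def estimate_max_avail(osds, replicas):
        """osds: list of (crush_weight, free_space) for OSDs under the rule's root."""
        total_weight = sum(w for w, _ in osds)
        # Per OSD, project how much data the rule could take before that OSD is
        # full: its free space divided by the fraction of data CRUSH sends to it.
        # The whole pool is limited by the smallest such projection.
        raw_avail = min(avail / (w / total_weight) for w, avail in osds)
        return raw_avail / replicas  # what 'ceph df' shows as MAX AVAIL

    # 10 equally weighted OSDs, one much fuller than the rest (free space in TB):
    osds = [(1.0, 10.0)] * 9 + [(1.0, 1.0)]
    print(estimate_max_avail(osds, replicas=3))   # ~3.3T, not (91T total free) / 3 ~ 30T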

#3 - Updated by Nils Meyer about 8 years ago

I'm seeing a similar issue on my small cluster:

root@ceph-mon1:/# /usr/bin/ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    33517G     16983G       16534G         49.33
POOLS:
    NAME     ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd      0      4662G     13.91         4720G     1610970

root@hv-p-host1:/# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
18 1.81999  1.00000  1862G   862G  1000G 46.29 0.94  96
 0 1.81999  1.00000  1862G   917G   944G 49.26 1.00  95
 2 1.81999  1.00000  1862G   898G   963G 48.24 0.98  99
 4 1.81999  1.00000  1862G  1008G   853G 54.16 1.10 111
 6 1.81999  1.00000  1862G   914G   947G 49.11 1.00 102
 7 1.81999  1.00000  1862G   910G   951G 48.90 0.99  97
10 1.81999  1.00000  1862G   822G  1039G 44.15 0.90  95
 1 1.81999  1.00000  1862G   901G   960G 48.40 0.98  95
 5 1.81999  1.00000  1862G  1045G   816G 56.17 1.14 111
 3 1.81999  1.00000  1862G   709G  1152G 38.10 0.77  78
 8 1.81999  1.00000  1862G   994G   867G 53.40 1.08 112
 9 1.81999  1.00000  1862G  1038G   823G 55.77 1.13 109
12 1.81999  1.00000  1862G   875G   986G 47.02 0.95  96
13 1.81999  1.00000  1862G   850G  1011G 45.66 0.93  92
14 1.81999  1.00000  1862G   832G  1030G 44.69 0.91  91
15 1.81999  1.00000  1862G   973G   888G 52.28 1.06 108
16 1.81999  1.00000  1862G   904G   957G 48.58 0.98  97
17 1.81999  1.00000  1862G  1075G   786G 57.74 1.17 116
              TOTAL 33517G 16534G 16983G 49.33
MIN/MAX VAR: 0.77/1.17  STDDEV: 4.79

This cluster is running ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299). At the time of the initial setup the version was Giant, with an upgrade to Hammer a short time after its release.

#4 - Updated by Sage Weil about 8 years ago

  • Status changed from Need More Info to Rejected

Nils, in your case the result is correct: your usage is bounded by osd.17, which looks like it will fill up after about 4.7T of new data is written.
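
That 4720G figure lines up with the numbers in comment #3, assuming the rbd pool is 3x replicated (its size isn't shown in this thread):

    osd.17 free space:               786G
    its share of the data:           1/18   (all 18 OSDs are weighted equally)
    raw space until osd.17 is full:  786G * 18  = 14148G
    usable after 3x replication:     14148G / 3 ≈ 4716G   (ceph df reports 4720G)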

Dan, I suspect you are seeing the same thing on your cluster. Closing this. Reopen if that's not the case (i.e., you have a 'ceph osd df' that shows no outlier osds).

#5 - Updated by Phat Le Ton over 7 years ago

Hi Sage Weil,
You clearly understand this case very well, so could you share some more information?
In this case:
1. How is the "MAX AVAIL" value calculated?
2. Is "MAX AVAIL" the usable free space of the whole cluster, or just an estimate of the free space remaining before osd.17 becomes full?
3. Which value should we monitor as 'free space' when planning further investment?
