Bug #13844
ceph df MAX AVAIL is incorrect for simple replicated pool
Status: Closed
Description
Since we upgraded from firefly to hammer, the MAX AVAIL column has been quite wrong for most of our pools.
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    3564T     2544T     1020T        28.61
Example pool:
POOLS:
    NAME        ID     USED     %USED     MAX AVAIL     OBJECTS
    volumes     4      255T     7.17      272T          67556416
By my calculations, MAX AVAIL should be about 2184T/3 = 728T: the pool's CRUSH rule takes room 0513-R-0050 (weight 2184T in the tree below), and the pool is 3x replicated.
pool 4 'volumes' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 296326 min_read_recency_for_promote 1 stripe_width 0
crush_ruleset 0:
{ "rule_id": 0, "rule_name": "data", "ruleset": 0, "type": 1, "min_size": 1, "max_size": 10, "steps": [ { "op": "take", "item": -2, "item_name": "0513-R-0050" }, { "op": "chooseleaf_firstn", "num": 0, "type": "rack" }, { "op": "emit" } ] },
The osd tree:
ID  WEIGHT      TYPE NAME                            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-92          0  root drain
-18  218.39999  root os
-22  218.39999      room 0513-R-0050-os
-30   54.59999          rack RJ55-os
-34   54.59999              host p05151113748698-os
-44   54.59999          rack RJ47-os
-26   54.59999              host p05151113613837-os
-62   54.59999          rack RJ45-os
-50   54.59999              host p05151113587529-os
-64   54.59999          rack RJ41-os
-63   54.59999              host p05151113561997-os
 -1 3349.22803  root default
 -2 2184.42822      room 0513-R-0050
 -3  163.80000          rack RJ35
-15   54.59999              host p05151113471870
-16   54.59999              host p05151113489275
-17   54.59999              host p05151113479552
 -4  163.80000          rack RJ37
-23   54.59999              host p05151113507373
-24   54.59999              host p05151113508409
-25   54.59999              host p05151113521447
 -5  163.80000          rack RJ39
-19   54.59999              host p05151113538756
-20   54.59999              host p05151113535271
-21   54.59999              host p05151113534235
 -6  163.80000          rack RJ41
-27   54.59999              host p05151113558761
-28   54.59999              host p05151113544113
-29   54.59999              host p05151113551146
 -7  163.79990          rack RJ43
-31   54.59999              host p05151113573587
-32   54.59999              host p05151113578124
-33   54.59991              host p05151113568206
 -8  163.79990          rack RJ45
-35   54.59999              host p05151113578807
-41   54.59999              host p05151113585107
-47   54.59991              host p05151113590997
 -9  163.80000          rack RJ47
-38   54.59999              host p05151113599377
-39   54.59999              host p05151113598352
-40   54.59999              host p05151113619324
-10  219.30989          rack RJ49
-36   55.50992              host p05151113636272
-43   54.59999              host p05151113640230
-48   54.59999              host p05151113642826
-49   54.59999              host p05151113633314
-11  218.39981          rack RJ51
-37   54.59991              host p05151113676458
-42   54.59999              host p05151113674062
-45   54.59991              host p05151113669359
-46   54.59999              host p05151113654107
-12  219.30989          rack RJ53
-53   54.59999              host p05151113723693
-54   54.59999              host p05151113706163
-56   54.59999              host p05151113719408
-58   55.50992              host p05151113677609
-13  163.20972          rack RJ55
-59   54.39999              host p05151113760120
-60   54.40973              host p05151113725483
-61   54.39999              host p05151113751590
-14  217.59900          rack RJ57
-51   54.39999              host p05151113781242
-52   54.39999              host p05151113782262
-55   54.39999              host p05151113778539
-57   54.39999              host p05151113777233
-65 1164.79980      room 0513-R-0060
-71  582.39990          ipservice S513-A-IP37
-70  291.19995              rack BA09
-69   72.79999                  host p05798818a82857
-73   72.79999                  host p05798818b00047
-83   72.79999                  host p05798818b00174
-86   72.79999                  host p05798818b04322
-80  291.19995              rack BA10
-79   72.79999                  host p05798818v47100
-88   72.79999                  host p05798818v64334
-90   72.79999                  host p05798818w03166
-91   72.79999                  host p05798818v51559
-76  582.39990          ipservice S513-A-IP62
-75  291.19995              rack BA11
-74   72.79999                  host p05798818s98313
-84   72.79999                  host p05798818s63747
-85   72.79999                  host p05798818s49204
-89   72.79999                  host p05798818s40185
-78  291.19995              rack BA12
-77   72.79999                  host p05798818b12431
-81   72.79999                  host p05798818b37327
-82   72.79999                  host p05798818b78429
-87   72.79999                  host p05798818b40951
Updated by Sage Weil over 8 years ago
- Status changed from New to Need More Info
- Source changed from other to Community (dev)
This is the responsible code:
https://github.com/ceph/ceph/blob/master/src/mon/PGMonitor.cc#L1357
My guess is that this isn't a bug, but a single OSD with skewed placement: the mon is correctly predicting that after only ~300TB more is written, that one OSD will fill up. Is that possible? (ceph osd df might help identify any outliers.)
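In rough terms, that code projects how much more data can be written before the fullest OSD (relative to its CRUSH weight) runs out of space, and divides by the replica count. Here is a minimal Python sketch of the idea; this is not the actual PGMonitor.cc code, and the per-OSD free-space figures and data fractions are assumed inputs:

def projected_max_avail(osds, replica_count):
    """osds: iterable of (free_bytes, data_fraction) pairs for the OSDs
    the pool's CRUSH rule maps to; the fractions sum to 1."""
    # Writing X bytes to the pool lands data_fraction * X * replica_count
    # raw bytes on each OSD, so the first OSD to fill bounds the pool:
    # free_bytes = data_fraction * X * replica_count  =>  solve for X.
    return min(free / frac for free, frac in osds if frac > 0) / replica_count

# Hypothetical 18-OSD pool with one skewed/outlier OSD:
osds = [(1000, 1 / 18)] * 17 + [(300, 1 / 18)]
print(projected_max_avail(osds, 3))  # 1800.0, far below sum(free)/3 ~= 5766

With uniform placement the result approaches total free space divided by the replica count, but a single outlier OSD drags the projection down, which is consistent with the behavior reported here.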
Updated by Nils Meyer over 8 years ago
I'm seeing a similar issue on my small cluster:
root@ceph-mon1:/# /usr/bin/ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    33517G     16983G     16534G       49.33
POOLS:
    NAME     ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd      0      4662G     13.91     4720G         1610970
root@hv-p-host1:/# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
18 1.81999 1.00000  1862G   862G  1000G 46.29 0.94  96
 0 1.81999 1.00000  1862G   917G   944G 49.26 1.00  95
 2 1.81999 1.00000  1862G   898G   963G 48.24 0.98  99
 4 1.81999 1.00000  1862G  1008G   853G 54.16 1.10 111
 6 1.81999 1.00000  1862G   914G   947G 49.11 1.00 102
 7 1.81999 1.00000  1862G   910G   951G 48.90 0.99  97
10 1.81999 1.00000  1862G   822G  1039G 44.15 0.90  95
 1 1.81999 1.00000  1862G   901G   960G 48.40 0.98  95
 5 1.81999 1.00000  1862G  1045G   816G 56.17 1.14 111
 3 1.81999 1.00000  1862G   709G  1152G 38.10 0.77  78
 8 1.81999 1.00000  1862G   994G   867G 53.40 1.08 112
 9 1.81999 1.00000  1862G  1038G   823G 55.77 1.13 109
12 1.81999 1.00000  1862G   875G   986G 47.02 0.95  96
13 1.81999 1.00000  1862G   850G  1011G 45.66 0.93  92
14 1.81999 1.00000  1862G   832G  1030G 44.69 0.91  91
15 1.81999 1.00000  1862G   973G   888G 52.28 1.06 108
16 1.81999 1.00000  1862G   904G   957G 48.58 0.98  97
17 1.81999 1.00000  1862G  1075G   786G 57.74 1.17 116
             TOTAL  33517G 16534G 16983G 49.33
MIN/MAX VAR: 0.77/1.17 STDDEV: 4.79
This cluster is running ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299). At the time of the initial setup the version was Giant, with an upgrade to Hammer a short time after its release.
Updated by Sage Weil about 8 years ago
- Status changed from Need More Info to Rejected
Nils, in your case the result is correct: your usage is bounded by osd.17, which looks like it will fill up after about 4.7T of new data is written.
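As a back-of-the-envelope check of that claim (a sketch assuming pool 'rbd' is replicated with size 3 and all 18 OSDs carry an equal 1/18 share of its data, neither of which is shown explicitly above):

free_gb = 786            # osd.17, the fullest OSD in the table above
raw_gb = free_gb * 18    # raw writes the rule can absorb: 14148G
print(raw_gb / 3)        # divided by 3 replicas: 4716G ~= reported 4720G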
Dan, I suspect you are seeing the same thing on your cluster. Closing this. Reopen if that's not the case (i.e., if you have a 'ceph osd df' that shows no outlier OSDs).
Updated by Phat Le Ton over 7 years ago
Hi Sage Weil,
You seem to have a very clear understanding of this case, so could you help me with some more information?
In this case:
1. How is the "MAX AVAIL" value calculated?
2. Is "MAX AVAIL" the usable free space of our cluster, or just an estimate of the free space remaining before osd.17 becomes full?
3. Which value should we monitor as 'free space' when planning our capacity investment?