the %USED of "ceph df" is wrong
$ ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
 1 0.00999  1.00000 10228M  9734M  494M 95.16 1.03 128
 0 0.00999  1.00000 10228M  9734M  494M 95.16 1.03 128
 2 0.00999  1.00000 10228M  9254M  974M 90.47 0.97 128
 3 0.00999  1.00000 10228M  9254M  974M 90.47 0.97 128
             TOTAL  40915M 37976M 2939M 92.82

$ ceph df
POOLS:
    NAME  ID USED  %USED MAX AVAIL OBJECTS
    pool1 1  9216M 45.05      974M       9
    pool2 2  9696M 47.39      494M      10
Per the crush dump:
- pool1 uses the "default" rule, which in turn selects osd.2 and osd.3.
- pool2 uses the "test_profile" rule, which in turn selects osd.0 and osd.1.
- both pools have size 2.
USED reflects the nominal amount stored by the pool without accounting for overhead or replication (1000 MB of objects is 1000 MB of USED even if the actual on disk usage is much higher due to replication and overhead).
and "MAX AVAIL" reflects the free space available for storing objects in the assigned OSDs. in other words, if we do not put more data or remove existing data in other pools, we can store 974MB objects into pool1 in addition to existing objects.
But %USED is not close to 90, as we would expect given that the OSDs assigned to pool1 are almost full. Take pool1 for example: its %USED is calculated as
(9216*2)/(10228*4) = 0.4505
- since pool1's size is 2, we have two copies of the data, so the raw space used by the 9216M of objects is 2 * 9216M.
- the raw space offered by each OSD is 10228M, so the total raw space offered by the cluster is 4 * 10228M.
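To make the arithmetic explicit, here is a small Python sketch of that calculation, using only the numbers from the outputs above (this is not the actual Ceph implementation):

used_mb = 9216        # nominal data stored in pool1 ("USED" in ceph df)
pool_size = 2         # pool1 replica count
osd_size_mb = 10228   # raw capacity of each OSD
num_osds = 4          # every OSD in the cluster, not just pool1's two

raw_used = used_mb * pool_size        # 18432 MB of raw space consumed by pool1
raw_total = osd_size_mb * num_osds    # 40912 MB of raw space in the whole cluster

print(round(raw_used / raw_total * 100, 2))   # 45.05, matching "ceph df"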
This percentage is not a good indicator of how full the pool is; it is just the ratio of "the raw space used by this pool" to "the raw space of the whole cluster", even though not all OSDs in the cluster are assigned to this pool.
So indeed, we should use

USED / (USED + MAX AVAIL)

to calculate %USED.
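A quick check of the proposed formula against the numbers above (again a plain arithmetic sketch, not an implementation):

# USED and MAX AVAIL per pool, copied from the "ceph df" output above (MB)
pools = {
    "pool1": {"used": 9216, "max_avail": 974},
    "pool2": {"used": 9696, "max_avail": 494},
}

for name, p in pools.items():
    pct_used = p["used"] / (p["used"] + p["max_avail"]) * 100
    print(name, round(pct_used, 2))   # pool1 90.44, pool2 95.15

This gives roughly 90.4% for pool1 and 95.2% for pool2, in line with the %USE that "ceph osd df" reports for the OSDs each pool actually uses (90.47 and 95.16); the small gap is the per-OSD overhead on top of the object data.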