Bug #41829

open

ceph df reports incorrect pool usage

Added by Dan Moraru over 4 years ago. Updated over 3 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since upgrading from v14.2.2 to v14.2.3, ceph df erroneously equates pool usage with the amount of data stored in the pool, i.e. the STORED and USED columns in the POOLS section are identical:

$ ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.3 PiB 1.1 PiB 218 TiB 218 TiB 16.79
mdd 2.0 TiB 2.0 TiB 2.1 GiB 10 GiB 0.49
ssd 6.8 TiB 6.8 TiB 4.4 GiB 47 GiB 0.68
TOTAL 1.3 PiB 1.1 PiB 218 TiB 218 TiB 16.68

POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
fs.metadata.archive 35 333 MiB 798 333 MiB 0.02 644 GiB
fs.data.archive 36 0 B 105.73k 0 B 0 2.1 TiB
fs.data.archive.frames 38 158 TiB 41.38M 158 TiB 14.02 725 TiB
fs.metadata.users 41 418 MiB 7.25k 418 MiB 0.02 644 GiB
fs.data.users 42 0 B 108.22k 0 B 0 2.1 TiB
fs.data.users.home 43 132 GiB 2.24M 132 GiB 0.01 725 TiB

Overall usage reported in the RAW STORAGE section is correct and matches the total across OSDs:

$ ceph osd df | grep TOTAL
TOTAL 1.3 PiB 218 TiB 217 TiB 866 MiB 520 GiB 1.1 PiB 16.68

Of the above 6 pools, four are triply-replicated and two are 6+2 erasure-coded:

$ ceph osd dump | grep pool
pool 35 'fs.metadata.archive' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 8677 flags hashpspool stripe_width 0 target_size_ratio 0.25 application cephfs
pool 36 'fs.data.archive' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 8677 flags hashpspool stripe_width 0 compression_algorithm lz4 compression_mode aggressive target_size_ratio 0.5 application cephfs
pool 38 'fs.data.archive.frames' erasure size 8 min_size 7 crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn last_change 15874 lfor 0/0/15670 flags hashpspool,ec_overwrites stripe_width 393216 compression_algorithm lz4 compression_mode aggressive target_size_ratio 0.5 application cephfs
pool 41 'fs.metadata.users' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 17120 flags hashpspool stripe_width 0 target_size_ratio 0.25 application cephfs
pool 42 'fs.data.users' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 17121 flags hashpspool stripe_width 0 compression_algorithm lz4 compression_mode aggressive target_size_ratio 0.5 application cephfs
pool 43 'fs.data.users.home' erasure size 8 min_size 7 crush_rule 4 object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode warn last_change 17122 lfor 0/0/16895 flags hashpspool,ec_overwrites stripe_width 393216 compression_algorithm lz4 compression_mode aggressive target_size_ratio 0.5 application cephfs
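
For reference, USED should differ from STORED by the data-protection overhead: roughly 3 x STORED for the triply-replicated pools and STORED x 8/6 for the 6+2 EC pools (ignoring compression and per-object overhead). A minimal sketch of that expectation, using the figures from the output above:

# Rough sanity check (not Ceph code): the USED each pool "should" roughly
# report given its data-protection scheme, ignoring compression and overhead.
# STORED values are copied from the 'ceph df' output above.
POOLS = {
    # name: (STORED in bytes, raw-space expansion factor)
    "fs.metadata.archive":    (333 * 2**20, 3.0),        # replicated size 3
    "fs.data.archive.frames": (158 * 2**40, 8.0 / 6.0),  # EC 6+2
    "fs.metadata.users":      (418 * 2**20, 3.0),        # replicated size 3
    "fs.data.users.home":     (132 * 2**30, 8.0 / 6.0),  # EC 6+2
}

for name, (stored, expansion) in POOLS.items():
    print(f"{name}: STORED={stored / 2**30:,.2f} GiB, "
          f"expected USED ~= {stored * expansion / 2**30:,.2f} GiB")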


Files

strace16.err (262 KB) strace16.err Dan Moraru, 03/14/2020 06:35 PM
strace17.err (262 KB) strace17.err Dan Moraru, 03/14/2020 06:35 PM

Related issues 3 (1 open, 2 closed)

Related to Dashboard - Bug #45185: mgr/dashboard: fix usage calculation to match "ceph df" way (Resolved, assignee: Ernesto Puerta)

Related to Dashboard - Feature #38697: mgr/dashboard: Enhance info shown in Landing Page cards 'PGs per OSD' & 'Raw Capacity' (Closed)

Is duplicate of mgr - Bug #40203: ceph df shows incorrect usage (New, 06/07/2019)
Actions #1

Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to mgr

Actions #2

Updated by Dan Moraru about 4 years ago

This problem went away as I created additional pools, but it has now resurfaced in v14.2.8. The output of 'ceph df' is sometimes correct (roughly 10% of the time):

[root@ceph ~]# cat strace17.out
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.3 PiB 252 TiB 1.1 PiB 1.1 PiB 81.43
mdd 2.0 TiB 2.0 TiB 21 GiB 30 GiB 1.47
ssd 257 TiB 255 TiB 1.9 TiB 2.3 TiB 0.91
TOTAL 1.6 PiB 509 TiB 1.1 PiB 1.1 PiB 68.54

POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
fs.metadata.archive 35 383 MiB 4.22k 646 MiB 0.03 637 GiB
fs.data.archive 36 0 B 566.29k 0 B 0 81 TiB
fs.data.archive.frames 38 835 TiB 211.04M 1.1 PiB 88.35 109 TiB
fs.metadata.scratch 44 1.1 GiB 1.47k 1.7 GiB 0.09 637 GiB
fs.data.scratch 45 0 B 2.07M 0 B 0 81 TiB
fs.data.scratch.llcache 46 814 GiB 2.08M 1019 GiB 0.41 181 TiB
fs.data.scratch.frames 47 0 B 0 0 B 0 181 TiB
fs.metadata.home 48 287 MiB 70 649 MiB 0.03 637 GiB
fs.data.home 49 2 KiB 2 48 KiB 0 81 TiB
fs.data.home.user 50 384 KiB 1 288 KiB 0 181 TiB

but mostly incorrect:

[root@ceph ~]# cat strace16.out
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.3 PiB 263 TiB 1.1 PiB 1.1 PiB 80.78
mdd 2.0 TiB 2.0 TiB 21 GiB 30 GiB 1.47
ssd 257 TiB 255 TiB 1.9 TiB 2.3 TiB 0.91
TOTAL 1.6 PiB 520 TiB 1.1 PiB 1.1 PiB 68.08

POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
fs.metadata.archive 35 383 MiB 4.22k 383 MiB 0.02 637 GiB
fs.data.archive 36 0 B 566.29k 0 B 0 81 TiB
fs.data.archive.frames 38 804 TiB 211.04M 804 TiB 84.68 109 TiB
fs.metadata.scratch 44 1.1 GiB 1.47k 1.1 GiB 0.06 637 GiB
fs.data.scratch 45 0 B 2.07M 0 B 0 81 TiB
fs.data.scratch.llcache 46 667 GiB 2.08M 667 GiB 0.27 181 TiB
fs.data.scratch.frames 47 0 B 0 0 B 0 181 TiB
fs.metadata.home 48 287 MiB 70 287 MiB 0.01 637 GiB
fs.data.home 49 2 KiB 2 2 KiB 0 81 TiB
fs.data.home.user 50 2 B 1 2 B 0 181 TiB

Note there are discrepancies even in the STORED column: Ceph sometimes indicates that 384 KiB are stored in the fs.data.home.user pool (a 6+2 EC pool), but generally reports the 2-byte size of the solitary file currently in that pool. I suppose "2 B" is the correct STORED value in this case, but I am fairly certain that file is USING more than 2 bytes of the pool. In fact, I am trying to empirically determine just how much data and metadata space files of different sizes take up (see the rough estimate at the end of this comment), and this is seriously hampering that effort. Some of the above pools are triply-replicated, others are erasure-coded:

[root@ceph ~]# ceph osd pool ls detail
pool 35 'fs.metadata.archive' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 52155 flags hashpspool stripe_width 0 target_size_ratio 0.1 application cephfs
pool 36 'fs.data.archive' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 47695 flags hashpspool stripe_width 0 compression_algorithm lz4 compression_mode aggressive target_size_ratio 0.01 application cephfs
pool 38 'fs.data.archive.frames' erasure size 8 min_size 7 crush_rule 2 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 141214 lfor 0/0/80364 flags hashpspool,ec_overwrites stripe_width 393216 compression_algorithm lz4 compression_mode aggressive target_size_ratio 1 application cephfs
pool 44 'fs.metadata.scratch' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 58512 lfor 0/58512/58510 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 target_size_ratio 0.05 application cephfs
pool 45 'fs.data.scratch' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 47654 flags hashpspool stripe_width 0 compression_algorithm lz4 compression_mode aggressive target_size_ratio 0.01 application cephfs
pool 46 'fs.data.scratch.llcache' erasure size 8 min_size 7 crush_rule 4 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 93444 lfor 0/80331/80329 flags hashpspool,ec_overwrites stripe_width 393216 compression_algorithm lz4 compression_mode aggressive target_size_ratio 0.05 application cephfs
pool 47 'fs.data.scratch.frames' erasure size 8 min_size 7 crush_rule 5 object_hash rjenkins pg_num 8192 pgp_num 8192 autoscale_mode warn last_change 120429 lfor 0/0/92563 flags hashpspool,ec_overwrites stripe_width 393216 application cephfs
pool 48 'fs.metadata.home' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 130182 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 49 'fs.data.home' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 130180 flags hashpspool stripe_width 0 compression_algorithm lz4 compression_mode aggressive application cephfs
pool 50 'fs.data.home.user' erasure size 8 min_size 7 crush_rule 6 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode warn last_change 130181 flags hashpspool,ec_overwrites stripe_width 393216 compression_algorithm lz4 compression_mode aggressive application cephfs

strace16.out and strace17.out shown above are what 'strace ceph df' displayed on STDOUT. The corresponding STDERR streams are attached.
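
On the "how much space does a small file really use" question mentioned above: even setting the accounting bug aside, 2 B cannot reflect raw consumption. A back-of-the-envelope sketch, assuming BlueStore's default HDD min_alloc_size of 64 KiB (an assumption on my part; the actual value depends on how the OSDs were provisioned):

import math

K, M = 6, 2                # EC profile from the pool listing above
MIN_ALLOC = 64 * 1024      # bytes per shard, assumed (BlueStore HDD default)

def raw_footprint_lower_bound(file_size):
    # The object is split into K data shards plus M coding shards; on disk
    # each shard occupies at least one allocation unit.
    shard_bytes = max(math.ceil(file_size / K), MIN_ALLOC)
    return (K + M) * shard_bytes

print(raw_footprint_lower_bound(2))    # 2-byte file -> 524288 bytes (512 KiB)

By this estimate the solitary 2-byte file should occupy on the order of 512 KiB of raw space across its 8 shards, so a USED value of 2 B cannot be right regardless of what STORED reports.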

Actions #3

Updated by Stephan Müller over 3 years ago

  • Related to Bug #42982: Monitoring: alert for "pool full" wrong added
Actions #4

Updated by Stephan Müller over 3 years ago

  • Is duplicate of Bug #40203: ceph df shows incorrect usage added
Actions #5

Updated by Stephan Müller over 3 years ago

  • Related to deleted (Bug #42982: Monitoring: alert for "pool full" wrong)
Actions #6

Updated by Stephan Müller over 3 years ago

  • Related to Bug #45185: mgr/dashboard: fix usage calculation to match "ceph df" way added
Actions #7

Updated by Stephan Müller over 3 years ago

  • Related to Feature #38697: mgr/dashboard: Enhance info shown in Landing Page cards 'PGs per OSD' & 'Raw Capacity' added
Actions #8

Updated by Dan Moraru over 3 years ago

ceph df is again reporting incorrect disk usage after upgrading from Nautilus to Octopus:

# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.7 PiB 777 TiB 977 TiB 978 TiB 55.72
mdd 2.0 TiB 1.6 TiB 400 GiB 410 GiB 20.02
ssd 258 TiB 237 TiB 20 TiB 21 TiB 7.99
TOTAL 2.0 PiB 1016 TiB 997 TiB 999 TiB 49.58

--- POOLS ---
POOL ID STORED OBJECTS USED %USED MAX AVAIL
fs.metadata.archive 35 390 MiB 5.14k 390 MiB 0.02 511 GiB
fs.data.archive 36 0 B 572.80k 0 B 0 75 TiB
fs.data.archive.frames 38 714 TiB 187.51M 714 TiB 84.46 99 TiB
fs.metadata.scratch 44 1.1 GiB 1.39k 1.1 GiB 0.07 511 GiB
fs.data.scratch 45 0 B 1.99M 0 B 0 75 TiB
fs.data.scratch.llcache 46 342 GiB 2.00M 342 GiB 0.15 168 TiB
fs.data.scratch.frames 47 0 B 0 0 B 0 168 TiB
rbd 58 19 B 3 19 B 0 75 TiB
device_health_metrics 64 104 MiB 595 104 MiB 0 50 TiB
fs.metadata.cache 65 256 MiB 182 256 MiB 0.02 511 GiB
fs.data.cache 66 0 B 94 0 B 0 75 TiB
fs.data.cache.frames 67 3.0 TiB 783.76k 3.0 TiB 2.21 99 TiB

Some of the pools are triply-replicated, others are 6+2 EC, yet in all cases 'ceph df' reports STORED=USED. In the past, this went away as new pools were created and/or deleted, but I hesitate to delete pools at this point due to this critical bug:
https://tracker.ceph.com/issues/47182

A related thread recently popped up on the users' mailing list:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/5XUMIMBE2LWSLQSPTKPPQPAUH4F4BPYU/
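
For anyone who wants to check their own cluster programmatically, here is a small sketch (assuming the JSON field names Nautilus/Octopus emit, i.e. pools[].stats.stored and pools[].stats.bytes_used) that flags pools exhibiting the bad accounting:

import json
import subprocess

# Flag any pool where USED == STORED despite replication or erasure coding.
df = json.loads(subprocess.check_output(["ceph", "df", "--format", "json"]))
for pool in df["pools"]:
    stats = pool["stats"]
    if stats["stored"] and stats["stored"] == stats["bytes_used"]:
        print(f"suspicious: pool {pool['name']} reports USED == STORED "
              f"({stats['bytes_used']} bytes)")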

I ask that the severity of this and related bugs be raised to at least major. This bug is symptomatic of a serious error in tracking pool usage, with ramifications well beyond the dashboard. For example, pools seem to be unaware of the space truly available to them. The fs.data.cache.frames pool is a newly created 6+2 EC pool with the following crush rule:

# ceph osd crush rule dump fs.data.cache.frames
{
    "rule_id": 8,
    "rule_name": "fs.data.cache.frames",
    "ruleset": 8,
    "type": 3,
    "min_size": 3,
    "max_size": 8,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -24,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "chassis"
        },
        {
            "op": "emit"
        }
    ]
}

While there are 777 TiB of raw HDD storage available, Ceph claims that fs.data.archive.frames can accommodate at most 99 TiB:

POOL ID STORED OBJECTS USED %USED MAX AVAIL
...
fs.data.cache.frames 67 2.8 TiB 740.46k 2.8 TiB 2.08 99 TiB

and moreover, this pool that is presently 2.08% used is somehow nearfull:

# ceph health detail
...
pool 'fs.data.archive.frames' is nearfull
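
For context, my understanding (from the documentation, not the source) is that MAX AVAIL is not the aggregate free space: it is projected from the utilisation of the OSDs the pool's CRUSH rule maps to, effectively limited by the fullest OSD and the configured full ratio, then scaled by k/(k+m) for an EC pool. Even so, the gap here is striking:

# Naive expectation if MAX AVAIL tracked aggregate free space (it does not,
# but it gives a sense of scale): 777 TiB raw HDD free, 6+2 EC overhead.
K, M = 6, 2
raw_hdd_avail_tib = 777
print(raw_hdd_avail_tib * K / (K + M))   # ~583 TiB vs. the reported 99 TiB

A gap of that size would normally point at very uneven OSD utilisation or a near-full OSD in the rule's root, but given the rest of this report it may simply be more of the same broken accounting.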

The autoscaler is equally confused, suggesting that the PG numbers for metadata pools be drastically increased from 64 to 16384 or 32768, while those for some of the data pools be reduced:

# ceph osd pool autoscale-status
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
fs.metadata.archive 390.3M 3.0 2014T 0.0000 0.1000 0.6667 1.0 64 16384 warn
fs.data.archive 0 3.0 2014T 0.0000 0.0100 0.0667 1.0 256 1024 warn
fs.data.archive.frames 714.2T 1.3333333730697632 2014T 0.4727 1.0000 6.6667 1.0 4096 warn
fs.metadata.scratch 1123M 3.0 2014T 0.0000 0.0500 0.3333 4.0 64 32768 warn
fs.data.scratch 0 3.0 2014T 0.0000 0.0100 0.0667 1.0 256 1024 warn
fs.data.scratch.llcache 342.0G 1.3333333730697632 2014T 0.0002 0.0500 0.3333 1.0 1024 warn
fs.data.scratch.frames 0 1.3333333730697632 2014T 0.0000 1.0 8192 32 warn
rbd 19 3.0 2014T 0.0000 1.0 128 32 warn
device_health_metrics 103.6M 3.0 2014T 0.0000 1.0 1 warn
fs.metadata.cache 257.3M 3.0 2014T 0.0000 4.0 64 16 warn
fs.data.cache 0 3.0 2014T 0.0000 1.0 256 32 warn
fs.data.cache.frames 3104G 1.3333333730697632 2014T 0.0020 1.0 4096 32 warn
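
The suggested values are at least internally consistent with the autoscaler heuristic being fed bad inputs. A hedged sketch of the heuristic as I understand it (not the actual Ceph source), using the default mon_target_pg_per_osd of 100 and treating the OSD count as a free parameter (the real count is not in this report), reproduces the 16384/32768 suggestions from the EFFECTIVE RATIO, BIAS and RATE columns above:

import math

MON_TARGET_PG_PER_OSD = 100   # Ceph default
NUM_OSDS = 737                # hypothetical; chosen only for illustration

def suggested_pg_num(effective_ratio, bias, rate):
    # target PGs ~= ratio * target PGs per OSD * OSD count * bias / rate,
    # rounded to a power of two (my reading of the autoscaler's heuristic)
    raw = effective_ratio * MON_TARGET_PG_PER_OSD * NUM_OSDS * bias / rate
    return 2 ** round(math.log2(raw)) if raw > 0 else 0

print(suggested_pg_num(0.6667, 1.0, 3.0))   # fs.metadata.archive -> 16384
print(suggested_pg_num(0.3333, 4.0, 3.0))   # fs.metadata.scratch -> 32768

If the effective ratios for the metadata pools really were in the 0.3-0.7 range of the whole cluster (rather than a fraction of the ~2 TiB mdd devices they actually live on), suggestions of that size would follow mechanically; the problem is the inputs, not the rounding.
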
Actions #9

Updated by Dan Moraru over 3 years ago

I copied-and-pasted the wrong line above. Ceph is indeed claiming that fs.data.cache.frames is one of three nearfull pools:

[WRN] POOL_NEARFULL: 3 pool(s) nearfull
pool 'fs.data.archive.frames' is nearfull
pool 'device_health_metrics' is nearfull
pool 'fs.data.cache.frames' is nearfull

The device_health_metrics pool appeared after upgrading to Octopus and is not one I created.
