Bug #40203


ceph df shows incorrect usage

Added by Momcilo Medic almost 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
ceph-mgr
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

TL;DR

"ceph df" shows pool "itaf" as 1.2TiB used, while "rbd du -p itaf" shows total used as 19TiB
What feedback do I need to provide to help with resolving this?

Longer version:

While trying to resize our placement groups, we noticed a sudden, significant drop in both pool usage and capacity.
We use LibreNMS as a graphing tool, so we first suspected the monitoring itself was misbehaving.
However, after further discussion on IRC, we came to the conclusion that Ceph itself has two views of the usage.

The ceph df command shows the same value that the monitoring observes.
The autoscaler now suggests 16 PGs, whereas it previously suggested 256.

We did not perform any actions at that time, as far as we can track.
We are experiencing no issues with the virtualization running on top of Ceph.

This surfaced because our goal was to enable PG suggestions from the autoscaler, but we first needed to resize all pools.
We are now reluctant to take any significant actions (especially as the autoscaler would still throw warnings).

It is hard to guess which information is relevant, so I'll just provide filtered output of a few commands:

# rbd du -p itaf
NAME                    PROVISIONED USED     
...
<TOTAL>                      29 TiB   19 TiB 
#

# ceph df
...
    POOL        ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    itaf         1     413 GiB       4.91M     1.2 TiB      0.50        79 TiB 
...
#
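
For whatever it's worth, the ceph df numbers look internally consistent if the pool is 3x replicated (which the RATE of 3.0 in the autoscale-status output below suggests, though I have not confirmed the pool's size setting): 413 GiB STORED times 3 is roughly the 1.2 TiB USED. The mismatch is between those figures and the 19 TiB from rbd du. A quick back-of-the-envelope check, with the replication factor of 3 being my assumption:

# Rough cross-check of the figures above; replication factor of 3 is an assumption.
GIB = 2**30
TIB = 2**40

stored      = 413 * GIB   # STORED from `ceph df`
used        = 1.2 * TIB   # USED from `ceph df`
rbd_du_used = 19 * TIB    # <TOTAL> USED from `rbd du -p itaf`
replication = 3           # assumed pool size (replica count)

print(f"STORED * replication = {stored * replication / TIB:.2f} TiB")  # ~1.21 TiB, matches USED
print(f"rbd du USED / ceph df USED = {rbd_du_used / used:.1f}x")       # ~15.8x discrepancy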

# ceph osd pool autoscale-status
 POOL       SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE 
...
 itaf      1223G                3.0        332.9T  0.0108                 1.0    2048          16  warn      
#
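
The autoscaler also appears to be working from the same (smaller) usage figure as ceph df: SIZE 1223G times RATE 3.0, divided by RAW CAPACITY 332.9T, gives the reported RATIO of roughly 0.0108, which would explain why the suggested PG count dropped from 256 to 16. A small sketch of that arithmetic (my own sanity check, not the autoscaler's actual code):

# Sanity check of the RATIO column in `ceph osd pool autoscale-status`.
size_gib         = 1223    # SIZE (GiB)
rate             = 3.0     # RATE (replication factor)
raw_capacity_tib = 332.9   # RAW CAPACITY (TiB)

ratio = (size_gib * rate) / (raw_capacity_tib * 1024)
print(f"ratio = {ratio:.4f}")   # ~0.0108, matching the RATIO column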

I am also attaching a graph from LibreNMS showing the exact moment the usage and capacity dropped.
Please let me know what information is relevant and I'll make sure to provide it.

Kind regards,
Momo.


Files


Related issues 3 (1 open, 2 closed)

Related to Dashboard - Bug #45185: mgr/dashboard: fix usage calculation to match "ceph df" way (Resolved, Ernesto Puerta)

Related to Dashboard - Feature #38697: mgr/dashboard: Enhance info shown in Landing Page cards 'PGs per OSD' & 'Raw Capacity' (Closed)

Has duplicate mgr - Bug #41829: ceph df reports incorrect pool usage (New)

Actions #1

Updated by Stephan Müller over 3 years ago

  • Has duplicate Bug #41829: ceph df reports incorrect pool usage added
Actions #2

Updated by Stephan Müller over 3 years ago

  • Related to Bug #42982: Monitoring: alert for "pool full" wrong added
Actions #3

Updated by Stephan Müller over 3 years ago

  • Related to deleted (Bug #42982: Monitoring: alert for "pool full" wrong)
Actions #4

Updated by Stephan Müller over 3 years ago

  • Related to Bug #45185: mgr/dashboard: fix usage calculation to match "ceph df" way added
Actions #5

Updated by Stephan Müller over 3 years ago

  • Related to Feature #38697: mgr/dashboard: Enhance info shown in Landing Page cards 'PGs per OSD' & 'Raw Capacity' added
