Bug #58664

The per-pool 'STORED' metric of ceph df is wrong when some OSDs are down.

Added by Alexandre Skrzyniarz about 1 year ago. Updated about 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi.

I've seen this issue with Ceph 16.2.11, as shipped in the Debian repositories.

I built a small test cluster of 3 (virtual) machines with 2 OSDs each. I set up a CephFS and wrote 1 MiB of data to a replicated pool:

Nominal behaviour

ceph df output:

--- RAW STORAGE ---
CLASS    SIZE   AVAIL    USED  RAW USED  %RAW USED
hdd    36 GiB  36 GiB  61 MiB    61 MiB       0.17
TOTAL  36 GiB  36 GiB  61 MiB    61 MiB       0.17

--- POOLS ---
POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics   1    1      0 B        0      0 B      0     11 GiB
cephfs_meta            11   32   17 KiB       23  132 KiB      0     17 GiB
cephfs_rep_0           12   32    1 MiB        1    3 MiB      0     17 GiB

ceph osd tree output (for information):

ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         0.03534  root default                                    
-3         0.01178      host deb11-ceph-1                           
 0    hdd  0.00589          osd.0              up   1.00000  1.00000
 4    hdd  0.00589          osd.4              up   1.00000  1.00000
-7         0.01178      host deb11-ceph-2                           
 1    hdd  0.00589          osd.1              up   1.00000  1.00000
 3    hdd  0.00589          osd.3              up   1.00000  1.00000
-5         0.01178      host deb11-ceph-3                           
 2    hdd  0.00589          osd.2              up   1.00000  1.00000
 5    hdd  0.00589          osd.5              up   1.00000  1.00000

ceph -s output (for information):

  cluster:
    id:     5edbfb48-7070-4d0a-a240-e000364453c2
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum sto-ceph-1,sto-ceph-2,sto-ceph-3 (age 43s)
    mgr: deb11-ceph-3(active, since 64m), standbys: deb11-ceph-1, deb11-ceph-2
    mds: 1/1 daemons up, 2 standby
    osd: 6 osds: 6 up (since 40s), 6 in (since 64m)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 24 objects, 1.0 MiB
    usage:   62 MiB used, 36 GiB / 36 GiB avail
    pgs:     65 active+clean

Faulty behaviour

Then, if I shut down one server, the STORED value increases.

ceph df output:

--- RAW STORAGE ---
CLASS    SIZE   AVAIL    USED  RAW USED  %RAW USED
hdd    36 GiB  36 GiB  61 MiB    61 MiB       0.17
TOTAL  36 GiB  36 GiB  61 MiB    61 MiB       0.17

--- POOLS ---
POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics   1    1      0 B        0      0 B      0     11 GiB
cephfs_meta            11   32   17 KiB       23  132 KiB      0     17 GiB
cephfs_rep_0           12   32  1.5 MiB        1    3 MiB      0     17 GiB

The STORED value for pool cephfs_rep_0 is wrong here: it reports 1.5 MiB even though only 1 MiB was ever written.

ceph osd tree output (for information):

ID  CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
-1         0.03534  root default                                    
-3         0.01178      host deb11-ceph-1                           
 0    hdd  0.00589          osd.0              up   1.00000  1.00000
 4    hdd  0.00589          osd.4              up   1.00000  1.00000
-7         0.01178      host deb11-ceph-2                           
 1    hdd  0.00589          osd.1            down   1.00000  1.00000
 3    hdd  0.00589          osd.3            down   1.00000  1.00000
-5         0.01178      host deb11-ceph-3                           
 2    hdd  0.00589          osd.2              up   1.00000  1.00000
 5    hdd  0.00589          osd.5              up   1.00000  1.00000

ceph -s output (for information):

  cluster:
    id:     5edbfb48-7070-4d0a-a240-e000364453c2
    health: HEALTH_WARN
            1/3 mons down, quorum sto-ceph-1,sto-ceph-3
            2 osds down
            1 host (2 osds) down
            Degraded data redundancy: 24/72 objects degraded (33.333%), 15 pgs degraded, 65 pgs undersized

  services:
    mon: 3 daemons, quorum sto-ceph-1,sto-ceph-3 (age 2m), out of quorum: sto-ceph-2
    mgr: deb11-ceph-3(active, since 68m), standbys: deb11-ceph-1
    mds: 1/1 daemons up, 1 standby
    osd: 6 osds: 4 up (since 2m), 6 in (since 68m)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 24 objects, 1.0 MiB
    usage:   63 MiB used, 36 GiB / 36 GiB avail
    pgs:     24/72 objects degraded (33.333%)
             50 active+undersized

Analysis

As I understand it, the STORED value is the actual amount of data stored by the user in the pool, regardless of replication, erasure coding, or OSD failures: with 1 MiB written to a size=3 pool, STORED should read 1 MiB (and USED about 3 MiB) whether or not any OSD is down. It should not be adjusted when an OSD fails.

I guess the issue stems from src/mon/PGMap.cc, in the method PGMapDigest::dump_object_stat_sum.

Basically, raw_used_rate (which feeds the STORED calculation) is adjusted by this code fragment:

// the replication factor is scaled down by the fraction of object
// copies that are still healthy
if (sum.num_object_copies > 0) {
  raw_used_rate *= (float)(sum.num_object_copies - sum.num_objects_degraded) /
                   sum.num_object_copies;
}

Maybe it shouldn't?
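
To check that this fragment alone accounts for the observed value, here is a minimal standalone sketch. It is not Ceph code: it just replays the arithmetic with the numbers from this report, assuming (as the outputs above suggest) that STORED is obtained by dividing the pool's raw usage by raw_used_rate:

#include <cstdint>
#include <cstdio>

int main() {
  // Values for pool cephfs_rep_0 after one of the three hosts went down,
  // taken from the outputs above: 1 object, size=3, one replica degraded.
  int64_t num_object_copies    = 3;
  int64_t num_objects_degraded = 1;
  double  raw_used_rate        = 3.0;  // replicated pool, size=3
  double  raw_used_mib         = 3.0;  // USED column: 3 MiB

  // The adjustment quoted above scales the rate by the healthy fraction:
  // 3 * (3 - 1) / 3 = 2.
  if (num_object_copies > 0) {
    raw_used_rate *= (double)(num_object_copies - num_objects_degraded) /
                     num_object_copies;
  }

  // Dividing raw usage by the adjusted rate then inflates the result:
  // 3 MiB / 2 = 1.5 MiB instead of the 1 MiB actually written.
  printf("STORED = %.1f MiB\n", raw_used_mib / raw_used_rate);
  return 0;
}

This prints "STORED = 1.5 MiB", matching the faulty ceph df output above, which suggests the degraded-copies adjustment is exactly what distorts the metric.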

HOW TO REPRODUCE:

- install a 3-node cluster with two OSDs per node.
- create a simple CephFS (pools default to size=3)

ceph osd pool create cephfs_meta 32 replicated
ceph osd pool create cephfs_rep_0 32 replicated
ceph fs new cephfs cephfs_meta cephfs_rep_0
mount -t ceph :/ /mnt/ceph -o name=admin
cd /mnt/ceph && dd if=/dev/zero of=file-1m count=1024 bs=1024

Then:

- look at ceph df output
- shut down a server
- wait for df stats to be updated (may take a few minutes)
- look at ceph df output again

History

#1 Updated by Ilya Dryomov about 1 year ago

  • Target version deleted (v16.2.11)
