Bug #54347
ceph df stats break when there is an OSD with CRUSH weight == 0
Status: Closed
Description
OSD is out:
{
  "osd": 7,
  "uuid": "eea11d98-f027-4b81-8894-32adea02fee0",
  "up": 1,
  "in": 0,
  "weight": 0,
  "primary_affinity": 1,
  "last_clean_begin": 140035,
  "last_clean_end": 215828,
  "up_from": 215829,
  "up_thru": 215824,
  "down_at": 215827,
  "lost_at": 0,
  "public_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6827",
        "nonce": 1926154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6832",
        "nonce": 1926154309
      }
    ]
  },
  "cluster_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6960",
        "nonce": 1927154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6961",
        "nonce": 1927154309
      }
    ]
  },
  "heartbeat_back_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6879",
        "nonce": 1926154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6883",
        "nonce": 1926154309
      }
    ]
  },
  "heartbeat_front_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6867",
        "nonce": 1926154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6873",
        "nonce": 1926154309
      }
    ]
  },
  "public_addr": "172.19.208.1:6832/1926154309",
  "cluster_addr": "172.19.208.1:6961/1927154309",
  "heartbeat_back_addr": "172.19.208.1:6883/1926154309",
  "heartbeat_front_addr": "172.19.208.1:6873/1926154309",
  "state": [
    "exists",
    "up"
  ]
}
OSD has CRUSH weight set to 0:
{
  "id": 7,
  "device_class": "hdd",
  "name": "osd.7",
  "type": "osd",
  "type_id": 0,
  "crush_weight": 0,
  "depth": 5,
  "pool_weights": {},
  "reweight": 0,
  "kb": 0,
  "kb_used": 0,
  "kb_used_data": 0,
  "kb_used_omap": 0,
  "kb_used_meta": 0,
  "kb_avail": 0,
  "utilization": 0,
  "var": 0,
  "pgs": 0,
  "status": "up"
}
`ceph df` looks like:
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    2.7 PiB  925 TiB  1.8 PiB   1.8 PiB      66.21
TOTAL  2.7 PiB  925 TiB  1.8 PiB   1.8 PiB      66.21

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1     1  2.7 GiB      864  8.2 GiB      0    233 TiB
qxdata                  2  8192  1.2 PiB    1.12G  1.6 PiB  70.66    524 TiB
test                    3   512  2.6 TiB   19.40M  3.4 TiB   0.49    524 TiB
Note that STORED != USED here, as expected.
I set the OSD "in":
$ ceph osd in 7
And now `ceph df` looks like:
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    2.7 PiB  929 TiB  1.8 PiB   1.8 PiB      66.12
TOTAL  2.7 PiB  929 TiB  1.8 PiB   1.8 PiB      66.12

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1     1  2.2 GiB      864  2.2 GiB      0    233 TiB
qxdata                  2  8192  1.2 PiB    1.12G  1.2 PiB  64.13    524 TiB
test                    3   512  2.3 TiB   19.40M  2.3 TiB   0.33    524 TiB
Now, STORED == USED.
As a result, %USED drops. This is quite misleading: it suggests the pool has more free space than it actually has.
If I mark the OSD "out" again, the output returns to normal.
I have seen this same behaviour in two different Ceph clusters, one running Nautilus and one running Octopus. The cluster in this example is running Octopus 15.2.11.
Updated by Renaud Miel about 2 years ago
Same issue observed with:
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)"
but for an unknown and apparently different reason: all OSDs have crush_weight > 0.
In `ceph df` output:
--- RAW STORAGE --- section: USED == RAW USED
--- POOLS --- section: STORED == USED
Updated by Renaud Miel about 2 years ago
- As usual, the problem occurs only in our production environment: 5 osd servers, 2 mds servers, Ubuntu 20.04.4 LTS, ceph 16.2.7, 1 cephfs, 819 TiB hdd (cephfs data pool), 29 TiB ssd (cephfs metadata pool), 183 osds, all bare-metal servers.
- The problem does NOT occur in our test environment: 5 osd servers, 2 mds servers, Ubuntu 20.04.4 LTS, ceph 16.2.7, 1 cephfs, 50 GiB hdd, 5 osds, all VM servers
Updated by Snow Si about 2 years ago
As "https://tracker.ceph.com/issues/48385" says, "STORED == USED" appears because "osd_sum.num_osds != osd_sum.num_per_pool_osds":
My best guess is it's related to this in PGMap.h:
bool use_per_pool_stats() const {
  // true only when every OSD reports per-pool stats
  return osd_sum.num_osds == osd_sum.num_per_pool_osds;
}
You can use "ceph df -f json" to get "osd_sum.num_osds" and "osd_sum.num_per_pool_osds"
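For a quick check, the two counters can be pulled out of the JSON dump without knowing their exact nesting. Below is a minimal sketch (a hypothetical helper, not part of Ceph) that reads JSON on stdin, searches it recursively for the two keys, and reports whether they match; feed it the output of the command above, e.g. `ceph df -f json | python3 check_per_pool_stats.py`:

#!/usr/bin/env python3
# Hypothetical helper: report whether num_osds matches num_per_pool_osds
# anywhere in a Ceph JSON dump piped to stdin.
import json
import sys

def find_key(obj, key):
    """Recursively yield every value stored under `key` in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)

data = json.load(sys.stdin)
num_osds = next(find_key(data, "num_osds"), None)
num_per_pool = next(find_key(data, "num_per_pool_osds"), None)
print(f"num_osds={num_osds} num_per_pool_osds={num_per_pool}")
if num_osds is not None and num_osds != num_per_pool:
    print("mismatch: per-pool stats are not used, so ceph df shows STORED == USED")

If the two numbers differ, you are in the situation described in this ticket.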
Updated by Renaud Miel almost 2 years ago
Thank you for your feedback Snow Si: this was helpful for working around the "STORED == USED" issue in the `ceph df` output.
You were right: we had osd_sum.num_osds != osd_sum.num_per_pool_osds.
But this is not something we caused: Ceph itself computed that osd_sum.num_osds != osd_sum.num_per_pool_osds.
It looks like this was because one OSD of the CephFS metadata pool (replicated x3, 2.8 GiB stored, 880k objects, 32 PGs backed by 33 SSDs) was not storing any PG despite being in and up.
Stopping this OSD allowed us to work around the issue: we now see the expected "STORED != USED" in the `ceph df` output.
This raises a new question: do you have any idea why Ceph failed to place data on this specific OSD?
Note: we have 5 Ceph servers; 3 of them each have 11 SSDs and 30 HDDs, and 2 of them have no SSDs, only HDDs.
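For reference, one way to spot this kind of OSD (up and in, but holding no PGs) is to scan the per-OSD entries shown earlier in this ticket. The following is a minimal sketch, assuming `ceph osd df -f json` returns a `nodes` array with the same fields as the dump above; the script name is made up, and it is run as `ceph osd df -f json | python3 find_empty_osds.py`:

#!/usr/bin/env python3
# Hypothetical helper: list OSDs that report status "up" but hold zero PGs,
# based on the per-OSD fields shown in this ticket (pgs, status, reweight,
# crush_weight). Reads `ceph osd df -f json` output from stdin.
import json
import sys

data = json.load(sys.stdin)
for node in data.get("nodes", []):
    # `ceph osd df tree` also lists hosts/racks, so keep only OSD entries.
    if node.get("type", "osd") != "osd":
        continue
    if node.get("status") == "up" and node.get("pgs", 0) == 0:
        print(f"{node.get('name')}: up, crush_weight={node.get('crush_weight')}, "
              f"reweight={node.get('reweight')}, pgs=0")

Any OSD it prints is a candidate for the stop/out workaround described above.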
Updated by Igor Fedotov over 1 year ago
- Related to Bug #57121: STORE==USED in ceph df added