Bug #54347
ceph df stats break when there is an OSD with CRUSH weight == 0
Status: Closed
Description
OSD is out:
{
  "osd": 7,
  "uuid": "eea11d98-f027-4b81-8894-32adea02fee0",
  "up": 1,
  "in": 0,
  "weight": 0,
  "primary_affinity": 1,
  "last_clean_begin": 140035,
  "last_clean_end": 215828,
  "up_from": 215829,
  "up_thru": 215824,
  "down_at": 215827,
  "lost_at": 0,
  "public_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6827",
        "nonce": 1926154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6832",
        "nonce": 1926154309
      }
    ]
  },
  "cluster_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6960",
        "nonce": 1927154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6961",
        "nonce": 1927154309
      }
    ]
  },
  "heartbeat_back_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6879",
        "nonce": 1926154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6883",
        "nonce": 1926154309
      }
    ]
  },
  "heartbeat_front_addrs": {
    "addrvec": [
      {
        "type": "v2",
        "addr": "172.19.208.1:6867",
        "nonce": 1926154309
      },
      {
        "type": "v1",
        "addr": "172.19.208.1:6873",
        "nonce": 1926154309
      }
    ]
  },
  "public_addr": "172.19.208.1:6832/1926154309",
  "cluster_addr": "172.19.208.1:6961/1927154309",
  "heartbeat_back_addr": "172.19.208.1:6883/1926154309",
  "heartbeat_front_addr": "172.19.208.1:6873/1926154309",
  "state": [
    "exists",
    "up"
  ]
}
OSD has CRUSH weight set to 0:
{
  "id": 7,
  "device_class": "hdd",
  "name": "osd.7",
  "type": "osd",
  "type_id": 0,
  "crush_weight": 0,
  "depth": 5,
  "pool_weights": {},
  "reweight": 0,
  "kb": 0,
  "kb_used": 0,
  "kb_used_data": 0,
  "kb_used_omap": 0,
  "kb_used_meta": 0,
  "kb_avail": 0,
  "utilization": 0,
  "var": 0,
  "pgs": 0,
  "status": "up"
}
`ceph df` looks like:
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    2.7 PiB  925 TiB  1.8 PiB   1.8 PiB      66.21
TOTAL  2.7 PiB  925 TiB  1.8 PiB   1.8 PiB      66.21

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1     1  2.7 GiB      864  8.2 GiB      0    233 TiB
qxdata                  2  8192  1.2 PiB    1.12G  1.6 PiB  70.66    524 TiB
test                    3   512  2.6 TiB   19.40M  3.4 TiB   0.49    524 TiB
Note that STORED != USED here, as expected.
I set the OSD "in":
$ ceph osd in 7
And now `ceph df` looks like:
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    2.7 PiB  929 TiB  1.8 PiB   1.8 PiB      66.12
TOTAL  2.7 PiB  929 TiB  1.8 PiB   1.8 PiB      66.12

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1     1  2.2 GiB      864  2.2 GiB      0    233 TiB
qxdata                  2  8192  1.2 PiB    1.12G  1.2 PiB  64.13    524 TiB
test                    3   512  2.3 TiB   19.40M  2.3 TiB   0.33    524 TiB
Now, STORED == USED.
As a result, %USED drops. This is quite misleading: it suggests the pool has more free space than it actually has.
If I mark the OSD "out" again, the output returns to normal.
I have seen this same behaviour in two different Ceph clusters, one running Nautilus and one running Octopus. The cluster in this example is running Octopus 15.2.11.
Updated by Renaud Miel about 2 years ago
Same issue observed with:
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)"
but for an unknown and apparently different reason: all OSDs have crush_weight > 0.
In `ceph df` output:
--- RAW STORAGE --- section: USED == RAW USED
--- POOLS --- section: STORED == USED
Updated by Renaud Miel about 2 years ago
- As usual, the problem occurs only in our production environment: 5 osd servers, 2 mds servers, Ubuntu 20.04.4 LTS, ceph 16.2.7, 1 cephfs, 819 TiB hdd (cephfs data pool), 29 TiB ssd (cephfs metadata pool), 183 osds, all bare-metal servers.
- The problem does NOT occur in our test environment: 5 osd servers, 2 mds servers, Ubuntu 20.04.4 LTS, ceph 16.2.7, 1 cephfs, 50 GiB hdd, 5 osds, all VM servers
Updated by Snow Si about 2 years ago
As "https://tracker.ceph.com/issues/48385" says, "STORED == USED" appears because "osd_sum.num_osds != osd_sum.num_per_pool_osds":
My best guess is it's related to this in PGMap.h:
bool use_per_pool_stats() const {
  // true only when every OSD reports per-pool stats
  return osd_sum.num_osds == osd_sum.num_per_pool_osds;
}
You can use "ceph df -f json" to get "osd_sum.num_osds" and "osd_sum.num_per_pool_osds"
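For a quick check, the two counters can be pulled out of the JSON dump without knowing their exact nesting. Below is a minimal sketch (a hypothetical helper, not part of Ceph) that reads JSON on stdin, searches it recursively for the two keys, and reports whether they match; feed it the output of the command above, e.g. `ceph df -f json | python3 check_per_pool_stats.py`:

#!/usr/bin/env python3
# Hypothetical helper: report whether num_osds matches num_per_pool_osds
# anywhere in a Ceph JSON dump piped to stdin.
import json
import sys

def find_key(obj, key):
    """Recursively yield every value stored under `key` in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)

data = json.load(sys.stdin)
num_osds = next(find_key(data, "num_osds"), None)
num_per_pool = next(find_key(data, "num_per_pool_osds"), None)
print(f"num_osds={num_osds} num_per_pool_osds={num_per_pool}")
if num_osds is not None and num_osds != num_per_pool:
    print("mismatch: per-pool stats are not used, so ceph df shows STORED == USED")

If the two numbers differ, you are in the situation described in this ticket.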
Updated by Renaud Miel almost 2 years ago
Thank you for your feedback Snow Si: this was helpful for working around the "STORED == USED" issue in the `ceph df` output.
You were right: we had osd_sum.num_osds != osd_sum.num_per_pool_osds.
But this is not something we caused: Ceph itself computed that osd_sum.num_osds != osd_sum.num_per_pool_osds.
It looks like this was because one OSD of the CephFS metadata pool (replicated x3, 2.8 GiB stored, 880k objects, 32 PGs backed by 33 SSDs) was not storing any PG despite being in and up.
Stopping this OSD allowed us to work around the issue: we now see the expected "STORED != USED" in the `ceph df` output.
This raises a new question: do you have any idea why Ceph failed to place data on this specific OSD?
Note: we have 5 Ceph servers; 3 of them each have 11 SSDs and 30 HDDs, and 2 of them have no SSDs, only HDDs.
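For reference, one way to spot this kind of OSD (up and in, but holding no PGs) is to scan the per-OSD entries shown earlier in this ticket. The following is a minimal sketch, assuming `ceph osd df -f json` returns a `nodes` array with the same fields as the dump above; the script name is made up, and it is run as `ceph osd df -f json | python3 find_empty_osds.py`:

#!/usr/bin/env python3
# Hypothetical helper: list OSDs that report status "up" but hold zero PGs,
# based on the per-OSD fields shown in this ticket (pgs, status, reweight,
# crush_weight). Reads `ceph osd df -f json` output from stdin.
import json
import sys

data = json.load(sys.stdin)
for node in data.get("nodes", []):
    # `ceph osd df tree` also lists hosts/racks, so keep only OSD entries.
    if node.get("type", "osd") != "osd":
        continue
    if node.get("status") == "up" and node.get("pgs", 0) == 0:
        print(f"{node.get('name')}: up, crush_weight={node.get('crush_weight')}, "
              f"reweight={node.get('reweight')}, pgs=0")

Any OSD it prints is a candidate for the stop/out workaround described above.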
Updated by Igor Fedotov over 1 year ago
- Related to Bug #57121: STORE==USED in ceph df added