Bug #65591
Pool MAX_AVAIL goes UP when an OSD is marked down+in
Status:
New
Priority:
Normal
Assignee:
Category:
Administration/Usability
Target version:
% Done:
0%
Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
pgmap
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Example:
- Cluster with 4 OSD nodes, 10 OSDs each
- 3x replicated pool
- `max_avail` from `ceph df detail --format=json` output with all OSDs `up+in`: 72158076207104
- `max_avail` with 1 OSD node (10 OSDs) `down+in`: 96042674552832
1. The `raw_used_rate` passed into the function `PGMapDigest::dump_object_stat_sum` should be equal to:
- the number of copies for replicated pools, or
- ( K + M ) / K for EC pools.
2. At: https://github.com/ceph/ceph/blob/main/src/mon/PGMap.cc#L886
raw_used_rate *= (float)(sum.num_object_copies - sum.num_objects_degraded) / sum.num_object_copies;
- This applies a scaling factor equal to the fraction of object copies that are not degraded (relative to the total object copy count).
- Using the 'all_down' pgdump for libvirt-pool:
num_object_copies: 7287540
num_objects_degraded: 1812426
Scaling factor applied: (7287540 - 1812426) / 7287540 = 0.7512979688619205
3. The 'MAX_AVAIL' value is calculated at: https://github.com/ceph/ceph/blob/main/src/mon/PGMap.cc#L901
auto avail_res = raw_used_rate ? avail / raw_used_rate : 0;
- 'avail' is the raw available bytes
- 'raw_used_rate' is now ~75% of what it was, so the computed 'MAX_AVAIL' increases accordingly
avail: ( min(osd_avail_kbytes) * num_osds ) - ( sum(osd_max_kbytes) * ( 1 - mon_osd_full_ratio ))
  = ( 5597462260 * 40 ) - ( 250048839680 * ( 1 - 0.95 ) ) = 211396048416
raw_used_rate: 3 * 0.7512979688619205 = 2.253893907
max_avail: 211396048416 / 2.253893907 * 1024 = 96042476953095
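For reference, here is a minimal standalone sketch that reproduces the arithmetic of the walkthrough above. It is not the actual PGMap.cc code: the helper names `estimate_avail_kb` and `max_avail_bytes` are invented for illustration, and the input numbers are taken from the example in this report.

```cpp
// Standalone sketch reproducing the MAX_AVAIL arithmetic described above.
// Not the real PGMap.cc code; helper names are invented for illustration.
#include <cstdint>
#include <cstdio>

// avail estimate used in the walkthrough:
// ( min(osd_avail_kbytes) * num_osds ) - ( sum(osd_max_kbytes) * ( 1 - mon_osd_full_ratio ))
static double estimate_avail_kb(double min_osd_avail_kb, int num_osds,
                                double sum_osd_max_kb, double full_ratio) {
  return min_osd_avail_kb * num_osds - sum_osd_max_kb * (1.0 - full_ratio);
}

// Mirrors the two PGMap.cc lines quoted above: raw_used_rate is scaled by the
// non-degraded fraction, then avail is divided by the scaled rate.
static double max_avail_bytes(double avail_kb, double raw_used_rate,
                              int64_t num_object_copies,
                              int64_t num_objects_degraded) {
  if (num_object_copies > 0) {
    raw_used_rate *= double(num_object_copies - num_objects_degraded) /
                     double(num_object_copies);
  }
  return raw_used_rate ? avail_kb / raw_used_rate * 1024.0 : 0.0;
}

int main() {
  // 211396048416 KiB, from the avail calculation above
  const double avail = estimate_avail_kb(5597462260.0, 40, 250048839680.0, 0.95);

  // All OSDs up+in: nothing degraded, raw_used_rate stays at 3 (replica count)
  const double up   = max_avail_bytes(avail, 3.0, 7287540, 0);
  // 1 OSD node (10 OSDs) down+in: ~25% of copies degraded, raw_used_rate shrinks
  const double down = max_avail_bytes(avail, 3.0, 7287540, 1812426);

  std::printf("MAX_AVAIL all up+in  : %.0f bytes (~72.2 TB)\n", up);
  std::printf("MAX_AVAIL 1 node down: %.0f bytes (~96.0 TB)\n", down);
  return 0;
}
```

The first figure comes out close to the reported all-up `max_avail` (72158076207104) and the second close to the reported down+in value (96042674552832), which suggests the degraded-copies scaling at PGMap.cc#L886 alone accounts for the increase.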