Bug #9998
Replaced OSD weight below 0
Status:
Closed
% Done:
0%
Source:
Community (user)
Tags:
Backport:
firefly,giant
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I've hit a bug when replacing OSDs. Under specific conditions, the new OSD that replaces a removed one gets a weight of -3.052e-05.
How to reproduce
- build a cluster with a few nodes
- remove an OSD whose ID is lower than the maximum (i.e. not the last one)
- change the host weight without changing the OSD weights (i.e. sum(weight(osd)) != weight(host))
- add a new OSD (it should receive the freed ID, and the host weight will be updated)
The following part of the code appears to be responsible for this bug (src/crush/CrushWrapper.cc):
```cpp
if (item_exists(item)) {
  weight = get_item_weightf(item);
  // ...
}
```
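The value -3.052e-05 is returned by get_item_weightf when get_item_weight returns -ENOENT (-2): the error code is divided by 0x10000 as if it were a 16.16 fixed-point weight, and -2 / 65536 = -3.052e-05.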
```cpp
float get_item_weightf(int id) {
  return (float)get_item_weight(id) / (float)0x10000;
}
```
The worst thing is that such an OSD receives most (almost all) of the data!
It looks as if somewhere in the code an int is cast to unsigned int, so a value just below zero becomes a huge weight in the CRUSH map.
OSD tree:
```
host406249# ceph osd tree
-2   4.4             host host406249
3    1.29                osd.3   up  1
4    1.29                osd.4   up  1
6    1.82                osd.6   up  1
1    -3.052e-05          osd.1   up  1
-5   6.22        rack C2
-4   6.22            host host406251
0    1.82                osd.0   up  1
2    1.29                osd.2   up  1
5    1.29                osd.5   up  1
7    1.82                osd.7   up  1
-7   6.22        rack C3
-6   6.22            host host406254
9    1.82                osd.9   up  1
10   1.29                osd.10  up  1
11   1.29                osd.11  up  1
8    1.82                osd.8   up  1
```
Host with "broken" OSD:
```
host406249# df -h /osd-sd{a..d}
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/osd-sda  1.3T   88M  1.3T   1% /osd-sda
/dev/mapper/osd-sdb  1.3T   55M  1.3T   1% /osd-sdb
/dev/mapper/osd-sdc  1.9T  100M  1.9T   1% /osd-sdc
/dev/mapper/osd-sdd  1.9T   10G  1.9T   1% /osd-sdd   <------ this is osd.1, the one with the negative weight
```
Normal host:
```
host406254# df -h /osd-sd{a..d}
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/osd-sda  1.3T  2.4G  1.3T   1% /osd-sda
/dev/mapper/osd-sdb  1.3T  2.0G  1.3T   1% /osd-sdb
/dev/mapper/osd-sdc  1.9T  2.9G  1.9T   1% /osd-sdc
/dev/mapper/osd-sdd  1.9T  3.1G  1.9T   1% /osd-sdd
```