Bug #9998: Replaced OSD weight below 0
Status: Closed
Description
I've hit a bug when replacing OSDs. Under specific conditions the replaced OSD gets a weight of -3.052e-05.
How to reproduce
- build a cluster with a few nodes
- remove an OSD node with an id lower than the maximum (i.e. not the last one)
- change the host weight while not changing the OSD weights (i.e. sum(weight(osd)) != weight(host))
- add a new OSD (it should receive the missing id, and the host weight will be updated)
The following part of the code looks to be responsible for this bug (src/crush/CrushWrapper.cc):

if (item_exists(item)) {
  weight = get_item_weightf(item);
The value -3.052e-05 is returned by get_item_weightf when get_item_weight returns -ENOENT (-2):

float get_item_weightf(int id) {
  return (float)get_item_weight(id) / (float)0x10000;
}
The worst thing is that such an OSD receives most (almost all) of the data!
This looks like somewhere in the code an int is cast to an unsigned int, so a value just below zero becomes a huge weight in the crushmap.
OSD tree:
host406249# ceph osd tree
-2  4.4         host host406249
 3  1.29            osd.3   up  1
 4  1.29            osd.4   up  1
 6  1.82            osd.6   up  1
 1  -3.052e-05      osd.1   up  1
-5  6.22        rack C2
-4  6.22        host host406251
 0  1.82            osd.0   up  1
 2  1.29            osd.2   up  1
 5  1.29            osd.5   up  1
 7  1.82            osd.7   up  1
-7  6.22        rack C3
-6  6.22        host host406254
 9  1.82            osd.9   up  1
10  1.29            osd.10  up  1
11  1.29            osd.11  up  1
 8  1.82            osd.8   up  1
Host with "broken" OSD:
host406249# df -h /osd-sd{a..d}
Filesystem           Size  Used  Avail  Use%  Mounted on
/dev/mapper/osd-sda  1.3T   88M   1.3T    1%  /osd-sda
/dev/mapper/osd-sdb  1.3T   55M   1.3T    1%  /osd-sdb
/dev/mapper/osd-sdc  1.9T  100M   1.9T    1%  /osd-sdc
/dev/mapper/osd-sdd  1.9T   10G   1.9T    1%  /osd-sdd   <------ this is osd.1 with the negative weight
Normal host:
host406254# df -h /osd-sd{a..d}
Filesystem           Size  Used  Avail  Use%  Mounted on
/dev/mapper/osd-sda  1.3T  2.4G   1.3T    1%  /osd-sda
/dev/mapper/osd-sdb  1.3T  2.0G   1.3T    1%  /osd-sdb
/dev/mapper/osd-sdc  1.9T  2.9G   1.9T    1%  /osd-sdc
/dev/mapper/osd-sdd  1.9T  3.1G   1.9T    1%  /osd-sdd
Updated by Pawel Sadowski over 9 years ago
This bug might be related to this part of code (use of one variable in two nested loops):
bool CrushWrapper::_search_item_exists(int item) const
{
  for (int i = 0; i < crush->max_buckets; i++) {
    if (!crush->buckets[i])
      continue;
    crush_bucket *b = crush->buckets[i];
    for (unsigned i = 0; i < b->size; ++i) {
      if (b->items[i] == item)
        return true;
    }
  }
  return false;
}
Updated by Dan van der Ster over 9 years ago
We see this often in our dumpling cluster. Kinda annoying.
Updated by Sage Weil over 9 years ago
- Status changed from New to Need More Info
- Assignee set to Sage Weil
- Source changed from other to Community (user)
Can you clarify how you are doing this?
"change host weight while not changing OSD weights (i.e. sum(weight(osd)) != weight(host))"
Are you manually modifying the CRUSH map or using the CLI?
Updated by Sage Weil over 9 years ago
wip-9998 has 2 fixes, but i'm convinced they are the same bug...
Updated by Pawel Sadowski over 9 years ago
I'm changing weights manually by editing CRUSH map.
Updated by Dan van der Ster over 9 years ago
In our case we sometimes get -3.052e-05 as the first weight of a new osd that has been added to the crush map by the init script (using the update crush on start option).
Updated by Pawel Sadowski over 9 years ago
Dan van der Ster wrote:
In our case we sometimes get -3.052e-05 as the first weight of a new osd that has been added to the crush map by the init script (using the update crush on start option).
That's my case too. But in order to trigger this we first have to manually change the weight of the host. To avoid this bug, check whether the OSD weights on each host sum to the host weight, the host weights sum to the rack weight, and so on up to the root of the CRUSH tree. The easiest fix/workaround is to use 'ceph osd crush reweight <name> <weight>' - this will update the host weight to the sum of all OSDs on it. But you should verify the weights on each node.
Updated by Sage Weil over 9 years ago
I'm still having trouble reproducing this. :( Maybe you can attach a copy of an osdmap just prior to adding the osd? Thanks!
Updated by Pawel Sadowski over 9 years ago
- File crushmap.e6984.txt crushmap.e6984.txt added
- File crushmap.e7120.txt crushmap.e7120.txt added
- File crushmap.e7121.txt crushmap.e7121.txt added
- File crushmap.e7154.txt crushmap.e7154.txt added
I've just reproduced this on my test cluster. I'm using Ceph v0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) with the following patch (used for debugging this issue, which is why the weight is set to 0.123 instead of -3.052e-05).
diff --git a/src/crush/CrushWrapper.cc b/src/crush/CrushWrapper.cc
index 9618d98..65e08f1 100644
--- a/src/crush/CrushWrapper.cc
+++ b/src/crush/CrushWrapper.cc
@@ -587,6 +587,9 @@ int CrushWrapper::create_or_move_item(CephContext *cct, int item, float weight,
   } else {
     if (item_exists(item)) {
       weight = get_item_weightf(item);
+      if (weight < 0) {
+        weight = 0.123;
+      }
       ldout(cct, 10) << "create_or_move_item " << item << " exists with weight " << weight << dendl;
       remove_item(cct, item, true);
     }
Updated by Sage Weil over 9 years ago
- Status changed from Need More Info to Fix Under Review
Updated by Loïc Dachary over 9 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Loïc Dachary over 9 years ago
- Status changed from Pending Backport to Fix Under Review
- giant backport https://github.com/ceph/ceph/pull/3178
- firefly backport https://github.com/ceph/ceph/pull/3179
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Loïc Dachary about 9 years ago
- Status changed from Pending Backport to Resolved