Bug #9998

Replaced OSD weight below 0

Added by Pawel Sadowski over 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
firefly,giant
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I've hit a bug when replacing OSDs. Under specific conditions, a replaced OSD gets a weight of -3.052e-05.

How to reproduce

  • build a cluster with a few nodes
  • remove an OSD whose id is lower than the maximum (i.e. not the last one)
  • change the host weight without changing the OSD weights (i.e. sum(weight(osd)) != weight(host))
  • add a new OSD (it should receive the missing id, and the host weight will be updated)

The following part of the code appears to be responsible for this bug (src/crush/CrushWrapper.cc):

    if (item_exists(item)) {
      weight = get_item_weightf(item);

The value -3.052e-05 is returned by get_item_weightf when get_item_weight returns -ENOENT (-2):

  float get_item_weightf(int id) {
    return (float)get_item_weight(id) / (float)0x10000;
  }
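
For reference, the arithmetic behind that number (a standalone sketch, not Ceph code): CRUSH stores weights as 16.16 fixed point, so -ENOENT divided by 0x10000 gives exactly the value seen above.

  // Standalone illustration: -ENOENT (-2) treated as a 16.16 fixed-point
  // CRUSH weight yields the -3.052e-05 reported here.
  #include <cstdio>

  int main() {
    int raw = -2;                                  // what get_item_weight() returns on lookup failure (-ENOENT)
    float weightf = (float)raw / (float)0x10000;   // same conversion as get_item_weightf()
    printf("%g\n", weightf);                       // prints -3.05176e-05
    return 0;
  }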

The worst thing is that such an OSD receives most (almost all) of the data!

It looks like somewhere in the code an int is cast to an unsigned int, so a value just below zero becomes a huge weight in the crushmap.
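
A minimal sketch of that mechanism (illustration only; it does not point at the actual offending line): since CRUSH bucket weights are 16.16 fixed point, a raw value of -2 reinterpreted as an unsigned 32-bit integer comes out as roughly 65536 when converted to a weight, far larger than any real OSD.

  // Illustration of the suspected sign issue: a raw weight of -2 (16.16
  // fixed point) reinterpreted as unsigned becomes ~65536 as a weight,
  // which would attract almost all of the data.
  #include <cstdint>
  #include <cstdio>

  int main() {
    int32_t raw = -2;                            // ~ -3.05e-05 as a CRUSH weight
    uint32_t reinterpreted = (uint32_t)raw;      // 4294967294
    printf("signed:   %g\n", (double)raw / 0x10000);
    printf("unsigned: %u -> %g\n", reinterpreted, (double)reinterpreted / 0x10000);
    return 0;
  }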

OSD tree:

host406249# ceph osd tree
-2    4.4            host host406249
3    1.29                osd.3    up    1    
4    1.29                osd.4    up    1    
6    1.82                osd.6    up    1    
1    -3.052e-05                osd.1    up    1    
-5    6.22        rack C2
-4    6.22            host host406251
0    1.82                osd.0    up    1    
2    1.29                osd.2    up    1    
5    1.29                osd.5    up    1    
7    1.82                osd.7    up    1    
-7    6.22        rack C3
-6    6.22            host host406254
9    1.82                osd.9    up    1    
10    1.29                osd.10    up    1    
11    1.29                osd.11    up    1    
8    1.82                osd.8    up    1    

Host with "broken" OSD:

host406249# df -h /osd-sd{a..d}
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/osd-sda  1.3T   88M  1.3T   1% /osd-sda
/dev/mapper/osd-sdb  1.3T   55M  1.3T   1% /osd-sdb
/dev/mapper/osd-sdc  1.9T  100M  1.9T   1% /osd-sdc
/dev/mapper/osd-sdd  1.9T   10G  1.9T   1% /osd-sdd             <------ this is the osd.1 with negative weight

Normal host:

host406254# df -h /osd-sd{a..d}
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/osd-sda  1.3T  2.4G  1.3T   1% /osd-sda
/dev/mapper/osd-sdb  1.3T  2.0G  1.3T   1% /osd-sdb
/dev/mapper/osd-sdc  1.9T  2.9G  1.9T   1% /osd-sdc
/dev/mapper/osd-sdd  1.9T  3.1G  1.9T   1% /osd-sdd


Files

crushmap.e6984.txt (2.82 KB) crushmap.e6984.txt Before removing OSD.1 Pawel Sadowski, 11/22/2014 10:13 AM
crushmap.e7120.txt (2.79 KB) crushmap.e7120.txt OSD.1 removed Pawel Sadowski, 11/22/2014 10:13 AM
crushmap.e7121.txt (2.79 KB) crushmap.e7121.txt host406247 weight changed via getcrushmap/setcrushmap - this is the osdmap just prior to re-adding the OSD Pawel Sadowski, 11/22/2014 10:13 AM
crushmap.e7154.txt (2.82 KB) crushmap.e7154.txt After adding OSD.1 again (new OSD with same id) Pawel Sadowski, 11/22/2014 10:13 AM
Actions #1

Updated by Pawel Sadowski over 9 years ago

This bug might be related to this part of the code (the same loop variable is reused in two nested loops):

bool CrushWrapper::_search_item_exists(int item) const
{
  for (int i = 0; i < crush->max_buckets; i++) {
    if (!crush->buckets[i])
      continue;
    crush_bucket *b = crush->buckets[i];
    for (unsigned i=0; i<b->size; ++i) {
      if (b->items[i] == item)
        return true;
    }
  }
  return false;
}
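
For reference, a standalone sketch of that shadowing pattern (not Ceph code): the inner unsigned i hides the outer int i only inside the inner loop's body, so the outer counter still advances normally; whether this shadowing is actually connected to the weight bug is not established in this thread.

  // Standalone sketch of the shadowing pattern above (not Ceph code):
  // the inner `i` hides the outer `i` only within the inner loop body,
  // and the outer counter is unaffected once the inner loop ends.
  #include <cstdio>

  int main() {
    for (int i = 0; i < 2; i++) {
      for (unsigned i = 0; i < 3; ++i)   // shadows the outer i here only
        printf("inner i = %u\n", i);
      printf("outer i = %d\n", i);       // prints 0, then 1, as expected
    }
    return 0;
  }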
Actions #2

Updated by Dan van der Ster over 9 years ago

We see this often in our dumpling cluster. Kinda annoying.

Actions #3

Updated by Samuel Just over 9 years ago

  • Priority changed from Normal to Urgent
Actions #4

Updated by Sage Weil over 9 years ago

  • Status changed from New to Need More Info
  • Assignee set to Sage Weil
  • Source changed from other to Community (user)

Can you clarify how you are doing this?

"change host weight while not changing OSD weights (i.e. sum(weight(osd)) != weight(host))"

Are you manually modifying the CRUSH map or using the CLI?

Actions #5

Updated by Sage Weil over 9 years ago

wip-9998 has 2 fixes, but i'm convinced they are the same bug...

Actions #6

Updated by Pawel Sadowski over 9 years ago

I'm changing weights manually by editing the CRUSH map.

Actions #7

Updated by Dan van der Ster over 9 years ago

In our case we sometimes get -3.052e-05 as the first weight of a new osd that has been added to the crush map by the init script (using the update crush on start option).

Actions #8

Updated by Pawel Sadowski over 9 years ago

Dan van der Ster wrote:

In our case we sometimes get -3.052e-05 as the first weight of a new osd that has been added to the crush map by the init script (using the update crush on start option).

That's my case too, but in order to trigger this we first have to manually change the weight of the host. To avoid this bug, check that the OSD weights on each host add up to the host weight, that the host weights add up to the rack weight, and so on up to the root of the CRUSH map. The easiest fix/workaround is to use 'ceph osd crush reweight <some_weight>' - this will update the host weight to the sum of all OSDs on it. But you should verify the weights on each node.
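
A toy sketch of that consistency check (illustrative only, using a hypothetical in-memory tree rather than a real crushmap; in practice you would compare the decompiled crushmap or the ceph osd tree output): walk the hierarchy and confirm each bucket's weight equals the sum of its children's weights.

  // Toy illustration of the consistency check described above (not Ceph
  // code): every bucket's weight should equal the sum of its children's
  // weights, from the hosts up to the root of the CRUSH hierarchy.
  #include <cmath>
  #include <cstdio>
  #include <map>
  #include <string>
  #include <vector>

  struct Bucket {
    double weight;                       // weight recorded on the bucket itself
    std::vector<std::string> children;   // names of child buckets/OSDs
  };

  int main() {
    // Hypothetical miniature hierarchy; the host weight was edited by hand
    // and no longer matches the sum of its OSDs (the precondition for the bug).
    std::map<std::string, Bucket> tree = {
      {"osd.3", {1.29, {}}}, {"osd.4", {1.29, {}}}, {"osd.6", {1.82, {}}},
      {"host406249", {4.8, {"osd.3", "osd.4", "osd.6"}}},
    };

    for (const auto& entry : tree) {
      const Bucket& b = entry.second;
      if (b.children.empty())
        continue;
      double sum = 0;
      for (const auto& child : b.children)
        sum += tree.at(child).weight;
      if (std::fabs(sum - b.weight) > 1e-4)
        printf("MISMATCH %s: bucket weight %.2f != sum of children %.2f\n",
               entry.first.c_str(), b.weight, sum);
    }
    return 0;
  }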

Actions #9

Updated by Sage Weil over 9 years ago

I'm still having trouble reproducing this. :( Maybe you can attach a copy of an osdmap just prior to adding the osd? Thanks!

Actions #10

Updated by Pawel Sadowski over 9 years ago

I've just reproduced this on my test cluster. I'm using Ceph v0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) with the following patch (used for debugging this issue, which is why the weight is set to 0.123 instead of -3.052e-05).

diff --git a/src/crush/CrushWrapper.cc b/src/crush/CrushWrapper.cc
index 9618d98..65e08f1 100644
--- a/src/crush/CrushWrapper.cc
+++ b/src/crush/CrushWrapper.cc
@@ -587,6 +587,9 @@ int CrushWrapper::create_or_move_item(CephContext *cct, int item, float weight,
   } else {
     if (item_exists(item)) {
       weight = get_item_weightf(item);
+      if (weight < 0) {
+        weight = 0.123;
+      }
       ldout(cct, 10) << "create_or_move_item " << item << " exists with weight " << weight << dendl;
       remove_item(cct, item, true);
     }
Actions #11

Updated by Sage Weil over 9 years ago

  • Status changed from Need More Info to Fix Under Review
Actions #12

Updated by Loïc Dachary over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #13

Updated by Loïc Dachary over 9 years ago

  • Status changed from Pending Backport to Fix Under Review
Actions #14

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #15

Updated by Loïc Dachary over 9 years ago

  • Backport set to firefly,giant
Actions #16

Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved