Bug #4283 (Closed): ceph weight of host not recalculated after taking osd out

Added by Corin Langosch about 11 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Today I experienced an osd failure and marked that osd (osd.1) out. It was a big osd, so it had a weight of 2. Another, smaller disk on the same host (r16439) only has a weight of 0.08. After taking osd.1 out, the weight of the host stayed at 2.08. This quickly caused the second osd to grow from 40% usage to 80%, at which point I marked it out too.
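
Roughly, the commands involved were along these lines (a sketch; the exact invocations weren't recorded, only marking the osd out and dumping the tree):

    ceph osd out 1    # mark osd.1 out: its reweight drops to 0, but its crush weight (and the host's) is untouched
    ceph osd tree     # dump the crush tree; host r16439 still shows the full weight of 2.07999 (see below)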

dumped osdmap tree epoch 11753
# id  weight     type name           up/down  reweight
-1    10.5349    pool default
-3    10.5349        rack rack1
-2    1.39999            host r15714
0     0.699997               osd.0   up       1
4     0.699997               osd.4   up       1
-4    2.31999            host r15717
7     0.319992               osd.7   up       1
9     2                      osd.9   up       1
-5    1.39999            host r15791
2     0.699997               osd.2   up       1
5     0.699997               osd.5   up       1
-6    0.639999           host r15836
3     0.319992               osd.3   up       1
6     0.319992               osd.6   up       1
-7    2.07999            host r16439
8     0.0799866              osd.8   up       1
1     2                      osd.1   up       0
-8    2.215              host r16440
12    0.0749969              osd.12  up       1
13    0.139999               osd.13  up       1
10    2                      osd.10  up       1
-9    0.479965           host r16441
14    0.0799866              osd.14  up       1
15    0.319992               osd.15  up       1
16    0.0799866              osd.16  up       1

I'd really like to suggest recalculating the weight of the host whenever the weight of one of its osds changes. Otherwise osds on the same host can easily get overloaded, causing the whole cluster to hang.
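
Until that happens automatically, a possible workaround (a sketch, not verified here) is to explicitly zero the crush weight of the failed osd, which does propagate into the host bucket weight, unlike marking it out:

    ceph osd crush reweight osd.1 0    # set osd.1's crush weight to 0; host r16439's bucket weight drops to ~0.08
    ceph osd tree                      # verify the host now only carries osd.8's weight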
