Feature #1637

closed

OSDs running full take down other OSDs

Added by pille palle over 12 years ago. Updated over 11 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%


Description

This issue is related to #1636.
In my test setup running v0.36, when one OSD runs full it gets taken down.
This starts a chain reaction: its data is replicated to other OSDs, which in turn run full, too.
Finally, the whole cluster is down.

My understanding is that this behavior is fine for data safety, but for high availability it's pretty bad.
One dying OSD should not have that impact on other OSDs or on the whole cluster.
If one OSD fails and there's not enough capacity to replicate all its data according to the crushmap, a warning that more capacity needs to be added would be fine. The cluster should stay available.
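
To make the chain reaction concrete, here is a toy simulation. It is only a sketch and not Ceph's actual recovery logic: the function `simulate`, its parameters, the even spread of data across surviving OSDs (no CRUSH rules, no replica counts), and the example numbers are all assumptions made for illustration. With `skip_full_targets=True` it models the behavior asked for above: recovery never pushes a target past the full limit, and whatever cannot be placed is reported instead of taking more OSDs down.

```python
def simulate(used, failed, full_limit, skip_full_targets=False):
    """Toy model of OSD failure and re-replication (not Ceph's actual logic).

    used: data stored on each OSD, in arbitrary units (index = OSD id).
    failed: id of the OSD that goes down first.
    full_limit: usage above which an OSD counts as full and is taken down.
    skip_full_targets: if True, recovery never fills a target past full_limit.

    Returns (set of down OSD ids, amount of data that could not be placed).
    """
    used = dict(enumerate(used))   # OSDs that are still up
    down = set()
    unplaced = 0.0
    pending = [failed]             # OSDs waiting to be taken down

    while pending:
        osd = pending.pop()
        if osd in down:
            continue
        down.add(osd)
        data = used.pop(osd)
        if not used:
            unplaced += data       # no OSD left to take the data
            continue
        share = data / len(used)   # assumption: even spread, no CRUSH rules
        for target in used:
            new_usage = used[target] + share
            if skip_full_targets:
                # Stop at the limit (never shrink a target already above it).
                new_usage = min(new_usage, max(full_limit, used[target]))
            unplaced += share - (new_usage - used[target])
            used[target] = new_usage
            if new_usage > full_limit:
                pending.append(target)   # target ran full, it goes down next
    return down, unplaced


# Four OSDs with capacity 100 and a full limit of 95; OSD 0 has run full.
print(simulate([96, 80, 80, 80], failed=0, full_limit=95))
print(simulate([96, 80, 80, 80], failed=0, full_limit=95, skip_full_targets=True))
```

In the first call every OSD ends up down, because each recovery round overfills the survivors. In the second call only OSD 0 is down and the unplaced amount is what a "please add capacity" warning would report.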


Related issues 2 (0 open, 2 closed)

Is duplicate of Ceph - Feature #2011: osd: do not backfill/recover to full osds (Resolved)
Has duplicate Ceph - Feature #2911: osd: Restrict recovery when the OSD full list is nonempty (Duplicate, 08/06/2012)

Actions #1

Updated by Sage Weil about 12 years ago

  • Category set to OSD
Actions #2

Updated by Sage Weil about 12 years ago

  • Priority changed from Normal to High
Actions #3

Updated by Sage Weil about 12 years ago

  • Tracker changed from Bug to Feature
Actions #4

Updated by Sage Weil over 11 years ago

  • Position set to 4
Actions #5

Updated by Sage Weil over 11 years ago

  • Position deleted (5)
  • Position set to 4
Actions #6

Updated by Sage Weil over 11 years ago

  • Story points set to 5
  • Position deleted (6)
  • Position set to 4
Actions #7

Updated by Sage Weil over 11 years ago

  • Position deleted (32)
  • Position set to 2
Actions #8

Updated by Sage Weil over 11 years ago

  • Status changed from New to Duplicate
  • Position deleted (3)
  • Position set to 3