Feature #1637: OSDs running full take down other OSDs - Ceph - Ceph

Feature #1637

closed

OSDs running full take down other OSDs

Added by pille palle over 12 years ago. Updated over 11 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%

Description

This issue is related to #1636.
In my test setup running v0.36, an OSD that runs full gets taken down.
This starts a chain reaction: its data is re-replicated to other OSDs, which in turn run full as well.
Eventually the whole cluster is down.

My understanding is that this behavior is acceptable for data safety, but it is pretty bad for high availability.
One dying OSD should not have that impact on other OSDs or on the whole cluster.
If one OSD is failing and there is not enough capacity to replicate all of its data according to the crushmap, a warning that more capacity needs to be added would be fine. The cluster should stay available.
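
The chain reaction described above can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not Ceph's actual placement or recovery logic: all OSDs have the same capacity, a failed OSD's data is spread evenly over the survivors (ignoring the crushmap), and `full_ratio`, `simulate_cascade`, and `can_absorb` are hypothetical names chosen for illustration.

```python
def simulate_cascade(used, capacity, full_ratio=0.95):
    """Toy model of the cascade: any OSD at or above full_ratio * capacity
    is taken down and its data is spread evenly over the surviving OSDs.

    used: dict mapping OSD id -> amount of data stored (same unit as capacity).
    Returns (failure order, usage of the surviving OSDs).
    Assumption: usage may exceed capacity in this model; a real OSD cannot
    store more than it has room for.
    """
    used = dict(used)  # work on a copy so the caller's dict is untouched
    failed = []
    while used:
        full = [osd for osd, u in used.items() if u >= full_ratio * capacity]
        if not full:
            break  # nobody is full: the cascade has stopped
        victim = full[0]
        data = used.pop(victim)
        failed.append(victim)
        if not used:
            break  # last OSD gone: the whole cluster is down
        share = data / len(used)
        for osd in used:
            used[osd] += share  # re-replication fills the survivors
    return failed, used


def can_absorb(used, capacity, victim, full_ratio=0.95):
    """Check whether the survivors could take over the victim's data without
    any of them reaching the full threshold (the case where a warning
    would be enough)."""
    survivors = {osd: u for osd, u in used.items() if osd != victim}
    if not survivors:
        return False
    share = used[victim] / len(survivors)
    return all(u + share < full_ratio * capacity for u in survivors.values())
```

With four OSDs of capacity 100 storing `{0: 96, 1: 80, 2: 80, 3: 80}`, the model takes down every OSD in turn. With `{0: 96, 1: 40, 2: 40, 3: 40}`, only OSD 0 fails and the rest settle at 72 each, so the outcome depends entirely on how much headroom the remaining OSDs have.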

Related issues 2 (0 open — 2 closed)

#1

Updated by Sage Weil about 12 years ago

Category set to OSD

#2

Updated by Sage Weil about 12 years ago

Priority changed from Normal to High

#3

Updated by Sage Weil about 12 years ago

Tracker changed from Bug to Feature

#4

Updated by Sage Weil over 11 years ago

Position set to 4

#5

Updated by Sage Weil over 11 years ago

Position deleted (5)
Position set to 4

#6

Updated by Sage Weil over 11 years ago

Story points set to 5
Position deleted (6)
Position set to 4

#7

Updated by Sage Weil over 11 years ago

Position deleted (32)
Position set to 2

#8

Updated by Sage Weil over 11 years ago

Status changed from New to Duplicate
Position deleted (3)
Position set to 3
