Feature #1637

closed

OSDs running full take down other OSDs

Added by pille palle over 12 years ago. Updated over 11 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

This issue is related to #1636.
In my test setup on v0.36, when one OSD runs full it gets taken down.
This starts a chain reaction: its data is re-replicated to other OSDs, which in turn run full as well.
Eventually the whole cluster is down.

My understanding is that this behavior is fine for data safety, but for high availability it's pretty bad.
One dying OSD should not have that kind of impact on other OSDs or on the whole cluster.
If an OSD fails and there is not enough capacity to re-replicate all of its data according to the crushmap, a warning that more capacity needs to be added would be sufficient; the cluster should stay available.
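
To make the chain reaction concrete, here is a minimal, hypothetical Python sketch of the failure mode described above. It is not Ceph code; the even redistribution of data and the 0.95 full ratio are simplifying assumptions, and OSD names like "osd.0" are illustrative only.

```python
# Hypothetical illustration of the cascade: a full OSD is taken down/out,
# its data is re-replicated to the surviving OSDs, and those go over the
# full ratio in turn. A simplified model, not actual Ceph behavior.

def simulate_cascade(used_gb, capacity_gb, full_ratio=0.95):
    """Repeatedly take the fullest over-ratio OSD down/out and spread its
    data evenly over the survivors, until no OSD is over the ratio or no
    survivors are left to receive the data."""
    up = dict(used_gb)          # OSD name -> used GB
    down = []                   # OSDs taken down/out, in order
    while True:
        full = [o for o, u in up.items() if u >= full_ratio * capacity_gb]
        if not full:
            return up, down     # cluster stabilized
        victim = max(full, key=up.get)
        displaced = up.pop(victim)
        down.append(victim)
        if not up:
            return up, down     # every OSD is gone
        share = displaced / len(up)     # naive even re-replication
        for o in up:
            up[o] += share

# Example: four 100 GB OSDs, one of them already at the full ratio.
up, down = simulate_cascade(
    {"osd.0": 95.0, "osd.1": 80.0, "osd.2": 80.0, "osd.3": 80.0},
    capacity_gb=100.0)
print("down/out:", down)   # all four OSDs: one full OSD took out the cluster
print("still up:", up)     # {}
```

Under this model, the behavior requested here (and later tracked in #2011) amounts to refusing to re-replicate onto OSDs that are already at or over the full ratio: the cascade stops after the first step and the cluster stays degraded but available.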


Related issues 2 (0 open, 2 closed)

Is duplicate of Ceph - Feature #2011: osd: do not backfill/recover to full osds (Resolved)

Has duplicate Ceph - Feature #2911: osd: Restrict recovery when the OSD full list is nonempty (Duplicate, 08/06/2012)
