Feature #1637

closed

OSDs running full take down other OSDs

Added by pille palle over 12 years ago. Updated over 11 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

This issue is related to #1636.
In my test setup on v0.36, when one OSD runs full it gets taken down.
This starts a chain reaction: its data is re-replicated to other OSDs, which in turn run full as well.
Eventually the whole cluster is down.

My understanding is that this behavior is fine for data safety, but for high availability it's pretty bad.
One dying OSD should not have that kind of impact on other OSDs or on the whole cluster.
If an OSD fails and there is not enough capacity to re-replicate all of its data according to the crushmap, a warning that more capacity needs to be added would be sufficient; the cluster should stay available.
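
To make the chain reaction concrete, here is a minimal, hypothetical Python sketch of the failure mode described above. It is not Ceph code; the even redistribution of data and the 0.95 full ratio are simplifying assumptions, and OSD names like "osd.0" are illustrative only.

```python
# Hypothetical illustration of the cascade: a full OSD is taken down/out,
# its data is re-replicated to the surviving OSDs, and those go over the
# full ratio in turn. A simplified model, not actual Ceph behavior.

def simulate_cascade(used_gb, capacity_gb, full_ratio=0.95):
    """Repeatedly take the fullest over-ratio OSD down/out and spread its
    data evenly over the survivors, until no OSD is over the ratio or no
    survivors are left to receive the data."""
    up = dict(used_gb)          # OSD name -> used GB
    down = []                   # OSDs taken down/out, in order
    while True:
        full = [o for o, u in up.items() if u >= full_ratio * capacity_gb]
        if not full:
            return up, down     # cluster stabilized
        victim = max(full, key=up.get)
        displaced = up.pop(victim)
        down.append(victim)
        if not up:
            return up, down     # every OSD is gone
        share = displaced / len(up)     # naive even re-replication
        for o in up:
            up[o] += share

# Example: four 100 GB OSDs, one of them already at the full ratio.
up, down = simulate_cascade(
    {"osd.0": 95.0, "osd.1": 80.0, "osd.2": 80.0, "osd.3": 80.0},
    capacity_gb=100.0)
print("down/out:", down)   # all four OSDs: one full OSD took out the cluster
print("still up:", up)     # {}
```

Under this model, the behavior requested here (and later tracked in #2011) amounts to refusing to re-replicate onto OSDs that are already at or over the full ratio: the cascade stops after the first step and the cluster stays degraded but available.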


Related issues 2 (0 open, 2 closed)

Is duplicate of Ceph - Feature #2011: osd: do not backfill/recover to full osds (Resolved)

Has duplicate Ceph - Feature #2911: osd: Restrict recovery when the OSD full list is nonempty (Duplicate, 08/06/2012)
