Feature #2394

Provide tool to answer: "when is it safe to kill this osd"

Added by Anonymous almost 12 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Low
Assignee: -
Category: -
Target version: -
% Done: 0%

Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

After "ceph osd out 123", when is it safe to kill the ceph-osd daemon?

Assume a busy cluster where other failures are happening all the time, so "100% active+clean" PGs will never be reached. This question needs to be answered in terms of osd.123, not in terms of global cluster health.

"ceph pg dump" probably contains this information, in which case it just needs to be made more accessible; documenting what the output means would be useful.

Background: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6217

Existing but unhelpful docs:

http://ceph.com/docs/master/control/#pg-subsystem
http://ceph.com/docs/master/man/8/ceph/#examples

#1

Updated by Anonymous almost 12 years ago

  • Description updated (diff)
#2

Updated by Anonymous almost 12 years ago

  • Tracker changed from Tasks to Feature
#3

Updated by Sage Weil about 5 years ago

  • Status changed from New to Resolved

This is now handled by the 'ceph osd ok-to-stop' command, introduced in Luminous.
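
For reference, a minimal usage sketch (the systemd unit name is an assumption about the deployment; the command's output text varies between releases, but it exits non-zero when stopping the OSD would leave some PG without enough replicas):

    # Ask the monitors whether osd.123 can be stopped right now without
    # making any PG unavailable; only stop the daemon if the check passes.
    ceph osd ok-to-stop osd.123 && sudo systemctl stop ceph-osd@123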
