Feature #8227 (closed)

RFE: introduce “back in a bit” osd state

Added by Alexandre Oliva about 10 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Sometimes I want to bring an OSD down for a bit: because it is slowing the cluster down, because I want to run commands on the disk that would often cause data loss if run while the OSD is active (I'm tracking what appears to be a btrfs bug along these lines), or simply because I want to reboot the server that holds it.

If ceph is configured to mark down OSDs out shortly after they fail, the PGs held by the down OSD will start being fully backfilled to other OSDs. This is most likely excessive: the OSD will be back in a bit, and making redundant copies just slows things down.

Conversely, if ceph is configured not to mark down OSDs out automatically, or to do so only after a long delay, newly-created or modified objects will have a lower replication count than the PG is configured to hold, and there will be a window of exposure for as long as it takes the temporarily-down OSD to come back up and fully recover (assuming, that is, that the temporarily-down OSD doesn't bring the PG below min_size).
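For reference, the trade-off above is currently governed by a single knob, `mon osd down out interval` (plus the cluster-wide `noout` flag settable with `ceph osd set noout`, which suppresses the out transition entirely during maintenance). A sketch of the relevant ceph.conf fragment; the value shown is the default, not a recommendation:

```
# ceph.conf (monitor section): how long an OSD may stay down before it
# is automatically marked out and full backfill begins.
# Raising it widens the reduced-redundancy window described above;
# lowering it triggers backfill sooner.
[mon]
mon osd down out interval = 600
```

Neither setting gives the middle ground requested here: the interval only delays the same all-or-nothing transition, and `noout` disables it cluster-wide.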

My suggestion is some middle ground: a state for the OSD that causes its PGs to be remapped so that alternate OSDs hold replicas of newly-created or modified objects only, without starting a full backfill. An OSD would enter the “back in a bit” state right after failing (or after some configurable delay), which would cause its PGs to remap new writes to other OSDs, and then (optionally) move to the state currently known as “out” after a longer period of time.
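The proposed lifecycle can be sketched as a small state machine. This is a toy model only; the state name, the two timeout knobs (`remap_after`, `out_after`), and the class are all hypothetical illustrations of the request, not Ceph code:

```python
import enum


class OsdState(enum.Enum):
    UP = "up"
    BACK_IN_A_BIT = "back in a bit"  # hypothetical new state
    OUT = "out"


class OsdLifecycle:
    """Toy model of the proposal: a failed OSD first enters a state
    where only new writes are remapped to alternate OSDs, and only
    after a longer grace period is it marked out, starting backfill."""

    def __init__(self, remap_after=30.0, out_after=600.0):
        # Hypothetical knobs, in seconds since the OSD went down.
        self.remap_after = remap_after
        self.out_after = out_after
        self.down_since = None
        self.state = OsdState.UP

    def mark_down(self, now):
        self.down_since = now

    def mark_up(self, now):
        # OSD returned before being marked out: no backfill ever started,
        # only the objects written in the interim need to be recovered.
        self.down_since = None
        self.state = OsdState.UP

    def tick(self, now):
        if self.down_since is None:
            return self.state
        elapsed = now - self.down_since
        if elapsed >= self.out_after:
            self.state = OsdState.OUT            # full backfill begins
        elif elapsed >= self.remap_after:
            self.state = OsdState.BACK_IN_A_BIT  # new writes remapped only
        return self.state
```

An OSD that comes back during the intermediate state returns to `UP` without ever triggering backfill, which is exactly the cheap round trip the request is after.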
