Feature #39339 (open): prioritize backfill of metadata pools, automatically

Added by Ben England about 5 years ago. Updated about 3 years ago.

Status: In Progress
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

Neha Ojha suggested filing this feature request.

One relatively easy way to minimize damage in a double-failure scenario (loss of 2 devices or nodes at different times) is to prioritize repair of metadata pools; CephFS and RGW both have such pools. Crucially, in most cases the user should not have to specify this prioritization: Ceph should be able to come up automatically with a reasonable prioritization that works in almost all use cases, since a sysadmin may be unavailable when the problem occurs.

Motivation: Ceph is now being considered for small-footprint configurations with only 3 nodes, where backfilling after a node failure is impossible with traditional replica-3 pools, and where for economic reasons there is pressure to consider replica-2 pools (e.g. with NVMe SSDs). In such pools it is critical that backfilling minimize the possible damage if a second failure occurs. But even with 3-way replication, if a PG loses 2 of its OSDs it becomes unwritable and hence unavailable (not lost), so we still want to minimize the probability of metadata unavailability.

CephFS has 1 metadata pool per filesystem, and this pool is orders of magnitude smaller than the data pool(s). So in a backfill situation it is really important that the metadata pool be repaired before the data pool. If the reverse were to happen and the metadata pool were lost, the data pool would effectively be lost as well (i.e. the directory structure and file attributes would be gone).

RGW has many pools, but most of them are tiny, and typically there is one large data pool (at least in my limited experience). Certainly the bucket index pool is orders of magnitude smaller than the data pool, but it is vital for navigating to the data.
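
For comparison, an operator can already approximate this prioritization by hand with the existing per-pool recovery_priority setting (see the related documentation issues below). The following is only a minimal Python sketch of what an automatic policy might do, assuming the ceph CLI is available, that RGW pools follow the default ".rgw." naming convention, and that a priority value of 5 is "higher than default"; none of this is an actual Ceph mechanism.

```python
#!/usr/bin/env python3
# Sketch: raise recovery_priority on metadata-like pools so they backfill first.
import json
import subprocess


def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output."""
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)


def metadata_like_pools():
    pools = set()
    # CephFS: `ceph fs ls` reports the metadata pool of each filesystem.
    for fs in ceph_json("fs", "ls"):
        pools.add(fs["metadata_pool"])
    # RGW: heuristic only -- assume the default ".rgw." naming convention and
    # treat every RGW pool except the bucket data pool as metadata-like
    # (bucket index, meta, log, control, ...).
    for pool in ceph_json("osd", "dump")["pools"]:
        name = pool["pool_name"]
        if ".rgw." in name and not name.endswith(".rgw.buckets.data"):
            pools.add(name)
    return sorted(pools)


if __name__ == "__main__":
    for name in metadata_like_pools():
        # recovery_priority is an existing per-pool setting; 5 is just an
        # arbitrary "higher than the default of 0" value for this sketch.
        subprocess.check_call(
            ["ceph", "osd", "pool", "set", name, "recovery_priority", "5"])
        print("raised recovery_priority on", name)
```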

One other possible optimization that would have a similar effect is to prioritize pools in reverse order of size; this does not require classifying any pool as metadata.
If size is difficult to determine, PG count might serve as a proxy for it: the largest pools typically have higher PG counts.
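
As a rough illustration of that heuristic, here is a sketch that ranks pools by pg_num (as reported by `ceph osd dump`) and gives the smallest pools the highest per-pool recovery_priority. The 0..10 priority scale is an arbitrary choice for the example, not something Ceph prescribes.

```python
#!/usr/bin/env python3
# Sketch: use pg_num as a size proxy and give smaller pools higher priority.
import json
import subprocess


def pools_by_pg_count():
    """Return (pool_name, pg_num) pairs, smallest pool first."""
    out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    pools = json.loads(out)["pools"]
    return sorted(((p["pool_name"], p["pg_num"]) for p in pools),
                  key=lambda item: item[1])


if __name__ == "__main__":
    # Smallest pool gets the highest recovery_priority. The 0..10 scale is an
    # arbitrary choice for this sketch; recent Ceph releases clamp the value
    # to a small range anyway.
    for rank, (name, pg_num) in enumerate(pools_by_pg_count()):
        priority = max(10 - rank, 0)
        subprocess.check_call(
            ["ceph", "osd", "pool", "set", name, "recovery_priority",
             str(priority)])
        print("%s: pg_num=%d -> recovery_priority=%d" % (name, pg_num, priority))
```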


Related issues (3): 0 open, 3 closed

Related to RADOS - Documentation #39011: Document how get_recovery_priority() and get_backfill_priority() impacts recovery order (Resolved, David Zafman, 03/28/2019)
Related to RADOS - Bug #39099: Give recovery for inactive PGs a higher priority (Resolved, David Zafman, 04/03/2019)
Related to RADOS - Documentation #23999: osd_recovery_priority is not documented (but osd_recovery_op_priority is) (Resolved, David Zafman, 05/03/2018)
