Feature #3764: osd: async replicas - RADOS - Ceph

open

Added by Samuel Just over 11 years ago. Updated over 4 years ago.

Status: New

Priority: Normal

Component(RADOS): OSD

Description

The following is more of a topic for discussion than a concrete feature request:

Currently, the latency of any operation is dictated by the slowest replica in the pg. It might be worth exploring a scheme where the primary waits for only M of N acks or commits before responding to the client, where M is a per-pool configurable and N is the pool size (replication level). Objects which are accessed infrequently but are highly latency sensitive might benefit significantly from such a scheme.
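As a rough illustration of the latency effect, the sketch below models the client-visible latency as the arrival time of the M-th ack. The function name and the sample latencies are made up for illustration, and the model treats all N copies alike, which is a simplification rather than a description of how the OSD actually tracks acks.

```python
def client_latency(replica_latencies_ms, m):
    """Time until the m-th ack arrives, given one ack latency per replica.

    Hypothetical model: the primary replies to the client as soon as
    m of the n replicas have acked, so the result is the m-th smallest latency.
    """
    n = len(replica_latencies_ms)
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and the number of replicas")
    return sorted(replica_latencies_ms)[m - 1]


# Illustrative numbers only: three healthy replicas and one straggler.
latencies = [2.0, 3.5, 2.4, 40.0]

wait_for_all = client_latency(latencies, m=4)    # today's behaviour: slowest replica
wait_for_three = client_latency(latencies, m=3)  # proposed: 3 of 4 acks

print(wait_for_all, wait_for_three)
```

With M=N the straggler sets the latency; with M=N-1 a single slow replica no longer does.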

The obvious disadvantage is that, to maintain the same guarantees we currently provide, we would need to contact at least (N-M+1) replicas from each interval in which the pg might have gone active, since up to (N-M) replicas might be behind what the client considers completed. Contacting (N-M+1) of them guarantees that at least one has seen any given completed write.
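The (N-M+1) figure is a pigeonhole argument: a completed write was acked by at least M replicas, so at most (N-M) lack it, and any (N-M+1) replicas must include one that has it. A small brute-force check of that claim, with hypothetical function names and replicas represented as plain integers, might look like this:

```python
from itertools import combinations


def min_replicas_to_contact(n, m):
    """Smallest probe size that must overlap every possible set of m ackers."""
    return n - m + 1


def always_finds_acked_replica(n, m, contacted):
    """Brute force: does every probe set of size `contacted` intersect
    every possible set of m replicas that acked a given write?"""
    replicas = range(n)
    for acked in combinations(replicas, m):
        for probe in combinations(replicas, contacted):
            if not set(acked) & set(probe):
                return False  # found a probe that misses the write entirely
    return True


n, m = 4, 3
k = min_replicas_to_contact(n, m)
print(k, always_finds_acked_replica(n, m, k), always_finds_acked_replica(n, m, k - 1))
```

The check only covers a single write within a single interval; it says nothing about how many intervals need to be examined.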

We would also need to allow a configurable bound on how far behind a replica is allowed to be.
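One way such a bound could work is sketched below, assuming the lag is measured in object versions and enforced by the primary refusing new writes until the straggler catches up. The `LagTracker` class, its methods, and the choice of unit are all hypothetical; the ticket does not specify how the bound would be expressed or enforced.

```python
class LagTracker:
    """Hypothetical primary-side bookkeeping for a per-replica lag bound."""

    def __init__(self, replica_ids, max_lag):
        self.max_lag = max_lag                      # assumed unit: versions
        self.acked = {r: 0 for r in replica_ids}    # last version acked per replica
        self.head = 0                               # last version issued

    def record_ack(self, replica_id, version):
        # Acks may arrive out of order; keep the highest version seen.
        self.acked[replica_id] = max(self.acked[replica_id], version)

    def can_issue_write(self):
        # The next write must not push any replica more than max_lag behind.
        next_version = self.head + 1
        return all(next_version - v <= self.max_lag for v in self.acked.values())

    def issue_write(self):
        if not self.can_issue_write():
            raise RuntimeError("a replica is too far behind; wait for it to catch up")
        self.head += 1
        return self.head


tracker = LagTracker(["osd.a", "osd.b", "osd.c"], max_lag=2)
tracker.issue_write()            # version 1
tracker.record_ack("osd.a", 1)
tracker.record_ack("osd.b", 1)
tracker.issue_write()            # version 2; osd.c is now 2 behind
print(tracker.can_issue_write()) # osd.c would fall 3 behind on the next write
```

A time-based or byte-based bound would follow the same shape with a different unit.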

It seems to me that rgw bucket indices could benefit from this scheme. Any given bucket index is relatively small, accessed relatively infrequently, and the accesses are relatively cheap, but the index operation latency limits the rgw op latency. The bucket indices could therefore be put in a separate pool with perhaps N=4 and M=3 in order to decrease overall rgw op latency.

A first step might be instrumentation to evaluate how slow the slowest replica tends to be.
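As a sketch of what that evaluation could compute, the snippet below takes per-operation replica ack latencies and reports how much time waiting for the slowest replica costs beyond the M-th ack. The sample data, the microsecond unit, and the function name are invented for illustration; how the latencies would actually be collected from the OSD is left open.

```python
from statistics import mean


def straggler_gaps(samples, m):
    """For each op, time between the m-th ack and the slowest ack.

    `samples` is a list of ops, each a list of per-replica ack latencies.
    """
    gaps = []
    for latencies in samples:
        if not 1 <= m <= len(latencies):
            raise ValueError("m must be between 1 and the number of replicas")
        ordered = sorted(latencies)
        gaps.append(ordered[-1] - ordered[m - 1])
    return gaps


# Made-up ack latencies in microseconds for three ops on a size-4 pool.
samples = [
    [2000, 2400, 3500, 40000],
    [1800, 2100, 2200, 2500],
    [2200, 2300, 9000, 2600],
]

gaps = straggler_gaps(samples, m=3)
print(gaps, mean(gaps), max(gaps))
```

If the gaps are consistently small, an M-of-N scheme would buy little; a heavy tail would suggest it is worth pursuing.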