Sloppy reads

There has been some interest in adding the ability to perform inconsistent reads. Currently, when an osd gets a read on a pg for which it is not the primary, it drops the op, since the client will resend it. When an osd gets a read on a pg for which it is the primary, but which has not yet gone through peering and gone active, the osd delays the op until the pg goes active, for consistency reasons. It's possible to use rados in such a way that inconsistent reads are harmless -- for example, if all objects are read-only once written and are never recreated. In that case, it might be interesting to allow a read to be serviced by an osd with a non-backfill copy of the pg, even if the osd is not primary or the pg is not active.
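As a rough illustration of the behaviour described above, the read-handling decision could be sketched as follows. This is not the actual osd code: `PGCopy`, `ReadAction`, `handle_read`, and the `sloppy` flag are hypothetical names invented for this sketch, and it assumes a sloppy read that lands on a backfill copy simply falls back to today's behaviour.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ReadAction(Enum):
    SERVE = auto()  # answer the read from the local copy
    DELAY = auto()  # queue the op until the pg goes active
    DROP = auto()   # discard the op; the client will resend


@dataclass
class PGCopy:
    """Hypothetical view of one osd's copy of a pg."""
    is_primary: bool
    is_active: bool
    is_backfill: bool  # True while this copy is still being backfilled


def handle_read(pg: PGCopy, sloppy: bool) -> ReadAction:
    # Proposed behaviour: a sloppy read may be served from any
    # non-backfill copy, regardless of primary or active state.
    if sloppy and not pg.is_backfill:
        return ReadAction.SERVE
    # Current behaviour: a non-primary drops the read.
    if not pg.is_primary:
        return ReadAction.DROP
    # Current behaviour: the primary delays until the pg is active.
    if not pg.is_active:
        return ReadAction.DELAY
    return ReadAction.SERVE
```

For example, `handle_read(PGCopy(is_primary=False, is_active=False, is_backfill=False), sloppy=True)` serves the read, whereas the same call with `sloppy=False` drops it.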

To make this work, we need the replicas to keep track of in-progress writes (which we need anyway for correct replica reads). We also need to vet the existing code to remove dependencies on state populated during peering.
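One way a replica might track in-progress writes is a per-object set of outstanding transaction ids. The sketch below is only a guess at the shape of such bookkeeping: the class `InProgressWrites`, its method names, and the policy of refusing a local read while any write to the object is outstanding are all assumptions, not something the proposal specifies.

```python
from collections import defaultdict


class InProgressWrites:
    """Hypothetical replica-side record of writes not yet committed."""

    def __init__(self):
        # object id -> set of transaction ids still in flight
        self._pending = defaultdict(set)

    def start(self, oid, tid):
        """Record that write `tid` on object `oid` has begun."""
        self._pending[oid].add(tid)

    def finish(self, oid, tid):
        """Record that write `tid` on object `oid` has committed."""
        tids = self._pending.get(oid)
        if tids is None:
            return
        tids.discard(tid)
        if not tids:
            # Drop empty entries so the map only holds busy objects.
            del self._pending[oid]

    def can_read(self, oid):
        """Assumed policy: serve locally only if no write is in flight."""
        return oid not in self._pending
```

A replica would call `start` when it receives a write, `finish` once the write is committed, and consult `can_read` before servicing a read from its own copy.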

Would we want to allow inconsistent reads on any replicated pool on an IO-by-IO basis, or do we want to create a new pool type with restricted operations where all reads are inconsistent? The former is more flexible, but I am hard-pressed to think of a case where it would be used correctly. The latter would fulfill most requirements, and would let us permit only writefull, read, and delete.
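For the second option, the restriction could be as simple as an allow-list checked when an op arrives. The following is a minimal sketch under that assumption; `SLOPPY_POOL_OPS` and `op_allowed` are hypothetical names, and the op names are taken from the list above rather than from any existing interface.

```python
# Ops the restricted pool type would accept, per the proposal above.
SLOPPY_POOL_OPS = frozenset({"writefull", "read", "delete"})


def op_allowed(pool_is_sloppy: bool, op: str) -> bool:
    """Return True if `op` may run against the pool.

    Ordinary pools are not restricted by this check; only the
    hypothetical sloppy pool type is limited to the allow-list.
    """
    if not pool_is_sloppy:
        return True
    return op in SLOPPY_POOL_OPS
```

With this check, a partial overwrite or an append on a sloppy pool would be rejected up front, which is what keeps every read of an object either absent or complete.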