Librados - support parallel reads


Submit read operations to multiple replicas in parallel.


Interested Parties

  • Sage Weil (Inktank)
  • Guang Yang (Yahoo!)

Current Status

Normally we read from the primary replica. We can also read from the "closest" replica or from a random replica. In every case, each read is sent to a single OSD.

Detailed Description

Add a new op flag that sends each read to all replicas. The quickest reply "wins" and the others are ignored. This is useful when the backend has spare IO capacity and we want to minimize observed latency, particularly the long tail caused by unfortunate timing with other expensive operations.
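
The "quickest reply wins" behaviour can be illustrated with a small simulation. This is only a sketch in Python, not Ceph code: the replica names, the latencies, and the functions read_from_replica and parallel_read are all hypothetical, and each replica is modelled as a thread that sleeps before answering.

```python
import concurrent.futures
import time


def read_from_replica(replica, latency_s, data):
    """Simulate one replica serving a read after a fixed delay (hypothetical)."""
    time.sleep(latency_s)
    return replica, data


def parallel_read(replicas, data=b"object-bytes"):
    """Send the same read to every replica and return the first reply.

    `replicas` maps a made-up replica name to a simulated latency in seconds.
    Slower replies are simply never looked at.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [
        pool.submit(read_from_replica, name, latency, data)
        for name, latency in replicas.items()
    ]
    done, _pending = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED
    )
    pool.shutdown(wait=False)  # return now; do not wait for the slow replicas
    return next(iter(done)).result()


if __name__ == "__main__":
    start = time.monotonic()
    winner, payload = parallel_read({"osd.0": 0.5, "osd.1": 0.01, "osd.2": 0.4})
    print(winner, len(payload), round(time.monotonic() - start, 2))
```

The observed latency is that of the fastest replica, at the cost of every replica doing the read. That trade-off is why the proposal targets clusters with IO to spare.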

Work items

Coding tasks

  1. objecter: submit read message to all replicas
  2. objecter: adjust reply path to accept a reply from any source that is a current replica
  3. librados: expose internal objecter flag via librados
  4. [optional] objecter: add a 'cancel' operation that can be submitted so that a slow OSD knows that older requests can be ignored
  5. [optional] osd: add support for [best-effort] op cancellation
  6. [optional] objecter: use op cancellation when crush mapping changes (in non-failure case)
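
One possible shape for the reply-path bookkeeping in tasks 1, 2 and 4 is sketched below in Python. It is an assumption-laden illustration, not the objecter's actual design: the class ParallelReadTracker, its method names, and the (osd, tid) message tuples are all invented for this example.

```python
class ParallelReadTracker:
    """Hypothetical bookkeeping for reads fanned out to several replicas."""

    def __init__(self):
        # tid -> set of replica ids that were sent the read and have not replied
        self._pending = {}

    def submit(self, tid, replicas):
        """Record the op and return one read message per replica (task 1)."""
        self._pending[tid] = set(replicas)
        return [(osd, tid) for osd in replicas]

    def handle_reply(self, tid, source, payload):
        """Accept the first reply from any replica the op was sent to (task 2).

        Returns (payload, cancels). Late or unexpected replies return
        (None, []). `cancels` lists the best-effort cancel messages that
        could be sent to the replicas that lost the race (task 4).
        """
        outstanding = self._pending.get(tid)
        if outstanding is None or source not in outstanding:
            return None, []  # op already completed, or source is not a current replica
        outstanding.discard(source)
        del self._pending[tid]  # first reply wins; the op is now complete
        cancels = [(osd, tid) for osd in sorted(outstanding)]
        return payload, cancels


if __name__ == "__main__":
    tracker = ParallelReadTracker()
    tracker.submit(7, [0, 1, 2])
    print(tracker.handle_reply(7, 1, b"data"))   # first reply completes the op
    print(tracker.handle_reply(7, 0, b"data"))   # late reply is dropped
```

Dropping the op's entry on the first reply is what makes later replies harmless. The cancel messages are purely an optimization, which is consistent with cancellation being optional and best-effort in the task list.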

Documentation tasks

  1. ensure that the new librados flag is properly documented