Rbd copy-on-read for clones


Copy-on-read would improve locality for data that's actually used by a clone, without using the full space required by flattening the clone.
This is especially useful when the clone is a high-latency connection away from its parent.



Interested Parties

  • Josh Durgin (Inktank)
  • Danny Al-Gaaf
  • Anip Patel (Arizona State University)

Current Status

This would be in addition to the copy-on-write support already implemented in librbd.

Detailed Description

Currently, clones in rbd only copy data from their parent when they write to an object. Reading from a clone reads data from the parent if the relevant object does not exist in the clone yet. If the clone is used from a location with high latency to the parent, these reads are very expensive.

Caching the parent image could help, but if the clone is expected to stay far away from the parent (for example, if the parent is in a different pool in a separate geographical location), it is useful to have a local copy of the parent's data. Rather than copying the parent all at once, we can opportunistically copy data from the parent to the clone as it is read (or partially overwritten). That is, a read would fetch the entire range needed for a clone's object and write it to the clone in the background.
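The read path described above can be modelled with a small sketch. This is not librbd code: the Image class, its fields, and the tiny OBJECT_SIZE are hypothetical names chosen for illustration, and the copy to the clone is done synchronously here for brevity, whereas the proposal is to do it in the background.

```python
OBJECT_SIZE = 4  # bytes per object; tiny so the example is easy to follow


class Image:
    """Toy image (hypothetical): maps object number -> bytes of that object."""

    def __init__(self, parent=None, copy_on_read=False):
        self.objects = {}
        self.parent = parent
        self.copy_on_read = copy_on_read
        self.parent_reads = 0  # how often a read had to go to the parent

    def _read_object(self, objno):
        """Return the whole object, or None if no image in the chain has it."""
        if objno in self.objects:
            return self.objects[objno]
        if self.parent is None:
            return None
        self.parent_reads += 1
        data = self.parent._read_object(objno)
        if data is not None and self.copy_on_read:
            # Copy-on-read: keep the whole object locally for later reads.
            self.objects[objno] = data
        return data

    def read(self, objno, offset, length):
        data = self._read_object(objno)
        if data is None:
            data = bytes(OBJECT_SIZE)  # objects that exist nowhere read as zeros
        return data[offset:offset + length]


parent = Image()
parent.objects[0] = b"abcd"

cor_clone = Image(parent=parent, copy_on_read=True)
first = cor_clone.read(0, 1, 2)   # goes to the parent, then copies the object
second = cor_clone.read(0, 0, 4)  # served from the clone's own copy

plain_clone = Image(parent=parent)
plain_clone.read(0, 1, 2)
plain_clone.read(0, 0, 4)         # goes to the parent again
```

In this model the copy-on-read clone contacts the parent once for the object, while the plain clone contacts it on every read, which is the cost the proposal aims to avoid on high-latency links.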

Work items

Coding tasks

  1. librbd: add copy-on-read option (standard ceph option that can be read from config file, env, or cli)
  2. librbd: for clone reads from a parent, read the entire object if the copy-on-read (or another?) flag is set
  3. librbd: asynchronously write parent data to the child if copy-on-read flag is set
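As a rough illustration of how coding tasks 2 and 3 fit together, the following sketch reads the whole object from the parent when a flag is set and hands the write to the child off to a background worker. All names here (Parent, Clone, copy_on_read, flush, close) are hypothetical and stand in for whatever librbd would actually use; the boolean constructor argument stands in for the option from task 1.

```python
from concurrent.futures import ThreadPoolExecutor

OBJECT_SIZE = 8  # bytes per object; tiny so the example is easy to follow


class Parent:
    """Toy parent image (hypothetical) that records every read it serves."""

    def __init__(self, objects):
        self.objects = objects
        self.requests = []  # (objno, offset, length) per read

    def read(self, objno, offset, length):
        self.requests.append((objno, offset, length))
        data = self.objects.get(objno, bytes(OBJECT_SIZE))
        return data[offset:offset + length]


class Clone:
    def __init__(self, parent, copy_on_read=False):
        self.parent = parent
        self.copy_on_read = copy_on_read  # stands in for the option in task 1
        self.objects = {}
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def _copy_up(self, objno, data):
        # Do not overwrite an object the child already has, since it may
        # have been created after the read was issued.
        self.objects.setdefault(objno, data)

    def read(self, objno, offset, length):
        if objno in self.objects:
            return self.objects[objno][offset:offset + length]
        if not self.copy_on_read:
            return self.parent.read(objno, offset, length)
        # Task 2: fetch the entire object, not just the requested range.
        whole = self.parent.read(objno, 0, OBJECT_SIZE)
        # Task 3: write it to the child without blocking the reader.
        self._pending.append(self._pool.submit(self._copy_up, objno, whole))
        return whole[offset:offset + length]

    def flush(self):
        """Wait for all outstanding background copies to finish."""
        for future in self._pending:
            future.result()
        self._pending.clear()

    def close(self):
        self._pool.shutdown(wait=True)


parent = Parent({0: b"01234567"})
clone = Clone(parent, copy_on_read=True)
first = clone.read(0, 2, 3)   # parent is asked for the whole object
clone.flush()                 # only needed here to make the example deterministic
second = clone.read(0, 5, 2)  # served from the child
clone.close()
```

The single worker thread is only one way to model "asynchronously"; the point of the sketch is that the reader gets its data back before the copy to the child completes, and that a later read of the same object no longer touches the parent.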

Documentation tasks

  1. Document new flag(s) - where and why to use them