RBD - crash-consistent ordered write-back caching extension


The RBD cache currently supports only object-based DRAM caching, with no ordered write-back support, so the amount of cacheable content is limited by the size of DRAM. This proposal extends librbd to support:

  • A new librbd read cache to support LBA-based caching with DRAM/*non-volatile* storage backends
  • An ordered write-back cache that maintains checkpoints internally (or is structured as a data journal), so that writes flushed back to the cluster are always crash consistent. Even if the client cache is lost entirely, the disk image still holds a valid file system that simply looks slightly stale [1]. Done right, this should have durability characteristics similar to asynchronous replication.
  • An external caching plug-in interface, for both kernel and user mode
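
The ordering property of the write-back cache in the list above can be illustrated with a small Python sketch. This is a toy model and not librbd code: the class `OrderedWriteBackCache`, its methods, and the in-memory `backing` buffer that stands in for the image in the cluster are all hypothetical names chosen for illustration. It assumes the journal-style variant, in which pending writes are kept in arrival order and flushed strictly oldest first, so the backing image always matches some prefix of the write history.

```python
class OrderedWriteBackCache:
    """Toy ordered write-back cache (hypothetical names, not a librbd API)."""

    def __init__(self, image_size):
        # Stand-in for the RBD image stored in the cluster (assumption).
        self.backing = bytearray(image_size)
        # Pending writes in arrival order: (sequence number, offset, data).
        self.journal = []
        self.next_seq = 0

    def write(self, offset, data):
        """Acknowledge a write once it is recorded in the journal."""
        if offset < 0 or offset + len(data) > len(self.backing):
            raise ValueError("write outside image")
        self.journal.append((self.next_seq, offset, bytes(data)))
        self.next_seq += 1

    def read(self, offset, length):
        """Return backing data overlaid with pending writes, oldest first."""
        buf = bytearray(self.backing[offset:offset + length])
        for _seq, w_off, w_data in self.journal:
            start = max(offset, w_off)
            end = min(offset + length, w_off + len(w_data))
            if start < end:
                buf[start - offset:end - offset] = w_data[start - w_off:end - w_off]
        return bytes(buf)

    def flush(self, max_entries=None):
        """Apply the oldest pending writes to the backing image, in order."""
        pending = len(self.journal)
        count = pending if max_entries is None else min(max_entries, pending)
        for _seq, w_off, w_data in self.journal[:count]:
            self.backing[w_off:w_off + len(w_data)] = w_data
        del self.journal[:count]
        return count

    def lose_cache(self):
        """Simulate losing the client cache: unflushed writes disappear."""
        self.journal.clear()


if __name__ == "__main__":
    cache = OrderedWriteBackCache(16)
    cache.write(0, b"AAAA")
    cache.write(2, b"BBBB")
    cache.write(8, b"CC")
    cache.flush(max_entries=2)  # only the two oldest writes reach the image
    cache.lose_cache()          # the third write is lost with the cache
    print(cache.read(0, 10))
```

After the simulated cache loss, the image contains the first two writes and none of the third: it is stale but corresponds to a point in time that actually existed, which is the crash-consistency property the proposal asks for. A real implementation would also have to persist the journal and handle flush failures, which this sketch ignores.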

This enhancement was discussed at the 2015 Ceph hackathon and at the Tokyo summit with the Ceph/RBD core.

This proposal is based on the discussions in [1] and [2] with Sage and Josh. Proof points for client-side caching are given in [2] and [3]; an example of an external caching plugin is in Maciej’s CAS POC presentation at the Ceph Performance Weekly Meeting on Nov 25, 2015.

A tunable ordered write-back caching option should boost streaming, log storage, some VDI workloads, and eventually consistent databases. Where write-back is not suitable, a read-only (write-through) option should help improve access latencies, for example for VDI and database workloads.


Owners

  • Tushar Gohad (Intel)
  • Yuan Zhou (Intel)
  • Anjaneya Chagam (Intel)

Interested Parties

  • Sage Weil, Jason Dillaman, Josh Durgin

Project Phases (current proposal)

  • Phase 1: private, LBA-level, read-only cache. Plumb a new generic caching layer into librbd for non-volatile storage backends (a possible ObjectCacher replacement). The cache is read-only, but designed to leave room for the write-back caching and pluggability extensions of later phases.
  • Phase 2: node-local, shared read-only cache; private ordered/crash-consistent write-back cache.
  • Phase 3: node-local, shared ordered/crash-consistent write-back cache; external caching plugins (Intel CAS, dm-cache, etc.).
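
As a rough illustration of the Phase 1 idea, the following Python sketch shows an LBA-level, read-only (write-through) cache sitting on top of a swappable block store. It is a sketch under stated assumptions, not the proposed librbd design: `BLOCK_SIZE`, `DramBackend`, `LbaReadCache`, and the `read_from_cluster` / `write_to_cluster` callables are hypothetical names, and the in-memory LRU store merely stands in for a non-volatile backend exposing the same `get` / `put` / `discard` methods.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed cache block size in bytes


class DramBackend:
    """In-memory LRU block store (hypothetical). A non-volatile backend
    could expose the same get/put/discard methods and be swapped in."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block number -> bytes, LRU order

    def get(self, block_no):
        data = self.blocks.get(block_no)
        if data is not None:
            self.blocks.move_to_end(block_no)  # mark as most recently used
        return data

    def put(self, block_no, data):
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

    def discard(self, block_no):
        self.blocks.pop(block_no, None)


class LbaReadCache:
    """Read-only cache keyed by block address; writes go straight through."""

    def __init__(self, backend, read_from_cluster, write_to_cluster):
        self.backend = backend
        self.read_from_cluster = read_from_cluster  # callable(offset, length) -> bytes
        self.write_to_cluster = write_to_cluster    # callable(offset, data)
        self.hits = 0
        self.misses = 0

    def read(self, offset, length):
        out = bytearray()
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for block_no in range(first, last + 1):
            block = self.backend.get(block_no)
            if block is None:
                # Miss: fetch the whole block from the cluster and cache it.
                self.misses += 1
                block = self.read_from_cluster(block_no * BLOCK_SIZE, BLOCK_SIZE)
                self.backend.put(block_no, block)
            else:
                self.hits += 1
            out += block
        skip = offset - first * BLOCK_SIZE
        return bytes(out[skip:skip + length])

    def write(self, offset, data):
        # Write-through: the cluster is updated first, then any cached
        # blocks the write touches are dropped so later reads refetch them.
        self.write_to_cluster(offset, data)
        first = offset // BLOCK_SIZE
        last = (offset + len(data) - 1) // BLOCK_SIZE
        for block_no in range(first, last + 1):
            self.backend.discard(block_no)
```

Because the cache never holds dirty data, losing it costs only performance, which is why a read-only cache is a reasonable first phase before ordered write-back is added. Invalidating on write (rather than updating the cached block) is one simple choice made here for brevity; the proposal itself does not specify either.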


[2] Ceph Hackathon’15 and Tokyo Summit Discussions on RBD write-back caching
[3] Whitepaper: database workloads using RBD with dm-cache and Ceph cache tiering