Rados cache pool (part 2)


Summary

Balance of work to create a cache pool tier


Owners

  • Sage Weil (Inktank)
  • Greg Farnum (Inktank)

Interested Parties

  • Mike Dawson (Cloudapt)
  • Yan, Zheng (Intel)
  • Jiangang, Duan (Intel)
  • Jian, Zhang (Intel)

Current Status

About half to two-thirds of the work has been completed:
  • copy-get and copy-from rados primitives
  • objecter cache redirect logic (first read from cache tier, then from base pool)
  • promote on read
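The redirect + promote-on-read path above can be sketched with dicts standing in for the two pools; this is an illustrative model of the behavior, not the objecter code itself.

```python
def cache_read(oid, cache, base, promote=True):
    """Sketch of the objecter redirect logic: read from the cache tier
    first; on a miss, redirect the read to the base pool and (optionally)
    promote the object into the cache. Dict-backed stand-in, not Ceph code."""
    if oid in cache:
        return cache[oid]          # cache hit
    data = base[oid]               # redirect to base pool
    if promote:
        cache[oid] = data          # promote on read
    return data
```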
Much of the logic is written but not yet merged:
  • per-object dirty and whiteout metadata
  • flush
  • evict
  • HitSet bloom filter (or explicit enumeration) tracking of ios
Balance of effort:
  • hitset expiration
  • recover hitset when pg is recovered/migrated/whatever.
  • [optional] preserve in-memory hitset across peering intervals
  • stress tests that specifically exercise and validate dirty, whiteout, evict, flush, hitsets
  • policy metadata for when to flush/evict from cache
  • agent process/thread/whatever that evicts from cache when it approaches the high water mark

Detailed Description

hitset expiration
  • osd logic to delete old hitsets (and replicate that deletion) once they exceed the max age or max count, or when the pool's max values are adjusted.
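The trimming rule can be sketched as a pure function over the retained history; the name and the (start_time, hitset) representation are hypothetical.

```python
def trim_hitsets(hitsets, now, max_count, max_age):
    """Split a newest-first list of (start_time, hitset) pairs into those
    to keep and those to delete: drop a hitset once it is older than
    max_age or once the retained history exceeds max_count. Illustrative
    helper only; the real logic lives in the OSD and must replicate the
    deletion."""
    keep, drop = [], []
    for i, (start, hs) in enumerate(hitsets):
        if i >= max_count or now - start > max_age:
            drop.append((start, hs))
        else:
            keep.append((start, hs))
    return keep, drop
```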
policy metadata for flush/evict from cache
  • add pg_pool_t properties to control when we should
    • flush dirty objects
    • evict old items when the pool is getting full
    • evict any item older than X
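A sketch of what those pg_pool_t properties and the resulting decision might look like; the field names are made up for illustration, not Ceph's actual option names.

```python
from dataclasses import dataclass

@dataclass
class CacheTierPolicy:
    """Hypothetical pg_pool_t-style knobs for the cache tier."""
    flush_dirty_ratio: float   # start flushing when dirty/total exceeds this
    evict_full_ratio: float    # evict when pool usage exceeds this
    evict_max_age: float       # evict any object idle longer than this (secs)

def cache_actions(policy, dirty_ratio, used_ratio, object_idle_age):
    """Decide which actions the agent should take given pool stats and
    an object's estimated idle time."""
    actions = []
    if dirty_ratio > policy.flush_dirty_ratio:
        actions.append("flush")
    if used_ratio > policy.evict_full_ratio or \
       object_idle_age > policy.evict_max_age:
        actions.append("evict")
    return actions
```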
cache agent
  • this might be a thread, or a python client, or a separate daemon. discuss.
  • periodically check pool metadata (stats) vs policy
  • start at random point in pool and iterate over objects
    • pull hitset history for current position
    • estimate idle time for each object
    • if the object meets some criterion, flush or evict it
    • move to next object; pull new hitset metadata as needed
  • include some mechanism to throttle
  • implement policy
  • build a test that adds a cache, populates it, drains it, and disables the cache
    • add tests to the suite that do this in parallel with a running workload?
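The sweep described above can be sketched as follows; everything here (names, callbacks, the dict-of-sets hitset history) is illustrative, not a real agent or librados API.

```python
import random

def estimate_idle(oid, hitset_history, now):
    """Age of the newest hitset that recorded oid; if none did, the object
    has been idle at least as long as the whole retained history.
    `hitset_history` is a newest-first list of (start_time, set_of_oids)."""
    for start, oids in hitset_history:
        if oid in oids:
            return now - start
    oldest = hitset_history[-1][0] if hitset_history else now
    return now - oldest

def agent_pass(objects, hitset_history, now, flush_idle, evict_idle,
               flush, evict, throttle=None):
    """One sweep of a hypothetical cache agent: start at a random offset,
    walk every object once, flush dirty objects and evict clean ones that
    look sufficiently idle."""
    if not objects:
        return
    start = random.randrange(len(objects))
    for i in range(len(objects)):
        oid, dirty = objects[(start + i) % len(objects)]
        idle = estimate_idle(oid, hitset_history, now)
        if dirty and idle >= flush_idle:
            flush(oid)
        elif not dirty and idle >= evict_idle:
            evict(oid)
        if throttle:
            throttle()   # e.g. sleep to limit the agent's impact on client io
```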
stress tests
  • extend rados model to simply exercise flush and evict
  • some sort of test to stress the hitset tracking code
  • stress workload that promotes new data and forces eviction of old data (i.e. degenerate streaming workload)
  • expand qa suite with cache pool tests
    • explicit stress tests (above)
    • enable/populate/drain/disable cache pool (and loop) in parallel with other workloads
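The enable/populate/drain/disable loop might be modeled like this for a first pass at the test logic, with dicts standing in for the two pools; the real qa test would drive an actual cluster and run alongside another workload.

```python
def cache_lifecycle_test(cache, base, nobjects, nloops):
    """Sketch of the enable/populate/drain/disable cycle: write through
    the cache tier, flush everything back to the base pool, evict, and
    verify nothing was lost. Illustrative only."""
    for _ in range(nloops):
        # enable + populate: writes land in the cache tier
        for i in range(nobjects):
            cache[f"obj{i}"] = f"data{i}"
        # drain: flush each dirty object to the base pool, then evict it
        for oid, data in list(cache.items()):
            base[oid] = data      # flush
            del cache[oid]        # evict
        assert not cache          # cache fully drained before disabling
        # disable: all data must survive in the base pool
        for i in range(nobjects):
            assert base[f"obj{i}"] == f"data{i}"
```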

Work items

Coding tasks

  1. hitset expiration
  2. policy metadata
  3. cache agent
  4. stress tests

Documentation tasks

  1. document tiering framework
  2. document cache configuration, usage
    1. include limitations (e.g., PGLS results not cache coherent)