Extend crush rule language


Summary

Extend the CRUSH mapping language to support more complicated placement strategies.

Owners

  • Sage Weil (Inktank)
  • ???

Interested Parties

  • Josh Durgin
  • Dan Mick
  • Xiaoxi Chen
  • Zhiteng Huang
  • Loic Dachary
  • Christophe Courtaut
  • Xiaobing Zhou(xzhou40 (AT)

Current Status

The current mapping language is very simple and consists of three steps: TAKE, CHOOSE[LEAF], and EMIT. It covers only the most common placement strategies.

Detailed Description

A few limitations:
- The current CHOOSE step iterates over the 'working set' and recursively selects new items for each item in it. It always applies to every item in the working set, which precludes strategies like "pick 2 racks, choose N from the first, and M from the second".
- The hierarchy is assumed to be a single uniform tree. You cannot have two parallel trees of devices (say, SSDs and HDDs) in the same nodes, then pick 1 SSD and 1 HDD while ensuring that they reside on different hosts.
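
The first limitation can be illustrated with a small sketch. This is not the CRUSH implementation: the hierarchy, the function names (score, choose, run_rule) and the SHA-256 based ranking are all assumptions made for illustration, and children are picked directly rather than by bucket type. It only shows that each CHOOSE step is applied with the same count to every item in the working set, so "N from the first rack, M from the second" has no way to be expressed.

```python
import hashlib

# Hypothetical toy hierarchy: root -> racks -> hosts.
TREE = {
    "root": ["rack1", "rack2", "rack3"],
    "rack1": ["host1a", "host1b", "host1c"],
    "rack2": ["host2a", "host2b", "host2c"],
    "rack3": ["host3a", "host3b", "host3c"],
}


def score(x, item):
    """Deterministic pseudo-random score (a stand-in for CRUSH hashing)."""
    digest = hashlib.sha256(f"{x}:{item}".encode()).hexdigest()
    return int(digest, 16)


def choose(working_set, n, x):
    """Replace EVERY item in the working set with n of its children."""
    result = []
    for item in working_set:
        ranked = sorted(TREE[item], key=lambda c: score(x, c), reverse=True)
        result.extend(ranked[:n])
    return result


def run_rule(x, steps, root="root"):
    """TAKE root, apply one CHOOSE per entry in steps, then EMIT."""
    working = [root]                      # TAKE
    for n in steps:                       # CHOOSE n (same n for all items)
        working = choose(working, n, x)
    return working                        # EMIT


# Choose 2 racks, then 2 hosts in each rack: always 2 + 2, never N + M.
placement = run_rule(x=42, steps=[2, 2])
print(placement)
```

Whatever input x is used, the result holds the same number of hosts from each selected rack, because the second step cannot distinguish between items in the working set.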

Use cases currently not covered by the existing implementation:
- Choose 2 racks, then choose N replicas from the first rack and M replicas from the second rack.
- Each host contains two types of devices. Choose 2 hosts, then choose type A from the first host and type B from the second host.
- Give the number of replicas priority over the number of available buckets. For example, if the replication size is N and the rule says to put replicas in different rooms (or racks, ...), but there are only M<N rooms (or racks, ...), nonetheless create N replicas and distribute them across all M rooms, letting some rooms (or racks, ...) hold more than one replica.
- Use addressing (or other?) rules (e.g. IPv4 subnet) to automatically group replicas, then choose N replicas from the first group, M replicas from the second group, etc. This may be close to the first use case listed above.
- ??? (please add to this list!)
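
As a rough sketch of the third use case, the per-bucket replica counts could be computed as below. The function name spread_replicas and the "as even as possible" policy are assumptions for illustration only; the blueprint does not specify how replicas should be divided among the M buckets, and nothing like this exists in the current rule language.

```python
def spread_replicas(num_replicas, buckets):
    """Assign replica counts to buckets so that all replicas are placed.

    Hypothetical policy: spread as evenly as possible, giving any
    remainder to the earliest buckets in the list.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")
    base, extra = divmod(num_replicas, len(buckets))
    return {
        bucket: base + (1 if index < extra else 0)
        for index, bucket in enumerate(buckets)
    }


# Replication size N=3 with only M=2 rooms: one room holds two replicas.
counts = spread_replicas(3, ["room1", "room2"])
print(counts)  # {'room1': 2, 'room2': 1}
```

A rule language able to express such per-bucket counts would also cover the "N from the first rack, M from the second" case, since that is the same shape with explicitly chosen numbers.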

Work items