Cold Storage Pools

Summary

A proposal for cold-storage pools.

Owners

Interested Parties

Current Status

Ceph's existing cache tiering feature uses the term "cold storage" for the backing pool of a cache tier. That usage is accurate in the context of cache tiering, but in this blueprint the term refers to data that is typically written once and read infrequently.

Detailed Description

In this blueprint we propose a pool type or definition that allows for cold storage of objects.
Cold storage requirements are becoming increasingly common across storage environments. Candidates for this type of storage include many kinds of data that are written a few times at most and accessed infrequently: media files, medical records and imagery, satellite or mission data, public records, and so on. In some environments a cold storage tier could replace tape or optical media as the archive of last resort while providing much faster access to the data when required. In other environments cold storage could serve as an active archive, with data moved to tape only after a set length of time.
In these storage environments, rebalancing objects across OSDs and placement groups is undesirable. Because the data is typically written once and then accessed infrequently, the power and network cost of moving it again becomes a primary concern. Minimizing power consumption is particularly important for large cold storage archives.
Dense cold storage typically also requires full utilization of the available raw storage. With the current Ceph architecture, some portion of a given amount of storage may remain unused.
One possible way to implement this blueprint would be to add a new mode to the available pool values in Ceph (http://ceph.com/docs/master/rados/operations/pools/#set-pool-values). This mode would mark a pool as one that contains cold storage and should not be actively rebalanced.
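To make the idea of a non-rebalancing pool mode concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the `cold` attribute is a hypothetical pool value that does not exist in Ceph, `pick_osd` is a simple rendezvous hash standing in for CRUSH, and the per-object placement table is an illustration device rather than anything Ceph keeps.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    cold: bool = False  # hypothetical pool value, not an existing Ceph setting
    placement: dict = field(default_factory=dict)  # object name -> OSD id


def pick_osd(obj, osds):
    """Toy stand-in for CRUSH: rendezvous hashing over the OSD ids."""
    def score(osd):
        return hashlib.sha256(f"{obj}:{osd}".encode()).digest()
    return max(osds, key=score)


def write(pool, obj, osds):
    """Place a newly written object using the current OSD list."""
    pool.placement[obj] = pick_osd(obj, osds)


def on_osd_added(pool, osds):
    """Re-evaluate placement after the OSD list grows; return objects moved."""
    if pool.cold:
        return 0  # assumed behaviour: a cold pool leaves existing objects alone
    moved = 0
    for obj, old in list(pool.placement.items()):
        new = pick_osd(obj, osds)
        if new != old:
            pool.placement[obj] = new
            moved += 1
    return moved


if __name__ == "__main__":
    osds = [0, 1, 2, 3]
    hot, archive = Pool("hot"), Pool("archive", cold=True)
    for i in range(1000):
        write(hot, f"obj{i}", osds)
        write(archive, f"obj{i}", osds)
    print("hot pool moved:", on_osd_added(hot, osds + [4]))
    print("archive pool moved:", on_osd_added(archive, osds + [4]))
```

In this model, a regular pool migrates part of its data onto the new OSD, while the cold pool moves nothing and would only use the new OSD for later writes. The toy also exposes the main design question: Ceph computes object locations rather than storing them, so a real cold pool would need some way to keep locating objects that were placed under an older map.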
Other methods might include creating options to minimize placement group rebalancing.
This proposal might work more effectively with erasure-coded pools than with regular replicated Ceph pools; the differences between the two types need to be discussed.
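One difference that bears directly on the full-utilization goal is raw-capacity overhead. The following sketch compares the usable fraction of raw storage under replication and under erasure coding. The function names, the example profiles (3 replicas, k=4/m=2, k=8/m=3), and the 1000 TB cluster are illustrative assumptions, and the `full_ratio` of 0.95 is used here as an assumed fill threshold rather than a recommendation.

```python
def usable_fraction_replicated(size):
    """Each object is stored `size` times, so 1/size of raw space is usable."""
    return 1.0 / size


def usable_fraction_erasure(k, m):
    """Each object is split into k data chunks plus m coding chunks."""
    return k / (k + m)


def usable_tb(raw_tb, fraction, full_ratio=0.95):
    """Usable capacity when the cluster may only fill to `full_ratio` (assumed)."""
    return raw_tb * fraction * full_ratio


if __name__ == "__main__":
    raw = 1000  # hypothetical raw capacity in TB
    profiles = {
        "replicated, size=3": usable_fraction_replicated(3),
        "erasure, k=4 m=2": usable_fraction_erasure(4, 2),
        "erasure, k=8 m=3": usable_fraction_erasure(8, 3),
    }
    for name, fraction in profiles.items():
        print(f"{name}: {fraction:.1%} usable, {usable_tb(raw, fraction):.0f} TB")
```

Under these assumptions an erasure-coded pool exposes roughly twice the usable capacity of a three-way replicated pool for the same raw storage, which is one reason the two pool types deserve separate treatment in this discussion.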

Work items

Coding tasks

  1. Task 1
  2. Task 2
  3. Task 3

Build / release tasks

  1. Task 1
  2. Task 2
  3. Task 3

Documentation tasks

  1. Task 1
  2. Task 2
  3. Task 3

Deprecation tasks

  1. Task 1
  2. Task 2
  3. Task 3