Technical details on the model

Some background about replication and erasure coding

"Erasure codes are a superset of replicated and RAID systems."
The main advantage of erasure codes is that they leverage the statistical stability of a large number of components.
Assume we have 1 million machines and 10% of them are down at any given time. Treating machine failures as independent, a block with 2 replicas is unavailable only when both replicas are down, so its availability is 1 - 0.1^2 = 0.99: two nines.

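Instead, we can use erasure coding with the same storage overhead ratio (2x). For instance, we can use k=32 data blocks and m=32 coding blocks (64 blocks in total). The original data can be rebuilt from any 32 of these blocks, so it stays available as long as no more than m=32 of them are down.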
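The original formula image is not available; the following is a reconstruction, assuming each of the n = k + m blocks is down independently with probability p = 0.1 (a binomial approximation, reasonable with 1 million machines). The data is available when at most m blocks are down:

```latex
A(k, m) = \sum_{i=0}^{m} \binom{k+m}{i} \, p^{i} \, (1-p)^{k+m-i}
```

With k = m = 1 (two replicas) the sum reduces to (1-p)^2 + 2p(1-p) = 1 - p^2, which is the replication case.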

This time we obtain over eight nines of availability. Nice, isn't it?
Source: H. Weatherspoon, J. Kubiatowicz, "Erasure Coding vs. Replication: A Quantitative Comparison"
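To check these numbers, here is a small Python sketch under the same independence assumption (a binomial model, which may differ from the exact formulation used in the cited paper). It computes the unavailability directly, because subtracting a value this close to 1 from 1 loses precision in floating point, and reports availability as a count of nines. Two replicas are modeled as a k=1, m=1 code; the function names are illustrative only.

```python
from math import comb, log10


def unavailability(k: int, m: int, p: float) -> float:
    """Probability that more than m of the k + m blocks are down.

    Assumes each block is down independently with probability p (binomial
    model). The data is lost only if at least m + 1 blocks are down.
    """
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


def nines(k: int, m: int, p: float) -> float:
    """Availability expressed as a number of nines: -log10(unavailability)."""
    return -log10(unavailability(k, m, p))


p_down = 0.1  # 10% of machines down, as in the example above

# Two replicas: any 1 of 2 copies suffices, i.e. a k=1, m=1 code (2x overhead).
print(f"2 replicas: {nines(1, 1, p_down):.1f} nines")
# Erasure coding with k=32 data and m=32 coding blocks (also 2x overhead).
print(f"k=32, m=32: {nines(32, 32, p_down):.1f} nines")
```

Under this model the replicated block stays at two nines, while the k=32, m=32 configuration comes out at roughly 16 nines, comfortably above the eight nines quoted above.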


Both redundancy methods (replication and erasure coding) share the following assumptions:
  • Recovery I/O operations run in parallel.
  • Objects are written to the primary OSD of the PG, as determined by the CRUSH map. The primary daemon contacts the other OSDs for replication and recovery purposes.
  • Failures occur at a constant rate, so the number of failures in a given time interval follows a Poisson distribution.
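With a constant failure rate λ, the number of failures in a window of length t is Poisson-distributed with mean λt, and the probability of at least one failure equals the exponential CDF 1 - e^(-λt). The sketch below illustrates this; the disk count, per-disk rate and window length are hypothetical numbers chosen for illustration, not parameters of the model.

```python
from math import exp, factorial


def poisson_pmf(j: int, rate: float, t: float) -> float:
    """P(exactly j failures in a window of length t) at a constant failure rate."""
    mu = rate * t  # expected number of failures in the window
    return mu**j * exp(-mu) / factorial(j)


def prob_at_least_one_failure(rate: float, t: float) -> float:
    """1 - P(0 failures), which equals the exponential CDF 1 - exp(-rate * t)."""
    return 1.0 - poisson_pmf(0, rate, t)


# Hypothetical cluster: 1000 disks, each failing at 1e-5 per hour,
# observed over a 24-hour window (illustrative values only).
disks = 1000
per_disk_rate = 1e-5  # failures per disk-hour
cluster_rate = disks * per_disk_rate  # rates of independent Poisson processes add up
window_hours = 24.0

for j in range(4):
    print(f"P({j} failures in {window_hours:.0f} h) = {poisson_pmf(j, cluster_rate, window_hours):.4f}")
print(f"P(at least one failure) = {prob_at_least_one_failure(cluster_rate, window_hours):.4f}")
```

Summing per-disk rates into a single cluster rate relies on the same independence assumption used for the availability figures.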

State model (v 0.2)

Attachments (uploaded by Jessica Mack, 06/01/2015): Screen_Shot_2014-06-27_at_00.15.48.png, Screen_Shot_2014-06-27_at_00.54.29.png, simple_state_model_v0.2.jpg (state model v0.2 diagram)