Erasure coded storage backend (step 3)
As of Dumpling, Ceph keeps copies of objects so that they are not lost when hardware fails. Replication is an efficient and flexible solution for Ceph block storage, but when performance matters less than cost, erasure codes achieve significant savings. For instance, instead of buying 3PB to store 1PB, it is enough to buy 1.5PB and get better resilience. Erasure coding an object works by splitting it into K data chunks (say 10) and computing M parity chunks (say 3). By storing each of the 13 chunks on a different disk, it is possible to rebuild the original object even if 3 of them fail. This is better than the resilience provided by three copies, which survive the loss of only two disks, and requires only half the space, but it comes at a price: it is less flexible, requires more work during recovery and is more complex to design.
Genomic data, backups and other use cases that call for cheap hardware and extremely resilient storage, where data is written once or twice and rarely read, are perfect matches for an erasure coded storage backend. Ceph was originally designed to include RAID4 as an alternative to replication; the work, suspended for years, was resumed after the first Ceph summit in May 2013.
Shortly after the Emperor Ceph summit in August 2013, a team of four people started creating the erasure code libraries and refactoring the Ceph internals to make room for them. Their combined work is included in the Emperor release.
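
The arithmetic and the rebuild idea can be illustrated with a small sketch. It is not Ceph code: the function names are made up for this example, and it uses a single XOR parity chunk (M = 1) because that fits in a few lines. Codes that tolerate M > 1 lost chunks, such as the Reed-Solomon codes provided by jerasure, need more elaborate mathematics. With K = 10 and M = 3 the raw overhead works out to 1.3 bytes stored per byte of data, compared with 3 for three copies.

```python
from functools import reduce
from typing import List, Optional


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with k data and m parity chunks."""
    return (k + m) / k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero padded) and append one XOR parity chunk."""
    chunk_size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_size * k, b"\0")
    chunks = [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing chunk (marked None) by XOR-ing the survivors."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("a single parity chunk can only repair one lost chunk")
    repaired = list(chunks)
    if missing:
        survivors = [chunk for chunk in chunks if chunk is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


# Three copies behave like k=1, m=2 in terms of raw space.
print(storage_overhead(1, 2))   # 3.0
print(storage_overhead(10, 3))  # 1.3

# Lose one of the five chunks (4 data + 1 parity) and rebuild it.
stored = encode(b"erasure coded object", k=4)
stored[2] = None
print(b"".join(recover(stored)[:4]))
```

The toy encoder pads the object with zero bytes so that all chunks have the same size; a real backend would also have to remember the original object length.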
- Loic Dachary
- Loic Dachary full time developer
- Samuel Just refactoring
- David Zafman refactoring
- Andreas-Joachim Peters basic pyramid code and benchmarks.
- Scheduled for Firefly
In August 2013 four people started to participate in the implementation of Erasure Code in Ceph.
The work is done on two fronts: refactoring and erasure code libraries.
Samuel Just and David Zafman undertook the refactoring of Ceph to abstract the logic that is specific to replication and make room for erasure code. The PGBackend class (PGBackend.h) has been added and is now used by the ReplicatedPG implementation. Once it is mature, the erasure coded placement group will also use it. All Ceph objects can now have a chunk id, which is currently unused because replication does not need it. The acting set no longer contains the backfill peer.
Loïc Dachary created a plugin mechanism and an abstract API, and implemented a jerasure plugin. The erasure code plugin will read configuration parameters from a list of key/value pairs provided to the ceph osd pool create command. (i.e. the . Andreas-Joachim Peters improved the jerasure plugin with basic pyramid code and added benchmarks of the erasure code library.
The Erasure Code developer documents have been integrated in master and are updated as the implementation progresses.
The status of Erasure Code was presented during Ceph Day London in early October.
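
To make the idea of a plugin mechanism driven by key/value parameters more concrete, here is a minimal sketch. Everything in it is hypothetical: the class and function names, the `plugin` and `k` parameter keys and the toy `xor` plugin are invented for illustration and do not reflect the actual Ceph API or the jerasure plugin. It only shows the general shape: an abstract interface, a registry of named plugins, and a factory that configures a plugin from key/value pairs. Chunks are addressed by a chunk id, in the spirit of the chunk id mentioned above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ErasureCode(ABC):
    """Hypothetical abstract API; the names are illustrative, not Ceph's."""

    @abstractmethod
    def init(self, parameters: Dict[str, str]) -> None: ...

    @abstractmethod
    def chunk_count(self) -> int: ...

    @abstractmethod
    def encode(self, data: bytes) -> List[bytes]: ...

    @abstractmethod
    def decode(self, chunks: Dict[int, bytes], length: int) -> bytes: ...


_REGISTRY: Dict[str, Type[ErasureCode]] = {}


def register(name: str):
    """Class decorator that makes a plugin available under the given name."""
    def wrap(cls: Type[ErasureCode]) -> Type[ErasureCode]:
        _REGISTRY[name] = cls
        return cls
    return wrap


def parse_parameters(text: str) -> Dict[str, str]:
    """Turn 'key=value key=value' into a dictionary (assumed format)."""
    return dict(pair.split("=", 1) for pair in text.split())


def create(parameters: Dict[str, str]) -> ErasureCode:
    """Instantiate and configure the plugin named by the 'plugin' key."""
    codec = _REGISTRY[parameters["plugin"]]()
    codec.init(parameters)
    return codec


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


@register("xor")
class XorCode(ErasureCode):
    """Toy plugin: k data chunks plus one XOR parity chunk."""

    def init(self, parameters: Dict[str, str]) -> None:
        self.k = int(parameters.get("k", "2"))

    def chunk_count(self) -> int:
        return self.k + 1

    def encode(self, data: bytes) -> List[bytes]:
        size = -(-len(data) // self.k)  # ceiling division
        padded = data.ljust(size * self.k, b"\0")
        chunks = [padded[i * size:(i + 1) * size] for i in range(self.k)]
        parity = bytes(size)
        for chunk in chunks:
            parity = _xor(parity, chunk)
        return chunks + [parity]

    def decode(self, chunks: Dict[int, bytes], length: int) -> bytes:
        # chunks maps chunk id -> content; at most one id may be absent.
        missing = [i for i in range(self.k + 1) if i not in chunks]
        if len(missing) > 1:
            raise ValueError("the xor plugin can only repair one lost chunk")
        if missing and missing[0] < self.k:
            rebuilt = bytes(len(next(iter(chunks.values()))))
            for chunk in chunks.values():
                rebuilt = _xor(rebuilt, chunk)
            chunks = {**chunks, missing[0]: rebuilt}
        return b"".join(chunks[i] for i in range(self.k))[:length]


codec = create(parse_parameters("plugin=xor k=4"))
data = b"hello erasure code"
stored = dict(enumerate(codec.encode(data)))
del stored[1]  # simulate a lost chunk
print(codec.decode(stored, len(data)))
```

Keeping the parameters as plain strings means the code that creates a pool does not need to know which options a given plugin understands; each plugin interprets its own keys.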
Alpha Tester Program
A few people signed up to participate in the erasure code alpha tester program, which basically means they are willing to spend some time helping to debug problems early in the process. The upside is that it gives every participant an opportunity to advocate for their own use case while the development is still new and to influence its direction. The process is as follows:
- Install the Ceph integration test tool following the instructions
- Run the Erasure Code suite and report back by creating a ticket
Although Erasure Code is not yet fully integrated in Ceph, the library is in place and we are focusing on benchmarking it. It is very useful, even at this early stage, to collect results from various hardware platforms. As the implementation matures these tests will become more complex and the runs from the Alpha Tester group will discover regressions and bugs early on.
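
As an illustration of the kind of result that is useful to collect from different hardware platforms, the sketch below times a stand-in encoder and reports throughput together with basic platform information. It is not the Erasure Code suite run by the alpha testers: the XOR parity function is a placeholder for a real erasure code library, and the function names and report fields are assumptions made for this example.

```python
import os
import platform
import time
from typing import Dict, List, Union


def xor_parity(chunks: List[bytes]) -> bytes:
    """Placeholder encoder: XOR of equal-length chunks, computed on big integers."""
    size = len(chunks[0])
    accumulator = 0
    for chunk in chunks:
        accumulator ^= int.from_bytes(chunk, "big")
    return accumulator.to_bytes(size, "big")


def benchmark(k: int, chunk_size: int, iterations: int) -> Dict[str, Union[str, int, float]]:
    """Encode k random chunks repeatedly and report the throughput in MB/s."""
    chunks = [os.urandom(chunk_size) for _ in range(k)]
    start = time.perf_counter()
    for _ in range(iterations):
        xor_parity(chunks)
    elapsed = time.perf_counter() - start
    total_mb = k * chunk_size * iterations / 1e6
    return {
        "machine": platform.machine(),
        "python": platform.python_version(),
        "k": k,
        "chunk_size": chunk_size,
        "iterations": iterations,
        "seconds": elapsed,
        "mb_per_second": total_mb / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    for key, value in benchmark(k=10, chunk_size=4096, iterations=1000).items():
        print(f"{key}: {value}")
```

Recording the machine type next to each measurement makes it possible to compare runs submitted from different platforms.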
Each task in the following list is either an issue in the Ceph tracker or a link to the erasure code developer documentation, in which links to the tracker can be found for individual tasks. All tasks can be displayed from the Erasure encoded placement group feature request.
- Write erasure code integration tests https://github.com/ceph/ceph-qa-suite
- Erasure code internal documentation
- Run the Alpha Test program
- Refactor recovery to use PGBackend methods
- Backfill should be able to handle multiple backfill peers
- Status update (5 min)
- Ceph refactor (15 min)
- Erasure Code library (5 min)
- Alpha Tester program (5 min)