Erasure coded storage backend (step 3) » History » Version 1

Jessica Mack, 06/22/2015 03:07 AM

h1. Erasure coded storage backend (step 3)

h3. Summary

As of "Dumpling":http://tracker.ceph.com/versions/160 Ceph keeps copies of objects so as not to lose them when hardware fails. This is an efficient and flexible solution for Ceph block storage, but when performance matters less than cost, "erasure codes":https://en.wikipedia.org/wiki/Erasure_code achieve significant savings. For instance, instead of buying 3PB to store 1PB, it is enough to buy 1.5PB and get better resilience. Erasure coding an object works by splitting it into K data chunks (say 10) and computing M parity chunks (say 3). By storing each of the 13 chunks on a different disk, it is possible to rebuild the original object even if 3 of them fail. This is more resilient than three copies, which survive the loss of only two, and it requires less than half the space: 13 chunks, each a tenth of the object, amount to 1.3 times the object size instead of 3 times. But it comes at a price: it is less flexible, requires more work during recovery and is more complex to design.

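The K/M arithmetic and the recovery property can be illustrated with the simplest erasure code there is: a single XOR parity chunk (M = 1), the scheme behind RAID4. This is a minimal Python sketch under that assumption, not how the jerasure plugin works; the K = 10, M = 3 example above needs a Reed-Solomon style code to survive three failures, whereas XOR parity survives only one. The function names are hypothetical.

```python
from functools import reduce

def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data with K data + M parity chunks."""
    return (k + m) / k

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k zero-padded chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor, chunks)]

def decode(chunks: list, k: int, length: int) -> bytes:
    """Rebuild the object when at most one chunk (data or parity) is None."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity chunk can only rebuild one loss")
    if missing:
        chunks = list(chunks)
        # The XOR of all surviving chunks equals the lost one.
        chunks[missing[0]] = reduce(xor, [c for c in chunks if c is not None])
    return b"".join(chunks[:k])[:length]

obj = b"written once, rarely read"
stored = encode(obj, k=4)
stored[1] = None  # simulate one failed disk
assert decode(stored, 4, len(obj)) == obj
print(storage_overhead(10, 3), storage_overhead(1, 2))  # 1.3 3.0
```

Viewing three-way replication as K = 1, M = 2 yields the 3.0 figure, which makes the comparison with the 1.3 overhead of a 10+3 code explicit.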
Use cases such as "genomic data":http://www.annaisystems.com/, "backups":http://dachary.org/?p=2087 or others that require cheap hardware and extremely resilient storage, with data written once or twice and rarely read, are perfect matches for an erasure coded storage backend. Ceph was originally designed to include "RAID4":https://github.com/ceph/ceph/blob/71531edd8a645a249f5edc984e5d8be85317abf7/src/osd/OSD.cc#L1356 as an alternative to replication, and that work, suspended for years, was resumed after the [[Erasure encoding as a storage backend|first Ceph summit in May 2013]].

Shortly after "the Emperor Ceph summit in August 2013":http://tracker.ceph.com/projects/ceph/wiki/Wiki a team of four people started creating the Erasure Code libraries and refactoring the Ceph internals to make room for them. Their combined work is included in the Emperor release.

h3. Owners

* Loic Dachary

h3. Team

* Loic Dachary "full time developer":http://tracker.ceph.com/activity?from=2013-11-30&user_id=789
* Samuel Just "refactoring":http://tracker.ceph.com/activity?from=2013-11-30&user_id=57
* David Zafman "refactoring":http://tracker.ceph.com/activity?from=2013-11-30&user_id=716
* Andreas-Joachim Peters "basic pyramid code":http://tracker.ceph.com/issues/6478 and benchmarks.

h3. Current Status

* Scheduled for "Firefly (Feb 2014)":http://www.inktank.com/about-inktank/roadmap/

In August 2013 four people started to participate in the implementation of Erasure Code in Ceph. The work is done on two fronts: refactoring and erasure code libraries.

Samuel Just and David Zafman undertook the refactoring of Ceph to abstract the logic that is specific to replication and make room for erasure code. The "PGBackend.h":https://github.com/ceph/ceph/blob/master/src/osd/PGBackend.h class has been added and is now used by the ReplicatedPG implementation. Once it is mature, the Erasure Code placement group will also use it. All Ceph objects can now have a "chunk id":http://tracker.ceph.com/issues/5862, currently unused because replication does not need it. The acting set no longer "contains the backfill peer":http://tracker.ceph.com/issues/5855.

Loïc Dachary created a "plugin mechanism and abstract API":http://tracker.ceph.com/issues/5878 and implemented a "jerasure plugin":http://tracker.ceph.com/issues/5879. The erasure code plugin will read configuration parameters "from a list of key/value":http://tracker.ceph.com/issues/6113 pairs provided to the *ceph pool create* command. Andreas-Joachim Peters improved the jerasure plugin with "basic pyramid code":http://tracker.ceph.com/issues/6478 and added benchmarks of the erasure code library.

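The plugin mechanism and the key/value configuration can be pictured with the sketch below. It is written in Python for illustration and is not Ceph's C++ plugin API: the `ErasureCodePlugin` base class, the `PLUGINS` registry, `parse_parameters`, `load_plugin` and the parameter names `plugin`, `k` and `m` are all hypothetical, and a toy XOR plugin stands in for jerasure.

```python
from abc import ABC, abstractmethod

class ErasureCodePlugin(ABC):
    """Hypothetical abstract API that every erasure code plugin implements."""

    def __init__(self, parameters: dict[str, str]):
        self.k = int(parameters.get("k", "2"))
        self.m = int(parameters.get("m", "1"))
        if self.k < 1 or self.m < 1:
            raise ValueError("k and m must be positive")

    @abstractmethod
    def encode(self, data: bytes) -> list[bytes]:
        """Return k data chunks followed by m parity chunks."""

class XorPlugin(ErasureCodePlugin):
    """Toy plugin: a single XOR parity chunk, so it only accepts m = 1."""

    def __init__(self, parameters: dict[str, str]):
        super().__init__(parameters)
        if self.m != 1:
            raise ValueError("the xor plugin only supports m=1")

    def encode(self, data: bytes) -> list[bytes]:
        size = -(-len(data) // self.k)  # ceiling division
        padded = data.ljust(size * self.k, b"\0")
        chunks = [padded[i * size:(i + 1) * size] for i in range(self.k)]
        parity = bytearray(size)
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                parity[i] ^= byte
        return chunks + [bytes(parity)]

# Hypothetical registry mapping a plugin name to the class implementing it.
PLUGINS = {"xor": XorPlugin}

def parse_parameters(pairs: list[str]) -> dict[str, str]:
    """Turn ["k=4", "m=1"] style arguments into a dict of strings."""
    parameters = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        parameters[key] = value
    return parameters

def load_plugin(pairs: list[str]) -> ErasureCodePlugin:
    """Pick the plugin named by the 'plugin' key and hand it all parameters."""
    parameters = parse_parameters(pairs)
    name = parameters.get("plugin", "xor")
    if name not in PLUGINS:
        raise KeyError(f"unknown erasure code plugin {name!r}")
    return PLUGINS[name](parameters)

codec = load_plugin(["plugin=xor", "k=4", "m=1"])
print(codec.k, codec.m, len(codec.encode(b"abcdefgh")))  # 4 1 5
```

Passing parameters as opaque key/value strings means the code creating the pool does not need to know which settings a given plugin understands; each plugin validates its own, as the XOR plugin does for `m`.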
The "Erasure Code":http://ceph.com/docs/master/dev/osd_internals/erasure_coding/ developer documents have been integrated into master and updated with the "progress of the implementation":http://tracker.ceph.com/issues/4929.

The "status of Erasure Code":http://www.slideshare.net/Inktank_Ceph/erasure-codeceph was presented at "Ceph Day London":https://cephdaylondon.eventbrite.com/ in early October.

h3. Alpha Tester Program

A few people signed up to participate in the erasure code alpha tester program, which means they are willing to spend some time helping debug problems early in the process. In return, every participant gets an opportunity to advocate for their own use case while development is still young, and to influence its direction. The process is as follows:

* Install the "Ceph integration test tool":https://github.com/ceph/teuthology following the "instructions":https://github.com/ceph/teuthology/blob/master/README.rst
* Run the "Erasure Code":http://tracker.ceph.com/projects/ceph/wiki/Wiki suite and report back by creating a "ticket":http://tracker.ceph.com/projects/ceph/issues/new

Although Erasure Code is not yet fully integrated into Ceph, the library is in place and we are focusing on benchmarking it. Even at this early stage, it is very useful to collect results from various hardware platforms. As the implementation matures, these tests will become more complex, and runs by the Alpha Tester group will help discover regressions and bugs early on.

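Comparing results across hardware platforms mostly requires timing the same encoding workload everywhere and recording the machine it ran on. The sketch below is hypothetical and is not the benchmark Andreas-Joachim Peters added to the library: it times a pure-Python XOR parity encoder, so its figures are only meaningful relative to other runs of the same script, not to jerasure. The names `xor_parity` and `benchmark` are illustrative.

```python
import os
import platform
import time

def xor_parity(chunks: list[bytes]) -> bytes:
    """Single XOR parity chunk, computed on whole chunks via big integers."""
    size = len(chunks[0])
    acc = 0
    for chunk in chunks:
        acc ^= int.from_bytes(chunk, "little")
    return acc.to_bytes(size, "little")

def benchmark(k: int = 10, chunk_size: int = 1 << 20, iterations: int = 20) -> float:
    """Return encode throughput, in MB/s of user data, for a k+1 XOR code."""
    chunks = [os.urandom(chunk_size) for _ in range(k)]
    start = time.perf_counter()
    for _ in range(iterations):
        xor_parity(chunks)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return k * chunk_size * iterations / elapsed / 1e6

if __name__ == "__main__":
    mbps = benchmark()
    # Record the platform alongside the figure so reports can be compared.
    print(f"{platform.machine()} Python {platform.python_version()}: {mbps:.1f} MB/s")
```

Keeping `k`, `chunk_size` and `iterations` identical across participants is what makes the reported numbers comparable.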
h3. Coding tasks

Each task in the following list is either an issue in the "Ceph tracker":http://tracker.ceph.com/ or a link to the "erasure code developer documentation":http://ceph.com/docs/master/dev/osd_internals/erasure_coding/ page, which links to the tracker for individual tasks. All tasks can be displayed from the "Erasure encoded placement group":http://tracker.ceph.com/issues/4929 feature request.

h3. Work items

h4. Loic Dachary

# Write erasure code integration tests https://github.com/ceph/ceph-qa-suite
## Add a wip-erasure-code branch to http://github.com/ceph/ceph/
## Schedule the branch to be run by teuthology http://ceph.com/gitbuilder.cgi
## Watch the test results daily and fix the problems
# "Erasure code internal documentation":http://ceph.com/docs/master/dev/osd_internals/erasure_coding/
# Run the Alpha Test program

h4. Samuel Just

# "Refactor recovery to use PGBackend methods":http://tracker.ceph.com/issues/5857
# "Backfill should be able to handle multiple backfill peers":http://tracker.ceph.com/issues/5858

h4. David Zafman

h4. Andreas-Joachim Peters

h3. Agenda

* Status update (5 min)
* Ceph refactor (15 min)
* Erasure Code library (5 min)
* Alpha Tester program (5 min)