LMDB keyvalue backend for Ceph » History » Version 1
Jessica Mack, 07/10/2015 05:36 PM
h1. LMDB keyvalue backend for Ceph

h3. Summary
We have completed a performance evaluation of the current key/value store with RocksDB and LevelDB backends, but performance is worse than FileStore, especially for 4K random workloads; the major cause is the compaction overhead inherent in their LSM-tree implementations. LMDB is an ultra-fast, ultra-compact embedded key-value store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases, and its capacity is limited only by the size of the virtual address space.
Refer to #11028 to check progress.
h3. Owners
* Xinxin Shu (Intel)
* Jian Zhang (Intel)
* Name
h3. Interested Parties

* Name (Affiliation)
* Name (Affiliation)
* Name
h3. Current Status

KeyValueStore performance is lower than FileStore performance in sequential write, random write, and random read:

* RocksDB achieves 17% (Seq.Wr), 63% (Rand.Wr) and 54% (Rand.Rd) of FileStore performance
* LevelDB achieves 23% (Seq.Wr), 22% (Rand.Wr) and 26% (Rand.Rd) of FileStore performance

KeyValueStore sequential-write performance is throttled by the backend's write-compaction mechanism, under which SSD utilization is near 90%.
h3. Detailed Description
The current FileStore is not capable of fully utilizing an all-SSD setup, and since there are currently three backend stores (MemStore, FileStore, and KeyValueStore), we are exploring whether the key/value store could help in the all-SSD scenario.
LMDB has a read-optimized design and performs reads several times faster than other DB engines, in many cases by orders of magnitude, though it is not a write-optimized design; see the on-disk microbenchmark at http://symas.com/mdb/ondisk/. Since random gets are relatively slow in RocksDB and LevelDB, it would be worthwhile to add LMDB as a new key/value backend store for Ceph.
h3. Work items

h4. Coding tasks

# Task 1
# Task 2
# Task 3

h4. Build / release tasks

# Task 1
# Task 2
# Task 3

h4. Documentation tasks

# Task 1
# Task 2
# Task 3

h4. Deprecation tasks

# Task 1
# Task 2
# Task 3