LMDB key/value backend for Ceph

Summary

We have completed a performance evaluation of the current KeyValueStore with RocksDB and LevelDB backends. Its performance is worse than FileStore's, especially for 4K random workloads. The major reason is the compaction overhead that comes with their LSM-tree design.

LMDB is an ultra-fast, ultra-compact embedded key-value data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of a standard disk-based database. Its database size is limited only by the size of the virtual address space.
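
The read advantage described above comes from memory mapping. The sketch below is plain POSIX, not LMDB code, and the `MappedFile` and `write_file` names are hypothetical. It maps a file read-only and serves lookups through a pointer into the mapping, so no `read()` call copies the bytes into a user buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: read-only, whole-file memory mapping (RAII).
// at() returns a pointer straight into the mapped pages; no read() call
// copies the bytes into a user buffer, and the page cache serves repeat reads.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            size_t len = static_cast<size_t>(st.st_size);
            void* p = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) {
                base_ = static_cast<const char*>(p);
                size_ = len;
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (base_ != nullptr) munmap(const_cast<char*>(base_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return base_ != nullptr; }
    size_t size() const { return size_; }

    // Zero-copy access: the pointer is valid only while this object is alive.
    const char* at(size_t offset) const {
        return offset < size_ ? base_ + offset : nullptr;
    }

private:
    const char* base_ = nullptr;
    size_t size_ = 0;
};

// Hypothetical example-only helper: write `text` to `path`.
bool write_file(const std::string& path, const std::string& text) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (f == nullptr) return false;
    size_t n = std::fwrite(text.data(), 1, text.size(), f);
    return std::fclose(f) == 0 && n == text.size();
}
```

LMDB applies the same idea to its whole B+tree: values returned by a read are pointers into the map that are owned by the database, so a get does not copy the value out.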

Progress is tracked in #11028.

Owners

  • Xinxin Shu (Intel)
  • Jian Zhang (Intel)

Current Status

KeyValueStore performance is lower than FileStore performance for sequential writes (Seq.Wr), random writes (Rand.Wr) and random reads (Rand.Rd):
  • RocksDB reaches 17% (Seq.Wr), 63% (Rand.Wr) and 54% (Rand.Rd) of FileStore performance
  • LevelDB reaches 23% (Seq.Wr), 22% (Rand.Wr) and 26% (Rand.Rd) of FileStore performance

KeyValueStore Seq.Wr performance is throttled by the backend's write compaction mechanism, with SSD utilization near 90%.

Detailed Description

The current FileStore cannot fully exploit the performance of an all-SSD setup. Ceph currently has three backend stores (MemStore, FileStore and KeyValueStore), and we are exploring whether KeyValueStore could help in the all-SSD scenario.
LMDB is a read-optimized design: it performs reads several times faster than other DB engines, and several orders of magnitude faster in many cases. It is not a write-optimized design; see the on-disk microbenchmark at http://symas.com/mdb/ondisk/. Since random gets are relatively slow with RocksDB and LevelDB, it would be worthwhile to add LMDB as a new key/value backend for Ceph.
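
To make "a new key/value backend" concrete, here is a hypothetical sketch of the narrow interface such a backend would need to satisfy: point gets, batched writes committed all-or-nothing, and ordered prefix iteration. The names `KVBackend`, `WriteBatch` and `MapBackend` are illustrative only and are not Ceph's `KeyValueDB` API. The `std::map` stands in for LMDB's B+tree; it is not an LMDB binding.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical batch of mutations; std::nullopt marks a delete.
struct WriteBatch {
    std::vector<std::pair<std::string, std::optional<std::string>>> ops;
    void set(std::string k, std::string v) { ops.emplace_back(std::move(k), std::move(v)); }
    void rm(std::string k) { ops.emplace_back(std::move(k), std::nullopt); }
};

// Hypothetical backend interface: LMDB, LevelDB or RocksDB adapters would each implement it.
class KVBackend {
public:
    virtual ~KVBackend() = default;
    virtual std::optional<std::string> get(const std::string& key) const = 0;
    virtual void submit(const WriteBatch& batch) = 0;
    // Keys starting with `prefix`, in sorted order.
    virtual std::vector<std::string> keys_with_prefix(const std::string& prefix) const = 0;
};

// Stand-in backend: std::map is an ordered tree, so a get is a single
// O(log n) descent with no LSM levels to consult.
class MapBackend : public KVBackend {
public:
    std::optional<std::string> get(const std::string& key) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }
    void submit(const WriteBatch& batch) override {
        // Apply to a copy, then swap: an exception mid-batch leaves data_ unchanged.
        auto next = data_;
        for (const auto& [k, v] : batch.ops) {
            if (v) next[k] = *v; else next.erase(k);
        }
        data_.swap(next);
    }
    std::vector<std::string> keys_with_prefix(const std::string& prefix) const override {
        std::vector<std::string> out;
        for (auto it = data_.lower_bound(prefix);
             it != data_.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it)
            out.push_back(it->first);
        return out;
    }

private:
    std::map<std::string, std::string> data_;
};
```

The copy-then-swap commit is only loosely similar in spirit to LMDB's copy-on-write transactions; a real LMDB adapter would presumably map `submit` onto a single LMDB write transaction instead of copying the whole map.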
