Osd - new keyvalue backend » History » Version 1

Jessica Mack, 06/22/2015 02:06 AM

h1. Osd - new keyvalue backend

h3. Summary

Create generic, reusable components to back a ceph-osd with a key/value interface.

h3. Owners

* Sage Weil (Inktank)

h3. Interested Parties

* Haomai Wang (UnitedStack)
* Yan, Zheng (Intel)
* Jiangang, Duan (Intel)
* Anip Patel (Arizona State University, student)
* Andrey Korolyov (Flops)

h3. Current Status

A bunch of new key/value interfaces are emerging, including:
* Seagate Kinetic https://github.com/Seagate/Kinetic-Preview
* Fusion-io NVMKV https://github.com/opennvm/nvmkv
* various proprietary interfaces (mostly from flash vendors)

New storage hardware, including shingled drives and flash, will behave *much* better using these emerging interfaces.
The KeyValueDB interface abstracts individual key/value interfaces.  It includes a transaction primitive.  Currently the only implementation uses leveldb.  Alternatives we should consider include:
* RocksDB (https://github.com/facebook/rocksdb)

The DBObjectMap interface builds a simple tree structure on top of KeyValueDB that hides some of the namespace complexity (e.g. omap keys vs xattrs keys) and includes a header/intermediate node that allows clone() to happen efficiently.
Haomai has a prototype ObjectStore implementation that uses leveldb on the backend, but it is not quite functional yet, and does not support operations like clone.

h3. Detailed Description

Build an ObjectStore implementation that builds on DBObjectMap (and KeyValueDB) to store everything.  There will be no direct filesystem interaction.
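
As a rough illustration of the layering, here is a minimal Python sketch of a KeyValueDB-style store with a transaction primitive. It is not Ceph's C++ KeyValueDB API: the names MemKeyValueDB, Transaction, set, rm, and submit are hypothetical, and an in-memory dict stands in for a backend such as leveldb. The only point is that a batch of ops on prefixed keys (for example the omap and xattr namespaces) is applied all-or-nothing.

```python
class Transaction:
    """Hypothetical batch of key/value ops that is applied all-or-nothing."""

    def __init__(self):
        # Each op is ("set", prefix, key, value) or ("rm", prefix, key, None).
        self.ops = []

    def set(self, prefix, key, value):
        self.ops.append(("set", prefix, key, value))

    def rm(self, prefix, key):
        self.ops.append(("rm", prefix, key, None))


class MemKeyValueDB:
    """In-memory stand-in for a real backend (an assumption for this sketch)."""

    def __init__(self):
        self._data = {}  # (prefix, key) -> bytes

    def get(self, prefix, key):
        return self._data.get((prefix, key))

    def submit(self, txn):
        # Apply to a copy and swap at the end, so a failure part-way
        # through leaves the visible store untouched.
        staged = dict(self._data)
        for op, prefix, key, value in txn.ops:
            if op == "set":
                if not isinstance(value, bytes):
                    raise TypeError("values must be bytes")
                staged[(prefix, key)] = value
            elif op == "rm":
                staged.pop((prefix, key), None)
            else:
                raise ValueError("unknown op: %r" % (op,))
        self._data = staged


db = MemKeyValueDB()
t = Transaction()
t.set("omap", "obj1.k", b"v")
t.set("xattr", "obj1.user.a", b"1")
db.submit(t)
```

If every op in an ObjectStore transaction can be expressed in one such batch, the backend's own atomicity does the job a journal would otherwise do.
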
Journaling:
* Since KeyValueDB has a transaction primitive, do not include a journal at all
* We may want to add this capability later so that (say) NVRAM can mask the latency of a slow k/v backend, but for now let's ignore it for simplicity.
* FileJournal (and/or the abstract Journal) should be reusable.
* JournalingObjectStore probably is not reusable, but that's ok

Note on transactions:
* For now, I suggest we assume we can build on the existing KeyValueDB interface, which includes transactions.
** leveldb has transactions (atomic write batches)
** NVMKV has batch_put, which is limited to 64 keys
** kinetic has no transactions
* To get the atomicity we need, there is a range of tricks available:
** Write data to new keys, update a sentinel/root key at the end
** Write data to new keys, batch-rename them into place (if the backend allows such a thing)
** Intent logs
** Write-ahead transaction journaling (when necessary)
* I am not sure whether we can efficiently hide a 'transaction layer' beneath the KeyValueDB interface, but for now let's assume we will be able to.
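
The first trick in the list (write data to new keys, then update a sentinel/root key) can be sketched roughly as follows. This is a toy Python model, not code for any real backend: the dict kv stands in for a store whose only guarantee is that a single put is atomic, and the key layout (data.<version>.<name>, root) and all function names are made up for illustration.

```python
import json

ROOT = "root"  # hypothetical sentinel key; its value maps names to data keys


def read_root(kv):
    return json.loads(kv[ROOT]) if ROOT in kv else {"version": 0, "index": {}}


def commit(kv, updates, crash_before_root=False):
    root = read_root(kv)
    version = root["version"] + 1
    index = dict(root["index"])
    # Step 1: write every new value to a fresh key that no reader can see yet.
    for name, value in updates.items():
        data_key = "data.%d.%s" % (version, name)
        kv[data_key] = value
        index[name] = data_key
    if crash_before_root:
        return  # simulated crash: orphaned data keys, root unchanged
    # Step 2: a single atomic put publishes the whole update.
    kv[ROOT] = json.dumps({"version": version, "index": index})


def read(kv, name):
    # Readers always go through the root, so they see old or new, never a mix.
    data_key = read_root(kv)["index"].get(name)
    return None if data_key is None else kv[data_key]


kv = {}
commit(kv, {"a": "1", "b": "2"})
commit(kv, {"a": "10", "b": "20"}, crash_before_root=True)  # never published
```

The sketch leaves out garbage collection of superseded and orphaned data keys, which a real design would need.
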
  
h3. Work items

h4. Coding tasks

# refactor OSD awareness of FileStore to make the ObjectStore backend configurable
## use a generic method to get an ObjectStore implementation by type
## push any FileJournal and FileStore references out of osd/*
# DBObjectMap: refactor interface
## expose underlying KeyValueDB transactions to the caller, so they can bundle several DBObjectMap ops together and capture an entire ObjectStore::Transaction's worth of work
## expose the user prefixes in a generic way, instead of hard-coding in the omap, xattr, and various internal namespaces
# stripe file data over keys
## Build a class that will implement a file data interface (read extent, write extent, truncate, zero, etc.) on top of DBObjectMap
## stripe data over keys of size X (e.g., 1MB, which seems to be the limit people are converging around)
## store file size information in a metadata key.  maybe this can be DBObjectMap::Header; maybe not
## contemplate future optimizations that put small objects "inline" in the Header (or equivalent) key
# build a KeyValueDB implementation based on the new Kinetic API
## initially, we can just ignore transactions
# build a KeyValueDB implementation based on the NVMKV API
## opportunistically use batch_put, but otherwise ignore large transaction atomicity
# build a KeyValueDB implementation based on RocksDB
## allow omap location to be configured independently of osd data path?  need to consider commit sequence.  :/
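
The "stripe file data over keys" task can be illustrated with a small Python sketch. It is only a model: a dict stands in for DBObjectMap, and the class name StripedFile, the key names (size, stripe.<n>), and the tiny stripe size are all assumptions chosen to keep the example easy to follow. A real implementation would use a much larger stripe, such as the 1MB mentioned in the task list.

```python
class StripedFile:
    """Toy file-data interface (write/read/truncate) over fixed-size stripe keys."""

    def __init__(self, kv, stripe_size=4):
        self.kv = kv                    # dict standing in for the k/v backend
        self.stripe_size = stripe_size
        self.kv.setdefault("size", 0)   # file size lives in a metadata key

    def _key(self, n):
        return "stripe.%d" % n

    def _load(self, n):
        # Missing stripes read as zeros (sparse-file semantics).
        data = self.kv.get(self._key(n), b"")
        return bytearray(data.ljust(self.stripe_size, b"\0"))

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            n, start = divmod(offset + pos, self.stripe_size)
            chunk = data[pos:pos + self.stripe_size - start]
            stripe = self._load(n)
            stripe[start:start + len(chunk)] = chunk
            self.kv[self._key(n)] = bytes(stripe)
            pos += len(chunk)
        self.kv["size"] = max(self.kv["size"], offset + len(data))

    def read(self, offset, length):
        end = min(offset + length, self.kv["size"])
        out = bytearray()
        pos = offset
        while pos < end:
            n, start = divmod(pos, self.stripe_size)
            take = min(self.stripe_size - start, end - pos)
            out += self._load(n)[start:start + take]
            pos += take
        return bytes(out)

    def truncate(self, size):
        old = self.kv["size"]
        if size < old:
            # Drop stripes that lie entirely beyond the new size.
            first_dead = -(-size // self.stripe_size)  # ceiling division
            last = (old - 1) // self.stripe_size
            for n in range(first_dead, last + 1):
                self.kv.pop(self._key(n), None)
            # Zero the tail of the last surviving stripe.
            n, start = divmod(size, self.stripe_size)
            if start and self._key(n) in self.kv:
                stripe = self._load(n)
                stripe[start:] = b"\0" * (self.stripe_size - start)
                self.kv[self._key(n)] = bytes(stripe)
        self.kv["size"] = size


kv = {}
f = StripedFile(kv, stripe_size=4)
f.write(2, b"abcdef")  # spans stripe 0 and stripe 1
```

Each write here touches several keys plus the size key, so in practice those puts would need to be bundled into one backend transaction.
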