NewStore (new osd backend)

Summary

Create a replacement for FileStore that implements the ObjectStore API.
The goals are to
  • be general purpose (HDD or SSD)
  • avoid double-writes for new objects and sequential writes
  • be more efficient about small writes
  • leverage a key/value database for our metadata
  • leverage a POSIX file system for block management of object data
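
As a rough illustration of the split described above (object data in plain files, metadata in a key/value database), the following toy Python sketch writes a new object's data exactly once, fsyncs it, and only then records the metadata. Everything here is an assumption made for illustration: the dict stands in for the key/value database, and the names create_object, read_object, the "onode/" key prefix and the file-naming scheme are hypothetical, not taken from the NewStore code.

```python
import os
import tempfile


def create_object(root, kv, name, data):
    """Write a new object's data once, then commit its metadata to the kv map."""
    fname = "%d.data" % len(kv)  # hypothetical file-naming scheme
    path = os.path.join(root, fname)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before any metadata refers to it
    # The kv update is the commit point; the data itself is never written twice.
    kv["onode/" + name] = {"file": fname, "size": len(data)}


def read_object(root, kv, name):
    """Look up the metadata in the kv map, then read the data from its file."""
    meta = kv["onode/" + name]
    with open(os.path.join(root, meta["file"]), "rb") as f:
        return f.read(meta["size"])


root = tempfile.mkdtemp()
kv = {}  # stand-in for the key/value database
create_object(root, kv, "foo", b"payload")
print(read_object(root, kv, "foo"))
```

If the process dies before the kv update, the data file is simply unreferenced garbage, which is why no journal copy of the data is needed in this toy model.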

Owners

  • Sage Weil (Red Hat)

Interested Parties

  • Guang Yang (Yahoo!)

Current Status

There is a work-in-progress implementation available at https://github.com/liewegas/ceph/commits/wip-newstore

Detailed Description

There is a reasonable summary of the current state of things at: http://marc.info/?l=ceph-devel&m=142438985013041&w=2
The goal is to get a working prototype available as soon as possible with some basic infrastructure for atomic updates and so forth. We can iterate on how to be extra clever after that.

Work items

Coding tasks

  1. get wip-temp merged; this simplifies our usage of the collections so that there is 1 per PG and no cross-collection moves or renames. 'temp' becomes a special class of objects within a collection with some special semantics (they disappear on restart). there is also a bunch of other cleanup in this branch.
  2. get NewStore write-ahead-log infrastructure working
  3. mangle DBObjectMap so that we can use it while embedding the head into the onode_t. possibly rewrite/steal from this, as the design should be revisited.
  4. at this point NewStore should be functional and pass tests
  5. make the write path do the fsync and transaction completion asynchronously
  6. use AIO + DIO for writes when WONTNEED; use buffered writes otherwise
  7. make a smart fsyncer (many threads to improve concurrency, or leverage parallel aio_fsync if we can)
  8. test to ensure it is power-fail safe.
  9. add open-by-handle support (i.e., avoid path lookup overhead)
  10. consider writing some small writes into the kv store instead of taking the WAL path and applying them to the file
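
To make the write-ahead-log idea in items 2 and 10 more concrete, here is a minimal Python sketch of one way an overwrite could be logged in the kv store first, applied to the file later, and replayed after a restart. It is a toy model under stated assumptions, not the NewStore design: the dict stands in for a durable key/value database, and ToyStore, queue_overwrite, apply, replay and the "wal/" key prefix are all hypothetical names.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy model: overwrites are logged in a kv map, then applied to files."""

    def __init__(self, root):
        self.root = root
        self.kv = {}  # stand-in for the key/value database (assumed durable)
        self.next_seq = 0

    def queue_overwrite(self, name, offset, data):
        """Record the intended write in the kv map; this is the commit point."""
        key = "wal/%016d" % self.next_seq  # zero-padded so keys sort in order
        self.next_seq += 1
        self.kv[key] = json.dumps(
            {"name": name, "offset": offset, "data": data.hex()}
        )
        return key

    def apply(self, key):
        """Apply one logged write to its file, fsync, then drop the record."""
        rec = json.loads(self.kv[key])
        path = os.path.join(self.root, rec["name"])
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        with os.fdopen(fd, "r+b") as f:
            f.seek(rec["offset"])
            f.write(bytes.fromhex(rec["data"]))
            f.flush()
            os.fsync(f.fileno())
        del self.kv[key]  # only removed once the file update is durable

    def replay(self):
        """On restart, re-apply every record still in the log, oldest first."""
        for key in sorted(k for k in self.kv if k.startswith("wal/")):
            self.apply(key)


store = ToyStore(tempfile.mkdtemp())
store.queue_overwrite("obj", 0, b"hello world")
store.queue_overwrite("obj", 6, b"there")
store.replay()  # pretend we crashed before applying anything
with open(os.path.join(store.root, "obj"), "rb") as f:
    print(f.read())
```

Because each record is a write of fixed bytes at a fixed offset, applying it twice gives the same result, so replaying a record that was partly applied before a crash is harmless in this model.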