PMStore - new OSD backend

Summary

Provide an alternative backend for the OSD daemon that implements the ObjectStore API.
Goals:
  • SSD/NVMe/NVM optimized,
  • In-memory collection/object index, data on block device,
  • Minimize write amplification factor to the block device,
  • Leverage userspace PMBackend library optimized for Ceph’s workload.

Owners

Nigel Cook (Intel)
Lukasz Redynk (Intel)

Interested parties

Current Status

There is a work-in-progress implementation. Corner cases found with synthetic tests from the ObjectStore test suite still need to be fixed. Availability: TBA; it will be open-sourced soon.

Detailed Description

PMStore (tightly coupled with PMBackend) is a new OSD backend developed and tuned for the new generation of NVMe drives. Its general architecture is based on MemStore. Instead of keeping memory buffers for objects, it keeps a list of blocks (PMBackend blocks) for each object. Each of an object’s blocks is written together with the object’s key, which allows the in-memory index to be rebuilt at startup. PMBackend maintains two separate block spaces, each with its own block size and occupied area: the one with the smaller block size is intended for OMAP/XATTR data, the other for object data. PMBackend keeps both spaces in a single memory-mapped file, which allows a block id to be translated to a virtual memory address.
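
The block-id translation and the index rebuild described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the layout (metadata space first, data space after it), the block sizes, the header format, the use of a zero key length to mark an unused block, and the names block_offset, write_block and rebuild_index. None of it is PMBackend’s actual format or API, and an anonymous mapping stands in for the memory-mapped file.

```python
import mmap
import struct

# Hypothetical layout (not PMBackend's real format):
# [ meta space: META_BLOCKS * META_BLOCK_SIZE ][ data space: DATA_BLOCKS * DATA_BLOCK_SIZE ]
META_BLOCK_SIZE, META_BLOCKS = 256, 4
DATA_BLOCK_SIZE, DATA_BLOCKS = 4096, 4
DATA_BASE = META_BLOCK_SIZE * META_BLOCKS
TOTAL_SIZE = DATA_BASE + DATA_BLOCK_SIZE * DATA_BLOCKS

# Assumed per-block header: key length, part number, payload length.
HEADER = struct.Struct("<HII")


def block_offset(space, block_id):
    """Translate (space, block id) into a byte offset inside the mapping."""
    if space == "meta":
        base, size, count = 0, META_BLOCK_SIZE, META_BLOCKS
    else:
        base, size, count = DATA_BASE, DATA_BLOCK_SIZE, DATA_BLOCKS
    if not 0 <= block_id < count:
        raise IndexError("block id out of range")
    return base + block_id * size


def write_block(region, block_id, key, part, payload):
    """Store the payload in a data block together with the object's key."""
    record = HEADER.pack(len(key), part, len(payload)) + key + payload
    if len(record) > DATA_BLOCK_SIZE:
        raise ValueError("record does not fit into one block")
    off = block_offset("data", block_id)
    region[off:off + len(record)] = record


def rebuild_index(region):
    """Scan the data space and rebuild {key: {part number: block id}}."""
    index = {}
    for block_id in range(DATA_BLOCKS):
        off = block_offset("data", block_id)
        key_len, part, _ = HEADER.unpack_from(region, off)
        if key_len == 0:  # assumed marker for an unused block
            continue
        start = off + HEADER.size
        key = bytes(region[start:start + key_len])
        index.setdefault(key, {})[part] = block_id
    return index


region = mmap.mmap(-1, TOTAL_SIZE)  # anonymous mapping stands in for the file
write_block(region, 2, b"obj_a", 0, b"hello")
write_block(region, 0, b"obj_a", 1, b"world")
index = rebuild_index(region)
```

Because each block carries its key and part number, the index in this sketch is recovered by a single linear scan, without a separate metadata journal.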

Object read:
  • Find the block ids associated with the given <offset, offset + length> region,
  • For each block, get its virtual memory address,
  • Create a buffer pointer for each block pointer and return them all as a bufferlist (for now the data is copied into the buffer pointer; zero-copy reads are being researched).
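
The read steps above can be sketched as follows. This is a simplified model, not PMStore code: BLOCK_SIZE, read_object and the dictionary of bytearray blocks are hypothetical stand-ins, a Python list of bytes objects plays the role of the bufferlist, and every block is assumed to be completely filled.

```python
BLOCK_SIZE = 8  # tiny, hypothetical block size to keep the example readable


def read_object(block_ids, blocks, offset, length):
    """Return copied buffers covering [offset, offset + length).

    block_ids -- the object's block list, indexed by part number
    blocks    -- block id -> bytearray, standing in for mapped memory
    """
    buffers = []
    pos, end = offset, offset + length
    while pos < end:
        part = pos // BLOCK_SIZE
        if part >= len(block_ids):
            break  # the request runs past the end of the object
        start = pos % BLOCK_SIZE
        stop = min(BLOCK_SIZE, start + (end - pos))
        view = memoryview(blocks[block_ids[part]])
        buffers.append(bytes(view[start:stop]))  # copy, as in the current design
        pos += stop - start
    return buffers


blocks = {5: bytearray(b"ABCDEFGH"), 9: bytearray(b"IJKLMNOP")}
block_ids = [5, 9]  # part 0 lives in block 5, part 1 in block 9
result = read_object(block_ids, blocks, 6, 5)
```

In this model a zero-copy variant would return the memoryview slices themselves instead of calling bytes() on them.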
Object write:
  • Get the appropriate block number for storing the <offset, offset + length> region,
  • Divide the incoming memory buffer into block-size chunks,
  • Write each chunk to a separate block, alongside the object’s key and part number.
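
A minimal sketch of the write steps above, under stated assumptions: TinyStore, its fields and BLOCK_SIZE are hypothetical names, blocks are plain Python objects rather than PMBackend blocks, and existing blocks are updated in place, which may differ from how PMStore allocates blocks on overwrite.

```python
BLOCK_SIZE = 8  # tiny, hypothetical block size


class TinyStore:
    """Hypothetical stand-in for PMStore plus PMBackend."""

    def __init__(self):
        self.blocks = {}   # block id -> (key, part number, payload)
        self.index = {}    # key -> {part number: block id}
        self.next_id = 0   # naive allocator: hand out ids sequentially

    def write(self, key, offset, data):
        parts = self.index.setdefault(key, {})
        pos = 0
        while pos < len(data):
            part = (offset + pos) // BLOCK_SIZE
            start = (offset + pos) % BLOCK_SIZE
            n = min(BLOCK_SIZE - start, len(data) - pos)
            if part not in parts:
                # Allocate a block and record the key and part number with it.
                parts[part] = self.next_id
                self.blocks[self.next_id] = (key, part, bytearray(BLOCK_SIZE))
                self.next_id += 1
            payload = self.blocks[parts[part]][2]
            payload[start:start + n] = data[pos:pos + n]
            pos += n


store = TinyStore()
store.write(b"obj", 6, b"abcde")  # spans the end of part 0 and the start of part 1
```

Storing the key and part number next to each payload is what makes the startup index rebuild possible; the in-memory index dictionary here is only a cache of that information.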

Work Items

Coding tasks

  • Fix corner case bugs found in synthetic tests.
  • Rewrite XATTR/OMAP handling to allow values bigger than 1k.
  • Optimize key serialization, investigate possible reduction of key size.

Build / release tasks

  • Embed PMBackend as a git submodule in the Ceph sources.
  • Modify the appropriate makefiles to statically link the library with ceph-osd.
  • Tune default store parameters.

Documentation tasks

  • Finish documentation for the store.