Project

General

Profile

RGW - Hadoop FileSystem Interface for a RADOS Gateway Caching Tier

Summary

We plan to build a reference solution for running Hadoop over multiple Ceph RGW instances with an SSD cache, similar to the OpenStack Sahara project (Hadoop over Swift). In this solution, all the storage servers sit on a network isolated from the Hadoop cluster, and the RGW instances act as the connectors between the two networks. We'll leverage Ceph Cache Tier technology to cache data on each RGW server.

Owners

Interested Parties

  • Name (Affiliation)
  • Name (Affiliation)
  • Name

Current Status

Currently RGW supports the Swift API well. However, in the Sahara project, Swift is specially configured to work with Hadoop: the swift-proxy server acts like a NameNode, handing out data locations, and Hadoop jobs then read from and write to the swift-object servers directly.

Detailed Description

We'll need some additional work here:
1. RGWFS: a Hadoop-compatible file system that can talk to RGW instances. This will largely follow what SwiftFS (https://github.com/openstack/sahara-extra/tree/master/hadoop-swiftfs) does.
2. RGW-Proxy: a standalone module that reports block locations in RGW. To achieve data locality (each MR task can read from an RGW instance on the same rack), we'll need to understand the internals of the mapping from RGW objects to RADOS objects, as well as the mapping from the Cache Tier (CT) to the Base Tier.
a) RGW-Proxy first gets the manifest from the header object and then looks up the locations of the remaining shadow objects in RADOS.
b) RGW-Proxy calculates the re-mapped location in the CT using the appropriate CRUSH rule.
c) With the locations in the CT, RGW-Proxy reports which RGW instance to use for each block.
d) RGWFS issues range read requests to fetch the blocks through the closest RGW instances (on the same rack).
3. RGW over Cache Tier: an RGW deployment over a Cache Tier that can use SSDs as a cache layer.
TODO: Can the header object size (512KB) be made configurable?
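Steps c) and d) above can be illustrated with a minimal sketch. It assumes RGW-Proxy has already returned, for each block index, a list of candidate RGW instances together with their racks. The names RgwInstance, block_ranges, pick_instance and plan_reads, the fixed block size, and the fallback to the first candidate are all hypothetical choices made for illustration; they are not part of RGW, SwiftFS, or any existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RgwInstance:
    """Hypothetical description of one RGW endpoint and the rack it sits on."""
    url: str
    rack: str


def block_ranges(object_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Split an object into inclusive (start, end) byte ranges, one per block."""
    if object_size < 0 or block_size <= 0:
        raise ValueError("object_size must be >= 0 and block_size must be > 0")
    return [(start, min(start + block_size, object_size) - 1)
            for start in range(0, object_size, block_size)]


def pick_instance(candidates: List[RgwInstance], client_rack: str) -> RgwInstance:
    """Prefer an instance on the client's rack; otherwise use the first candidate."""
    if not candidates:
        raise ValueError("no RGW instance reported for this block")
    for instance in candidates:
        if instance.rack == client_rack:
            return instance
    return candidates[0]  # assumed fallback policy, not defined by the blueprint


def plan_reads(object_size: int, block_size: int,
               locations: Dict[int, List[RgwInstance]],
               client_rack: str) -> List[Tuple[str, str]]:
    """Return one (RGW URL, HTTP Range header value) pair per block."""
    plan = []
    for index, (start, end) in enumerate(block_ranges(object_size, block_size)):
        instance = pick_instance(locations[index], client_rack)
        plan.append((instance.url, f"bytes={start}-{end}"))
    return plan


# Example: a 10-byte object read in 4-byte blocks by a task on rack "r2".
rgw_a = RgwInstance("http://rgw-a.example", "r1")
rgw_b = RgwInstance("http://rgw-b.example", "r2")
example_plan = plan_reads(10, 4, {0: [rgw_a, rgw_b], 1: [rgw_a], 2: [rgw_b]}, "r2")
```

In this example, blocks 0 and 2 are read through the rack-local instance rgw_b, while block 1 falls back to rgw_a because no instance on rack "r2" was reported for it. A real RGWFS would send each Range value in a GET request to the chosen instance.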

Work items

Coding tasks

  1. Task 1
  2. Task 2
  3. Task 3

Build / release tasks

  1. Task 1
  2. Task 2
  3. Task 3

Documentation tasks

  1. Task 1
  2. Task 2
  3. Task 3

Deprecation tasks

  1. Task 1
  2. Task 2
  3. Task 3