Hadoop over Ceph RGW status update

Summary
The goal is to write a Hadoop-compatible filesystem (RGWFS) that allows Hadoop to run over RGW. We also want to add a load-balancing feature so that the setup can scale to a rack-based architecture.

Owners
Yuan Zhou (Intel)
Jian Zhang (Intel)

Interested Parties
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.
Name (Affiliation)
Name (Affiliation)
Name

Current Status
We proposed this blueprint during the Infernalis cycle. Over the last several months we have made some progress.
  • RGWFS
    Thanks to SwiftFS, RGWFS is able to reuse a lot of existing code. The general code path is now complete: we can read and write with the Hadoop command-line tool through RGWFS, which talks to the backend RADOS cluster.
  • RGW-Proxy
    We have implemented a simple WSGI server that returns the nearest RGW instance by looking up the internal data mapping in the RADOS cluster. Given an object name, RGW-Proxy queries the cluster for the placement of that object (ceph osd map pool_name obj_name) and then returns the corresponding RGW instance.
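The RGW-Proxy lookup described above can be sketched roughly as follows. This is only an illustration, not the actual RGW-Proxy code: the "object" query parameter, the OSD-to-RGW table, the example host names, and the helper names (make_app, call) are all assumptions made for this sketch. The function that finds an object's primary OSD is passed in, so a real deployment could back it with the output of ceph osd map while a test can use a plain dictionary.

```python
from urllib.parse import parse_qs

# Hypothetical table: which RGW endpoint sits closest to which OSD id.
RGW_BY_OSD = {
    0: "http://rgw-rack1.example.com:7480",
    1: "http://rgw-rack1.example.com:7480",
    2: "http://rgw-rack2.example.com:7480",
}
# Hypothetical fallback when the object's OSD has no entry in the table.
DEFAULT_RGW = "http://rgw-rack1.example.com:7480"


def make_app(locate_primary_osd, rgw_by_osd, default_rgw):
    """Build a WSGI app that answers '?object=<name>' with an RGW endpoint.

    locate_primary_osd is any callable mapping an object name to an OSD id;
    in a real proxy it would wrap a cluster query (assumed, not shown here).
    """
    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        names = query.get("object")
        if not names:
            status = "400 Bad Request"
            body = b"missing 'object' query parameter\n"
        else:
            osd_id = locate_primary_osd(names[0])
            endpoint = rgw_by_osd.get(osd_id, default_rgw)
            status = "200 OK"
            body = (endpoint + "\n").encode("utf-8")
        start_response(status, [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app


def call(app, query_string):
    """Invoke the WSGI app in-process and return (status, body text)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"QUERY_STRING": query_string}, start_response))
    return captured["status"], body.decode("utf-8").strip()


# In-process usage with a dictionary standing in for the cluster lookup.
demo_app = make_app({"part-00000": 2}.get, RGW_BY_OSD, DEFAULT_RGW)
print(call(demo_app, "object=part-00000"))
```

To serve the app over HTTP, the standard library's wsgiref.simple_server.make_server(host, port, demo_app) could be used. Keeping the lookup as a parameter separates the HTTP handling from the cluster query, which makes the routing logic easy to test without a running cluster.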

Detailed Description
We are currently working on the following:
  • Make RGWFS work with multiple RGW instances
  • Performance testing

Work items
This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.

Coding tasks
Task 1
Task 2
Task 3

Build / release tasks
Task 1
Task 2
Task 3

Documentation tasks
Task 1
Task 2
Task 3

Deprecation tasks
Task 1
Task 2
Task 3