Hadoop over Ceph RGW status update¶
The goal is to write a Hadoop Compatible Filesystem(RGWFS) to allow Hadoop run over RGW. We also want to add load balancing feature to allow this scale to some rack rachitecture
If you are interested in contributing to this blueprint, or want to be a "speaker" during the Summit session, list your name here.
In Infernalis we proposed this BP. During the last several months, we've got some progress.
Thanks to SwiftFS, RGWFS is able to reuse lots of code. Currently the general code path is done. We're able to read/write with Hadoop command line tool through RGWFS, which talks to the backend Rados cluster.
We have implented a simple WSGI server that can give out the nearest RGW instance by looking through the internal data mapping in the Rados cluster. By giving the object name, RGW-Proxy would query in the cluster to check the mapping of data(ceph osd map obj_name), and then give out corresponding RGW instance
There're a few things we're working on.
- Make RGWFS work with multiple RGW instance
- Performance testing
This section should contain a list of work tasks created by this blueprint. Please include engineering tasks as well as related build/release and documentation work. If this blueprint requires cleanup of deprecated features, please list those tasks as well.
Build / release tasks