hadoop: refactor hadoop shim in terms of java libceph bindings
Refactor the Hadoop code in terms of generic Java bindings for libceph, instead of mixing the SWIG crap in with the actual functionality.
#1 Updated by Greg Farnum almost 11 years ago
I'm not sure what you're after here -- you mean you want Java bindings for librados, and then the Hadoop patches should be updated to use those?
It's been a while, but from what I recall I was deliberately careful to place Ceph functionality in the C code as much as possible, so that we could update the interface without touching the Hadoop code (in the hope that it would eventually get merged into their codebase, where we'd have more trouble getting at it). I suspect we want to preserve CephFS's ambivalence about how anything in libceph is actually used.
#2 Updated by Sage Weil almost 11 years ago
I mean that, at the end of the day, we should probably have:
- a Java binding that is identical to the interface exposed by libceph (something that looks more or less like POSIX, with whatever changes/additions make sense from a generic utility point of view). This should be useful to any Java program that wants to interact with a Ceph file system (not librados).
- a Hadoop class that uses the generic Java Ceph bindings and contains any Hadoop-specific weirdness.
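A minimal sketch of the split described above: a generic Java interface that mirrors the POSIX-like libceph surface and knows nothing about Hadoop, plus a thin Hadoop-facing shim layered on top of it. All names here (CephFS, FakeCephFS, CephHadoopShim) are illustrative assumptions, not the actual Ceph or Hadoop APIs, and the in-memory implementation stands in for the real JNI-backed binding.

```java
import java.util.HashMap;
import java.util.Map;

// Generic Java binding: mirrors the libceph (POSIX-like) interface.
// Nothing Hadoop-specific lives at this layer.
interface CephFS {
    void mkdirs(String path, int mode);
    int open(String path, int flags, int mode);
    long lseek(int fd, long offset, int whence);
    void close(int fd);
}

// Toy in-memory stand-in for the JNI-backed implementation,
// just enough to make the sketch runnable.
class FakeCephFS implements CephFS {
    private final Map<Integer, String> fds = new HashMap<>();
    private int nextFd = 3; // 0-2 reserved, as with real fds

    public void mkdirs(String path, int mode) { /* no-op in the sketch */ }

    public int open(String path, int flags, int mode) {
        int fd = nextFd++;
        fds.put(fd, path);
        return fd;
    }

    public long lseek(int fd, long offset, int whence) { return offset; }

    public void close(int fd) { fds.remove(fd); }
}

// Hadoop-facing class: only Hadoop-specific weirdness lives here,
// e.g. reducing Hadoop-style URIs to plain POSIX paths before
// handing them to the generic binding.
class CephHadoopShim {
    private final CephFS ceph;

    CephHadoopShim(CephFS ceph) { this.ceph = ceph; }

    int openForRead(String hadoopPath) {
        // A URI like ceph://mon1/user/data.txt reduces to /user/data.txt.
        String posixPath = hadoopPath.replaceFirst("^ceph://[^/]*", "");
        return ceph.open(posixPath, 0 /* O_RDONLY */, 0);
    }
}

public class Main {
    public static void main(String[] args) {
        CephHadoopShim shim = new CephHadoopShim(new FakeCephFS());
        int fd = shim.openForRead("ceph://mon1/user/data.txt");
        System.out.println(fd); // prints 3: first fd handed out by the fake
    }
}
```

The point of the shape, as the comment argues, is that the CephFS interface can be reused by any Java consumer, while everything Hadoop-shaped stays in one replaceable class.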
#3 Updated by Noah Watkins almost 11 years ago
I see the general benefits of a Java-based libceph interface, but what are your long-term plans for Hadoop over Ceph? This sounds like there is room for a version that is not built on top of the kernel client, but the kernel client version is quite stable, and the tiny shim for locality information is fully integrated with the existing Hadoop native-code library.