Project Ideas

This page was originally created as an idea list for Google Summer of Code 2013. However, we want it to be an open resource for anyone looking to get involved in Ceph (or anyone who wants to share a project idea with the Ceph community). Read below for projects you can break off and tackle.

File Locking in Ceph-FUSE

Technical Contact: Greg Farnum

Description:

The Ceph distributed file system can be accessed by either a kernel client or a (user mode) FUSE module. The kernel client (and the metadata servers) support file locking, but the FUSE implementation does not yet include this support. Extend the current FUSE module to implement lock calls. Most of this can be very similar to the implementation already in the kernel client, but it will be necessary to extend the FUSE module to keep track of the locks it owns so that it can reestablish those locks in case of metadata server failure.
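
The bookkeeping described above (remember which locks the client holds, then re-request them after a metadata server failure) can be sketched in a few lines. This is only an illustration in Python, not the FUSE module's C++ code: the `HeldLock` record, the `LockTable` class, and the `send_request` callback are all hypothetical names, and a real implementation would have to follow the client's actual lock messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeldLock:
    """One byte-range lock this client believes it holds (hypothetical record)."""
    inode: int
    owner: int       # lock owner id as seen by the client
    start: int       # first byte covered
    length: int      # 0 means "to end of file", as in fcntl-style locks
    exclusive: bool


class LockTable:
    """Client-side record of granted locks, kept so they can be replayed."""

    def __init__(self):
        self._held = set()

    def granted(self, lock):
        # Record a lock only after the metadata server has granted it.
        self._held.add(lock)

    def released(self, lock):
        self._held.discard(lock)

    def replay(self, send_request):
        """Re-request every held lock; return those that could not be re-established.

        send_request is a hypothetical callback that returns True on success.
        """
        failed = []
        for lock in sorted(self._held, key=lambda l: (l.inode, l.start)):
            if not send_request(lock):
                failed.append(lock)
        return failed
```

Replaying in a fixed order (here by inode and offset) is a design choice made for reproducibility in this sketch. What to do with locks that cannot be re-established is left open.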

Required Skills and Resources:

The existing FUSE module is a single C++ module. This work will require a basic understanding of file system operations and distributed lock management. Most of the development and testing can be done with VMs on your computer; if anything requires additional resources we can probably provide VPN access to our lab.

Implement ACLs for CephFS with Extended Attributes

Technical Contact: Sage Weil

Description:

While most Linux users are satisfied with owner/group/other protection, other (e.g. CIFS) users need more expressive Access Control Lists (ACLs). ACLs can easily be implemented in terms of extended attributes, which are supported by many Linux file systems. Design and develop a CephFS ACL implementation based on the generic ACL helper code. The approach has been fairly simple since Linux 3.1 (see e.g. fs/btrfs/acl.c, but based on a global ACL configuration option). Then put together an (also open source) ACL test suite to verify the correctness of the implementation.
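
To make "ACLs in terms of extended attributes" concrete, here is a small userspace sketch that decodes an ACL stored as an extended attribute value. It assumes the layout Linux uses for the `system.posix_acl_access` attribute: a 4-byte little-endian version header (value 2) followed by 8-byte entries of tag, permission bits, and id. The function name `decode_posix_acl` is made up for this example, and the code is an illustration of the data format, not part of the kernel work the project calls for.

```python
import struct

POSIX_ACL_XATTR_VERSION = 2

# Entry tags as used in the xattr representation of POSIX ACLs.
TAGS = {
    0x01: "user_obj",
    0x02: "user",
    0x04: "group_obj",
    0x08: "group",
    0x10: "mask",
    0x20: "other",
}


def decode_posix_acl(blob):
    """Decode an ACL xattr value into (tag, qualifier, 'rwx') tuples."""
    (version,) = struct.unpack_from("<I", blob, 0)
    if version != POSIX_ACL_XATTR_VERSION:
        raise ValueError(f"unsupported ACL xattr version {version}")
    body = blob[4:]
    if len(body) % 8:
        raise ValueError("truncated ACL entry")
    entries = []
    for tag, perm, ident in struct.iter_unpack("<HHI", body):
        name = TAGS.get(tag)
        if name is None:
            raise ValueError(f"unknown ACL tag {tag:#x}")
        # Only named user/group entries carry a meaningful uid or gid.
        qualifier = ident if name in ("user", "group") else None
        rwx = "".join(c if perm & bit else "-"
                      for c, bit in (("r", 4), ("w", 2), ("x", 1)))
        entries.append((name, qualifier, rwx))
    return entries
```

A decoder like this could also serve as a building block for the test suite mentioned above, for example to compare the ACL read back from a file against the ACL that was set.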

Required Skills and Resources:

The CephFS kernel client, like all kernel code, is implemented in C. The amount of code to be implemented should not actually be very large. Submitters should have experience with developing/testing Linux kernel code. It should be possible to do most of this work on a Linux desktop, but if additional machines are needed (for extensive testing) we can probably provide VPN access to our test lab.

Slow OSD Detection Agent

Technical Contact: Dan Mick

Description:

The RADOS object store automatically detects and recovers from failed Object Storage Daemons (OSDs). Disks have become smart enough to automatically recover from many errors, so rather than fail, they simply get slower. Because RADOS write replication is synchronous, it runs at the speed of the slowest copy, and slow disks can significantly affect system performance. Even though these devices (or systems) have not yet failed, it is probably a good idea to remove them from service (and replace them) as quickly as possible. A very general and robust way to do this is to build an external agent that observes the performance of each OSD, looks for slow ones (or servers that seem to contain multiple slow ones), and reports them. Design and develop an agent that uses the RADOS RESTful monitoring APIs to track the work going into each OSD and the rate at which that work is being accomplished, in order to identify OSDs that seem to be bottlenecks.
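
One possible detection rule, sketched below, is to compare each OSD's typical latency against the typical latency of the whole cluster and flag the outliers. Everything here is an assumption made for illustration: the input dictionaries, the function names, and the threshold of three times the cluster median. How the samples are collected from the monitoring APIs is deliberately left out.

```python
from collections import Counter
from statistics import median


def find_slow_osds(latencies_ms, factor=3.0, min_samples=5):
    """Return ids of OSDs whose median latency exceeds factor x the cluster median.

    latencies_ms maps an OSD id to a list of per-operation latency samples.
    OSDs with fewer than min_samples samples are ignored as too noisy.
    """
    typical = {osd: median(samples)
               for osd, samples in latencies_ms.items()
               if len(samples) >= min_samples}
    if len(typical) < 3:
        return []  # too few OSDs to say what "normal" looks like
    baseline = median(typical.values())
    return sorted(osd for osd, value in typical.items()
                  if value > factor * baseline)


def hosts_with_several_slow(slow_osds, osd_host, threshold=2):
    """Return hosts that contain at least `threshold` of the slow OSDs."""
    counts = Counter(osd_host[osd] for osd in slow_osds)
    return sorted(host for host, n in counts.items() if n >= threshold)
```

Using medians keeps a single slow OSD from shifting the baseline, but the rule stops working once a large share of the cluster is slow. A production agent would also need to account for load differences between OSDs.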

Required Skills and Resources:

This client can be implemented in almost any language, but the most obvious choices would be Python, Java or C++. Basic development can be done on any Linux desktop. Initial testing will require a small RADOS cluster, which can be run on a few additional machines, or virtual machines on the development system. Testing at larger scale will require access to much larger clusters. If necessary, we can probably provide VPN access to test clusters in our own lab.

Expose BTRFS Checksums

Technical Contact: Sage Weil

Description:

Btrfs is an Open Source state-of-the-art high performance file system. One of its many nice features is checksumming of all data and metadata (as protection against corruption due to undetected read errors). Ceph Object Storage Daemons would like to verify the integrity of all data they exchange. The computation and storing of checksums is expensive, and much better performance could be obtained if we were able to use and compare against the checksums already being computed by Btrfs. Unfortunately, Btrfs does not expose its internal checksums to clients. Design and implement an extension to Btrfs that exposes to clients the internally computed checksums for (subsets of) individual files. Ideally these updates should be acceptable for upstream inclusion into Btrfs.
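
To illustrate what a client would do with exposed checksums, the sketch below computes one CRC-32C value per fixed-size block of a byte buffer, which is the kind of per-block result an OSD could compare instead of recomputing. CRC-32C is the checksum Btrfs uses by default, but the 4096-byte block size, the function names, and the idea that exposed values would match this exact seed and finalization are assumptions of this example, not statements about the Btrfs on-disk format.

```python
def _make_table():
    # Table for the reflected Castagnoli polynomial used by CRC-32C.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Standard CRC-32C of a bytes-like object."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def block_checksums(data, block_size=4096):
    """Return one checksum per block; the last block may be short."""
    return [crc32c(data[off:off + block_size])
            for off in range(0, len(data), block_size)]
```

With checksums exposed by the file system, a loop like `block_checksums` is exactly the work the OSD could skip.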

Required Skills and Resources:

Btrfs, like all kernel code, is implemented in C. The amount of code to be implemented may not be very large (assuming the built-in checksums can be isolated to (parts of) a single file). But understanding the existing code in order to make those simple changes may prove quite complex. Submitters should have experience with B-trees, file systems, and developing/testing Linux kernel code. It should be possible to do most of this work on a Linux desktop, but if additional machines are needed (for extensive testing) we can probably provide VPN access to our test lab.

Extend Existing Distributed Key/Value Store APIs to Multiple Objects

Technical Contact: Sam Just

Description:

Last summer, an intern implemented a very nice set of Key/Value APIs on top of B-Trees stored in RADOS objects. This implementation includes good handling of node splits and merges, but the B-Tree must be stored in a single object. This limits both the size of the supportable Key/Value sets and the available performance (since different objects can be accessed in parallel and are likely to be stored on different nodes). Design and implement an extension of these Key/Value APIs to stripe the B-Tree over multiple objects, develop a set of unit test cases to verify correctness, and run performance tests to verify the scalability and throughput improvements.
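
One conceivable layout, shown here only to make the problem concrete, is to keep a small routing index that maps key ranges to the objects holding the corresponding part of the tree, so that lookups in different ranges go to different objects. The `RangeIndex` class and the object names are hypothetical. The sketch ignores the hard parts of the project, such as updating the index and the affected objects atomically.

```python
import bisect


class RangeIndex:
    """Maps each key to the (hypothetical) object that stores its key range."""

    def __init__(self, first_object):
        self._bounds = []               # sorted split keys
        self._objects = [first_object]  # always len(self._bounds) + 1 entries

    def locate(self, key):
        """Return the object responsible for key."""
        return self._objects[bisect.bisect_right(self._bounds, key)]

    def split(self, split_key, new_object):
        """Hand keys >= split_key (within the affected range) to new_object."""
        i = bisect.bisect_right(self._bounds, split_key)
        if i > 0 and self._bounds[i - 1] == split_key:
            raise ValueError("range already split at this key")
        self._bounds.insert(i, split_key)
        self._objects.insert(i + 1, new_object)
```

For example, after `split("m", "obj.1")` on an index created with `"obj.0"`, keys below `"m"` still resolve to `"obj.0"` and all others resolve to `"obj.1"`.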

Required Skills and Resources:

The existing Key/Value store is implemented as a C++ library, and this work requires familiarity with B-Trees, multi-threaded code, and distributed transactions. Much of the development and testing can be done with a few PCs (or even VMs on a desktop). Performance testing will require access to more machines. If you do not have access to enough local machines, we can probably arrange to provide VPN access to machines in our test lab.

Extend CRUSH Policy Language to Support More Interesting Policies

Technical Contact: Sage Weil

Description:

CRUSH is a deterministic, rule-based, consistent-hash-like algorithm (with some very nice properties) for determining object placement in distributed storage systems. Its selection rule language, while already very useful, cannot express some useful rules. For example, we cannot implement “two copies in the same rack, and another in a different row” because this applies different rules to different branches of the selection process. Propose a set of extensions to allow different branches to use different rules, modify the CRUSH rule compiler and interpreter to implement the new operations, and develop a set of unit test cases to verify the correctness and statistical quality of the resulting placements.
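
The example policy can be written out directly to show why it needs two differently shaped branches. The sketch below is not the CRUSH algorithm and does not use its rule language: it substitutes a simple highest-hash-score selection built on SHA-256, and the topology format and function names are invented for this illustration.

```python
import hashlib


def _score(obj, item):
    # Deterministic pseudo-random score for an (object, item) pair.
    digest = hashlib.sha256(f"{obj}/{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _pick(obj, items, n=1):
    """Deterministically choose n distinct items by highest score."""
    return sorted(items, key=lambda item: _score(obj, item), reverse=True)[:n]


def place(obj, topology):
    """Place three copies: two in one rack, one in a different row.

    topology maps row -> rack -> list of host names.
    """
    # Branch 1: choose one rack with at least two hosts, then two hosts in it.
    racks = [(row, rack)
             for row, row_racks in topology.items()
             for rack, hosts in row_racks.items()
             if len(hosts) >= 2]
    other_rows_exist = len(topology) >= 2
    if not racks or not other_rows_exist:
        raise ValueError("topology cannot satisfy the policy")
    row, rack = _pick(obj, racks)[0]
    first_two = _pick(obj, topology[row][rack], 2)

    # Branch 2: choose a different row, then any single host inside it.
    other_row = _pick(obj, [r for r in topology if r != row])[0]
    candidates = [h for hosts in topology[other_row].values() for h in hosts]
    if not candidates:
        raise ValueError("chosen row has no hosts")
    return first_two + _pick(obj, candidates)
```

The two branches descend the hierarchy to different depths and with different constraints, which is the property the current rule language cannot express. Unit tests for an extended rule language could assert the same invariants that this function enforces by construction.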

Required Skills and Resources:

The CRUSH rule compiler is a relatively simple C++ program. The CRUSH interpreter is a relatively small and simple C module. There are already tools to test the randomness of the resulting distribution, but extension will probably also require some understanding of statistics. All development and testing can be easily done on a personal Linux desktop.