A hook framework for Ceph FS operation
This is a proposal for a hook framework that enables the execution of callback functions on any or all Ceph FS operations, such as readdir, open, create, setxattr, read, and write. This kind of hook framework would accelerate the development of other functionality on top of Ceph, such as a semantic file system or antivirus alerting.
- Yasuhiro Ohara (UCSC, SSRC)
- Sage Weil (Inktank)
Most of the implementation is finished, and it is (kind of) working. The main issue is that the primary MDS sometimes dies; I haven't worked on improving stability yet. The Indri content indexing command from the Lemur Project was adapted to use this hook framework, and preliminary evaluation results were collected. A paper draft has been written and submitted to a conference (happy to share it with anybody at Inktank).
Hook functions are executed on the MDS. handle_op() in mds/server.cc now calls a hook execution function, which executes an arbitrary UNIX program using fork() and execve(). The program name (path) is obtained from an extended attribute, with a specific name, set on the file. Hooks on MDS operations (readdir, getattr, and setxattr on directories) are therefore easy.
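The lookup-then-exec flow described above can be sketched in a few lines. This is only an illustration in Python, not the MDS code: the xattr prefix `user.hook.`, the argument order passed to the hook program, and the choice to wait for the child are all assumptions, and the extended attributes are modelled as a plain dictionary.

```python
import os

# Hypothetical xattr naming scheme; the proposal does not state the real name.
HOOK_XATTR_PREFIX = "user.hook."


def find_hook(xattrs, op):
    """Return the hook program path configured for `op`, or None."""
    value = xattrs.get(HOOK_XATTR_PREFIX + op)
    if value is None:
        return None
    return value.decode() if isinstance(value, bytes) else value


def run_hook(xattrs, op, path):
    """Fork and exec the hook for `op`; return its exit status, or None if no hook is set."""
    program = find_hook(xattrs, op)
    if program is None:
        return None
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image with the hook program.
        try:
            os.execve(program, [program, op, path], dict(os.environ))
        finally:
            os._exit(127)  # reached only if execve() failed
    # Parent: wait for the hook to finish (an assumption of this sketch).
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

With a dictionary such as `{"user.hook.readdir": "/usr/local/bin/index-dir"}` (a made-up path), `run_hook(xattrs, "readdir", "/some/dir")` would run that program with the operation name and the file path as arguments.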
For OSD operations such as read and write, the OSD signals the occurrence of the operation to the MDS, and the MDS executes the hook functions. This is done with a newly defined message called MMDSHookNotice. Using this message, the OSD notifies the MDS of the operation name (e.g., write), the file path name, and the offset and length (the data extent). The MDS then checks the hook configuration in the extended attributes. As a result, an MMDSHookNotice message is delivered to the primary MDS on ALL OSD operations.
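To make the contents of the notification concrete, here is a small Python sketch of a record carrying the four fields listed above. The class name `HookNotice` and the byte layout are assumptions made for illustration; the actual MMDSHookNotice encoding is not described here.

```python
import struct
from dataclasses import dataclass

# Assumed header layout: u16 op length, u32 path length, u64 offset, u64 length.
_HEADER = "<HIQQ"


@dataclass(frozen=True)
class HookNotice:
    op: str      # operation name, e.g. "write"
    path: str    # file path name
    offset: int  # start of the data extent
    length: int  # size of the data extent

    def encode(self) -> bytes:
        op = self.op.encode()
        path = self.path.encode()
        header = struct.pack(_HEADER, len(op), len(path), self.offset, self.length)
        return header + op + path

    @classmethod
    def decode(cls, buf: bytes) -> "HookNotice":
        size = struct.calcsize(_HEADER)
        op_len, path_len, offset, length = struct.unpack_from(_HEADER, buf)
        op = buf[size:size + op_len].decode()
        path = buf[size + op_len:size + op_len + path_len].decode()
        return cls(op, path, offset, length)
```

The receiver would decode the record and then look up the hook configuration for `path` and `op` in the extended attributes.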
Only the post-hook on the write OSD operation is supported. The hook notification occurs after the replications are committed, in do_op() in ReplicatedPG.cc. Upon receipt of the reply to the replication message, the primary OSD of the PG sends the notification to the primary MDS. Sometimes (the first one?) the notification is deferred for about 30 seconds (I don't understand the mechanism well).
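The ordering described above (notify only once the replicas have committed) can be illustrated with a small Python sketch. The `PendingWrite` class and its callback are hypothetical names, and the sketch assumes the primary waits for every replica before notifying; it does not model ReplicatedPG.cc or the deferral mentioned above.

```python
class PendingWrite:
    """Tracks one replicated write on the primary OSD (illustrative only)."""

    def __init__(self, replicas, notify):
        self._waiting = set(replicas)  # replicas that have not committed yet
        self._notify = notify          # called once, e.g. to send the hook notice
        self._notified = False

    def on_replica_commit(self, replica):
        """Record a commit reply; fire the post-hook notice after the last one."""
        self._waiting.discard(replica)
        if not self._waiting and not self._notified:
            self._notified = True
            self._notify()
```

Duplicate replies are ignored, so the notice is sent at most once per write.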
The unresolved issues are: 1) stability, since the MDS sometimes dies; 2) checking the hook configuration on the OSD, so that it does not notify the MDS of FS operation events that no hook needs; 3) support for OSD operations other than the post-hook on write; and 4) support for multiple MDSs.