OSD - Ceph on ZFS
Summary
Allow ceph-osd to make better use of ZFS's capabilities.
Owners
- Sage Weil (Inktank)
Interested Parties
- Sage Weil (Inktank)
- Mark Nelson (Inktank)
- Yan, Zheng (Intel)
- Haomai Wang (UnitedStack)
- Wido den Hollander (42on)
- Eric Eastman (Keeper Technology)
- Daniele Stroppa (ZHAW)
- Sam Zaydel (RackTop Systems)
- Sam Just (Inktank)
Current Status
We have worked to identify and fix the xattr bugs in zfsonlinux so that ceph-osd runs on top of ZFS in the normal write-ahead journaling mode, just as it does on ext4 or XFS. It does not yet take advantage of any ZFS-specific features.
Detailed Description
At a minimum, ZFS's snapshot support could be used the same way snapshots are used on btrfs: to provide a stable consistency point that the journal is relative to. That would allow us to use the parallel journaling mode, which has much better read/modify/write performance.
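To make the btrfs-style scheme concrete, here is a toy in-memory model, not FileStore code; `ToyStore`, `JournalEntry`, `submit()`, `checkpoint()` and `mount_and_replay()` are hypothetical names. It assumes each checkpoint records the highest journal sequence number it reflects. After a crash the store rolls back to that checkpoint and replays only newer journal entries, so it does not matter that the live file system may have been left half-updated while journal and file system writes ran in parallel.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model (hypothetical): the "file system" is a key/value map and a
// checkpoint is a full copy of it, tagged with the last journal seq it reflects.
struct JournalEntry {
  std::uint64_t seq;
  std::string key;
  std::string value;
};

struct ToyStore {
  std::map<std::string, std::string> live;  // current, possibly torn, state
  std::map<std::string, std::string> snap;  // last stable checkpoint
  std::uint64_t snap_seq = 0;               // highest seq covered by snap
  std::vector<JournalEntry> journal;        // entries not yet trimmed

  // Parallel mode: each entry goes to the journal and the live fs at once.
  void submit(const JournalEntry& e) {
    journal.push_back(e);
    live[e.key] = e.value;
  }

  // Like snapshotting in sync_entry(): freeze state that reflects every entry
  // up to applied_seq, then trim those entries from the journal.
  void checkpoint(std::uint64_t applied_seq) {
    assert(applied_seq >= snap_seq);
    snap = live;
    snap_seq = applied_seq;
    std::vector<JournalEntry> rest;
    for (const auto& e : journal)
      if (e.seq > snap_seq) rest.push_back(e);
    journal = rest;
  }

  // Mount after a crash: roll back to the checkpoint, then replay the
  // journal entries that are newer than it.
  void mount_and_replay() {
    live = snap;
    for (const auto& e : journal)
      if (e.seq > snap_seq) live[e.key] = e.value;
  }
};
```

In write-ahead mode no rollback is needed because nothing reaches the file system before it is in the journal; the snapshot is what lets the two writes overlap safely.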
Looking further ahead, I suspect there are much deeper ways to take advantage of ZFS, such as using the DMU directly instead of going through the POSIX layer. I would like to discuss both the short-term improvements and the long-term possibilities in this session.
To abstract the underlying fs functionality out of FileStore, we need an interface that looks something like this:
```cpp
#include <list>
#include <string>

// Abstracts the backing file system's special features away from FileStore.
class BackingFileSystem {
 public:
  virtual ~BackingFileSystem() = default;
  virtual bool can_checkpoint() = 0; ///< true if we can snapshot to allow parallel journaling, etc.
  virtual int create_base_volume() = 0; ///< used during mkfs: mkdir in the degenerate case, create_subvolume for btrfs, ...
  virtual int list_checkpoints(std::list<std::string> *ls) = 0; ///< used during mount: list the checkpoints
  virtual int rollback_to_checkpoint(const std::string& name) = 0; ///< used during mount to roll back to the last checkpoint before journal replay
  virtual int create_checkpoint_start(const std::string& name) = 0; ///< start a snapshot during sync_entry()
  virtual int create_checkpoint_finish() = 0;
  virtual int remove_checkpoint(const std::string& name) = 0; ///< trim an old snapshot
  // other btrfs/fs optimizations
  virtual int clone_range(/* ... */) = 0; ///< fall back to copy as necessary
};
```
The FileStore::_detect_fs() will need to be refactored to instantiate an implementation of the above instead of the current open-coded checks.
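A minimal sketch of what the refactored detection could look like, assuming the detected file system type is already available as a string (for example as reported in /proc/mounts). The interface is cut down to what the sketch needs, and the backend class names, the `name()` accessor and `make_backing_fs()` are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Cut-down copy of the proposed interface, just enough for this sketch.
class BackingFileSystem {
 public:
  virtual ~BackingFileSystem() = default;
  virtual bool can_checkpoint() = 0;
  virtual std::string name() const = 0;  // hypothetical, e.g. for logging
};

// Degenerate backend for xfs, ext4, ...: no snapshots available.
class GenericBackend : public BackingFileSystem {
 public:
  bool can_checkpoint() override { return false; }
  std::string name() const override { return "generic"; }
};

class BtrfsBackend : public BackingFileSystem {
 public:
  bool can_checkpoint() override { return true; }
  std::string name() const override { return "btrfs"; }
};

class ZfsBackend : public BackingFileSystem {
 public:
  bool can_checkpoint() override { return true; }
  std::string name() const override { return "zfs"; }
};

// Hypothetical replacement for the open-coded checks: pick one backend up
// front, falling back to the generic one for anything unrecognized.
std::unique_ptr<BackingFileSystem> make_backing_fs(const std::string& fstype) {
  assert(!fstype.empty());
  if (fstype == "btrfs") return std::make_unique<BtrfsBackend>();
  if (fstype == "zfs") return std::make_unique<ZfsBackend>();
  return std::make_unique<GenericBackend>();
}
```

The point of the design is that the rest of FileStore only ever holds a `BackingFileSystem` pointer, so adding ZFS means adding one class and one branch here.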
All references to btrfs_stable_commits will be replaced with can_checkpoint().
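With `can_checkpoint()` available, the journaling-mode decision no longer has to mention btrfs at all. A sketch, where `JournalMode`, `choose_journal_mode()` and the two stand-in backends are hypothetical names rather than existing FileStore code:

```cpp
#include <cassert>

// Hypothetical enum standing in for FileStore's journal mode selection.
enum class JournalMode { WriteAhead, Parallel };

// Minimal stand-in for the proposed interface.
class BackingFileSystem {
 public:
  virtual ~BackingFileSystem() = default;
  virtual bool can_checkpoint() = 0;
};

class GenericBackend : public BackingFileSystem {  // xfs, ext4, ...
 public:
  bool can_checkpoint() override { return false; }
};

class ZfsBackend : public BackingFileSystem {  // snapshot-capable
 public:
  bool can_checkpoint() override { return true; }
};

// Parallel journaling is only safe when we can roll back to a stable
// checkpoint before replay; otherwise stay in write-ahead mode.
JournalMode choose_journal_mode(BackingFileSystem& fs) {
  return fs.can_checkpoint() ? JournalMode::Parallel : JournalMode::WriteAhead;
}
```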
Once this refactoring is in place, implementing a zfs backend should be pretty straightforward.
TODO:
- identify correct zfs snap interface (ioctls?)
- look at nilfs2?
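One possible answer to the first TODO item is to shell out to the zfs(8) command instead of using ioctls or libzfs. The sketch below only builds the argument vectors for each checkpoint operation and parses `zfs list` output; actually executing them is left out. `ZfsCheckpointCommands` is a hypothetical helper, the dataset name is made up, and it assumes one dataset per OSD. The subcommands and flags it uses (`snapshot`, `list -H -t snapshot -o name -d 1`, `rollback -r`, `destroy`) are standard zfs(8) options.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds zfs(8) argv vectors for a ZFS backend.
class ZfsCheckpointCommands {
 public:
  explicit ZfsCheckpointCommands(std::string dataset) : dataset_(std::move(dataset)) {
    assert(!dataset_.empty());
  }

  // create_checkpoint_start(name): snapshot the OSD's dataset.
  std::vector<std::string> create(const std::string& name) const {
    return {"zfs", "snapshot", dataset_ + "@" + name};
  }

  // list_checkpoints(): snapshots of this dataset only, one name per line.
  std::vector<std::string> list() const {
    return {"zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-d", "1", dataset_};
  }

  // rollback_to_checkpoint(name): plain rollback only targets the most recent
  // snapshot; -r destroys newer snapshots so an older one can be used.
  std::vector<std::string> rollback(const std::string& name) const {
    return {"zfs", "rollback", "-r", dataset_ + "@" + name};
  }

  // remove_checkpoint(name): trim an old snapshot.
  std::vector<std::string> remove(const std::string& name) const {
    return {"zfs", "destroy", dataset_ + "@" + name};
  }

  // Turn `zfs list` output ("pool/osd@snap_1\n...") into bare checkpoint names,
  // skipping any line that does not belong to this dataset.
  std::vector<std::string> parse_list(const std::string& output) const {
    std::vector<std::string> names;
    std::istringstream in(output);
    std::string line;
    const std::string prefix = dataset_ + "@";
    while (std::getline(in, line)) {
      if (line.compare(0, prefix.size(), prefix) == 0)
        names.push_back(line.substr(prefix.size()));
    }
    return names;
  }

 private:
  std::string dataset_;
};
```

Since `zfs snapshot` returns only once the snapshot exists, `create_checkpoint_finish()` could plausibly be a no-op in such a backend, whereas btrfs can start a snapshot asynchronously and wait for it later.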
Work items
Coding tasks
- filestore: generalize the snapshot enumeration, creation hooks and other btrfs-specific behaviors such that the btrfs hooks fit into a generic interface
- filestore: implement generic backend (xfs, ext4, etc.)
- filestore: implement btrfs backend
- filestore: clean out all btrfs_* member cruft
- filestore: implement a zfs backend that triggers zfs snapshots
- ceph-deploy: add zfs to the list of file systems supported by osd create ...
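For the generic backend, the interface comment says `clone_range()` should fall back to copying. The proposal leaves its parameters elided, so the signature below (source fd, destination fd, source offset, length, destination offset) and the name `generic_clone_range()` are assumptions. This sketch copies with `pread()`/`pwrite()`, stops early at end of the source, and returns 0 or a negative errno.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical generic fallback: no cheap clone, so copy the bytes.
int generic_clone_range(int from_fd, int to_fd, off_t srcoff, off_t len, off_t dstoff) {
  assert(len >= 0);
  std::vector<char> buf(64 * 1024);
  while (len > 0) {
    std::size_t want = static_cast<std::size_t>(
        std::min<off_t>(len, static_cast<off_t>(buf.size())));
    ssize_t r = ::pread(from_fd, buf.data(), want, srcoff);
    if (r < 0) {
      if (errno == EINTR) continue;  // interrupted: retry the read
      return -errno;
    }
    if (r == 0) break;  // end of source: copy only what exists
    ssize_t done = 0;
    while (done < r) {  // handle short writes
      ssize_t w = ::pwrite(to_fd, buf.data() + done,
                           static_cast<std::size_t>(r - done), dstoff + done);
      if (w < 0) {
        if (errno == EINTR) continue;
        return -errno;
      }
      done += w;
    }
    srcoff += r;
    dstoff += r;
    len -= r;
  }
  return 0;
}
```

A btrfs backend would override the same call with the file system's clone support and only fall back to a loop like this when cloning is unavailable.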
Build / release tasks
- include zfsonlinux in ceph-qa-chef on supported platforms
- teuthology: add support for fs: zfs
- include fs:zfs in the rados test matrix
Documentation tasks
- document the filestore backend interface in the internals section of the docs
Updated by Jessica Mack almost 9 years ago