Jessica Mack, 06/21/2015 03:55 AM

h1. Osd - ceph on zfs

h3. Summary

Allow ceph-osd to make better use of ZFS's capabilities.

h3. Owners

* Sage Weil (Inktank)

h3. Interested Parties

* Sage Weil (Inktank)
* Mark Nelson (Inktank)
* Yan, Zheng (Intel)
* Haomai Wang (UnitedStack)
* Wido den Hollander (42on)
* Eric Eastman (Keeper Technology)
* Daniele Stroppa (ZHAW)
* Sam Zaydel (RackTop Systems)
* Sam Just (Inktank)

h3. Current Status

We have worked to identify and fix the xattr bugs in zfsonlinux so that ceph-osd runs on top of ZFS in the normal write-ahead journaling mode, just as it does on ext4 or XFS. We do not take advantage of any ZFS-specific features.

h3. Detailed Description

At a minimum, ZFS's snapshot support could be used the same way it is used on btrfs: to provide a stable consistency point from which the journal is replayed, allowing us to use the parallel journaling mode (which has much better read/modify/write performance).
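To make the role of the consistency point concrete, here is a toy model (hypothetical, not FileStore code) in which a key/value map stands in for the file system. In write-ahead mode an op is applied only after its journal entry is durable; in parallel mode both happen concurrently, so after a crash the file system may hold writes that were never journaled or acknowledged. Rolling back to the last snapshot and then replaying the journal restores a consistent state. The names `ToyParallelStore`, `submit`, `checkpoint`, and `recover` are illustrative only, and the model ignores torn writes and real concurrency.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model, not FileStore code: the "file system" is a key/value map.
struct Op {
  uint64_t seq;
  std::string key;
  std::string value;
};

class ToyParallelStore {
public:
  std::map<std::string, std::string> live;  // current file system state

  // Parallel mode: the op is applied to the file system while it is being
  // journaled. journal_done says whether the journal write finished before
  // a crash; only journaled ops are acknowledged to the client.
  void submit(const Op &op, bool journal_done) {
    live[op.key] = op.value;
    applied_seq_ = op.seq;
    if (journal_done) journal_.push_back(op);
  }

  // sync_entry()-style commit: snapshot the file system as a stable
  // consistency point and trim the journal entries it already covers.
  void checkpoint() {
    snapshot_ = live;
    snapshot_seq_ = applied_seq_;
    std::vector<Op> newer;
    for (const Op &op : journal_)
      if (op.seq > snapshot_seq_) newer.push_back(op);
    journal_.swap(newer);
  }

  // Mount after a crash: discard whatever reached the file system since the
  // snapshot (possibly unacknowledged writes), then replay the journal.
  void recover() {
    live = snapshot_;
    for (const Op &op : journal_)
      if (op.seq > snapshot_seq_) live[op.key] = op.value;
  }

private:
  std::map<std::string, std::string> snapshot_;
  uint64_t snapshot_seq_ = 0;
  uint64_t applied_seq_ = 0;
  std::vector<Op> journal_;
};
```

For example, submitting op 1 (`a=v1`), checkpointing, then submitting op 2 (`b=v2`, journaled) and op 3 (`a=v3`, applied but not journaled) before calling `recover()` leaves `a=v1, b=v2`: op 3 is discarded because it was never acknowledged.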
Looking further forward, I suspect there are much more involved ways we could take advantage of ZFS, by using the DMU directly instead of going through the POSIX layer. I would like to discuss both the short-term improvements and the long-term possibilities in this session.
To abstract the underlying file system functionality out of FileStore, we need an interface that looks like this:

```cpp
#include <list>
#include <string>

// Abstracts the backing file system's checkpoint (snapshot) features so that
// FileStore no longer needs open-coded, btrfs-specific checks.
class BackingFileSystem {
public:
  virtual ~BackingFileSystem() {}

  virtual bool can_checkpoint() = 0;  ///< true if we can snapshot to allow parallel journaling, etc.
  virtual int create_base_volume() = 0;  ///< used during mkfs: mkdir in the degenerate case, create_subvol for btrfs, ...
  virtual int list_checkpoints(std::list<std::string> *ls) = 0;  ///< used during mount: list the checkpoints
  virtual int rollback_to_checkpoint(const std::string &name) = 0;  ///< used during mount to roll back to the last checkpoint before journal replay
  virtual int create_checkpoint_start(const std::string &name) = 0;  ///< start a snap during sync_entry()
  virtual int create_checkpoint_finish() = 0;  ///< complete the snap begun by create_checkpoint_start()
  virtual int remove_checkpoint(const std::string &name) = 0;  ///< trim an old snap

  // other btrfs/fs optimizations
  virtual int clone_range(...) = 0;  ///< fall back to copy as necessary
};
```
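For the degenerate case (xfs, ext4, etc.), a backend might simply report that it cannot checkpoint and treat the base volume as a plain directory. The sketch below restates the interface as an abstract class so it compiles on its own; making the methods pure virtual, the class name `GenericBackingFileSystem`, and returning `-EOPNOTSUPP` for checkpoint operations and `clone_range` are all assumptions, not existing Ceph code.

```cpp
#include <cerrno>
#include <list>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Interface restated (assumed abstract) so this sketch is self-contained.
class BackingFileSystem {
public:
  virtual ~BackingFileSystem() {}
  virtual bool can_checkpoint() = 0;
  virtual int create_base_volume() = 0;
  virtual int list_checkpoints(std::list<std::string> *ls) = 0;
  virtual int rollback_to_checkpoint(const std::string &name) = 0;
  virtual int create_checkpoint_start(const std::string &name) = 0;
  virtual int create_checkpoint_finish() = 0;
  virtual int remove_checkpoint(const std::string &name) = 0;
  virtual int clone_range(...) = 0;
};

// Hypothetical generic backend: no snapshots, so FileStore must stay in
// write-ahead journaling mode.
class GenericBackingFileSystem : public BackingFileSystem {
public:
  explicit GenericBackingFileSystem(const std::string &base) : base_(base) {}

  bool can_checkpoint() override { return false; }

  // Degenerate case: the "base volume" is just a directory.
  int create_base_volume() override {
    if (::mkdir(base_.c_str(), 0755) < 0 && errno != EEXIST)
      return -errno;
    return 0;
  }

  int list_checkpoints(std::list<std::string> *ls) override {
    ls->clear();  // there are never any checkpoints
    return 0;
  }
  int rollback_to_checkpoint(const std::string &) override { return -EOPNOTSUPP; }
  int create_checkpoint_start(const std::string &) override { return -EOPNOTSUPP; }
  int create_checkpoint_finish() override { return -EOPNOTSUPP; }
  int remove_checkpoint(const std::string &) override { return -EOPNOTSUPP; }

  // No clone support: callers fall back to copying the data.
  int clone_range(...) override { return -EOPNOTSUPP; }

private:
  std::string base_;
};
```

Returning an error rather than silently succeeding keeps a caller that forgot to check `can_checkpoint()` from believing a snapshot exists.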
FileStore::_detect_fs() will need to be refactored to instantiate an implementation of the above interface instead of relying on the current open-coded checks.
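One possible shape for the refactored detection is to `statfs(2)` the data directory and map its `f_type` to a backend. This sketch assumes Linux and uses trimmed stand-in classes rather than the full interface. The magic numbers are the Linux values for btrfs (`0x9123683E`) and, as an assumption, the value zfsonlinux reports (`0x2FC12FC1`). `make_backing_fs` and `detect_fs` are hypothetical names.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <sys/vfs.h>  // statfs(2); Linux-specific

// Trimmed stand-ins; real backends would implement the whole interface.
struct BackingFileSystem {
  virtual ~BackingFileSystem() {}
  virtual std::string name() const = 0;
  virtual bool can_checkpoint() const = 0;
};
struct GenericFS : BackingFileSystem {
  std::string name() const override { return "generic"; }
  bool can_checkpoint() const override { return false; }
};
struct BtrfsFS : BackingFileSystem {
  std::string name() const override { return "btrfs"; }
  bool can_checkpoint() const override { return true; }
};
struct ZfsFS : BackingFileSystem {
  std::string name() const override { return "zfs"; }
  bool can_checkpoint() const override { return true; }
};

// statfs(2) f_type values on Linux; the ZFS value is assumed (zfsonlinux).
constexpr uint32_t BTRFS_FS_MAGIC = 0x9123683E;
constexpr uint32_t ZFS_FS_MAGIC = 0x2FC12FC1;

// Pick a backend from the file system type; unknown types get the generic one.
std::unique_ptr<BackingFileSystem> make_backing_fs(uint32_t f_type) {
  switch (f_type) {
  case BTRFS_FS_MAGIC: return std::make_unique<BtrfsFS>();
  case ZFS_FS_MAGIC:   return std::make_unique<ZfsFS>();
  default:             return std::make_unique<GenericFS>();  // xfs, ext4, ...
  }
}

// What a refactored _detect_fs() might do: statfs() the data directory and
// let the factory choose. Returns nullptr if statfs() fails.
std::unique_ptr<BackingFileSystem> detect_fs(const std::string &path) {
  struct statfs st;
  if (::statfs(path.c_str(), &st) < 0)
    return nullptr;
  return make_backing_fs(static_cast<uint32_t>(st.f_type));
}
```

Keeping the type-to-backend mapping in one factory means adding ZFS (or nilfs2) is a new class plus one `case`, rather than new checks scattered through FileStore.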
All references to btrfs_stable_commits will be replaced with calls to can_checkpoint().
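With the capability behind `can_checkpoint()`, code that currently keys off `btrfs_stable_commits` could look like the following sketch: choose the journaling mode from the backend and, at mount, roll back to the newest checkpoint before journal replay. It assumes checkpoint names sort in creation order (for example, zero-padded sequence numbers) and that an empty checkpoint list means there is nothing to roll back to. `JournalMode`, `choose_journal_mode`, and `mount_rollback` are hypothetical.

```cpp
#include <list>
#include <string>

// Trimmed stand-in for the BackingFileSystem interface.
struct BackingFileSystem {
  virtual ~BackingFileSystem() {}
  virtual bool can_checkpoint() = 0;
  virtual int list_checkpoints(std::list<std::string> *ls) = 0;
  virtual int rollback_to_checkpoint(const std::string &name) = 0;
};

enum class JournalMode { WRITEAHEAD, PARALLEL };

// Where FileStore tested btrfs_stable_commits, it now asks the backend.
JournalMode choose_journal_mode(BackingFileSystem &fs) {
  return fs.can_checkpoint() ? JournalMode::PARALLEL : JournalMode::WRITEAHEAD;
}

// Mount-time step: with checkpoints, roll back to the newest one so the
// journal replays onto a consistent state. Names are assumed to sort in
// creation order.
int mount_rollback(BackingFileSystem &fs) {
  if (!fs.can_checkpoint())
    return 0;  // write-ahead mode: replay on top of the current state
  std::list<std::string> snaps;
  int r = fs.list_checkpoints(&snaps);
  if (r < 0)
    return r;
  if (snaps.empty())
    return 0;  // assumed: fresh store, nothing to roll back to
  snaps.sort();
  return fs.rollback_to_checkpoint(snaps.back());
}
```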
Once this refactoring is in place, implementing a zfs backend should be pretty straightforward.
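As a rough illustration of why the ZFS backend should be simple, the sketch below maps the checkpoint calls onto the `zfs` command-line tool (`zfs create`, `zfs snapshot`, `zfs rollback -r`, `zfs destroy`, `zfs list`). Which interface the real backend should use (CLI, libzfs, or ioctls) is still an open TODO, so treat this as one assumed option. The command runner is injected so the sketch can be exercised without a pool. `ZfsBackingFileSystem` and `Runner` are hypothetical, and dataset and snapshot names are assumed to contain no shell metacharacters.

```cpp
#include <functional>
#include <list>
#include <sstream>
#include <string>

// Hypothetical ZFS backend driving the `zfs` CLI; a real version would
// derive from BackingFileSystem. Methods return the command's exit status.
class ZfsBackingFileSystem {
public:
  // Runs cmd, stores its stdout in *out (if non-null), returns the exit status.
  using Runner = std::function<int(const std::string &cmd, std::string *out)>;

  ZfsBackingFileSystem(const std::string &dataset, Runner run)
    : dataset_(dataset), run_(run) {}

  bool can_checkpoint() { return true; }

  int create_base_volume() { return run_("zfs create " + dataset_, nullptr); }

  int list_checkpoints(std::list<std::string> *ls) {
    std::string out;
    int r = run_("zfs list -H -o name -t snapshot -r " + dataset_, &out);
    if (r != 0)
      return r;
    ls->clear();
    const std::string prefix = dataset_ + "@";
    std::istringstream lines(out);
    std::string line;
    while (std::getline(lines, line))
      if (line.compare(0, prefix.size(), prefix) == 0)  // skip child datasets
        ls->push_back(line.substr(prefix.size()));
    return 0;
  }

  // -r destroys snapshots newer than the target; zfs refuses otherwise.
  int rollback_to_checkpoint(const std::string &name) {
    return run_("zfs rollback -r " + snap(name), nullptr);
  }

  // Assumes `zfs snapshot` returns once the snapshot exists, so there is
  // nothing left for create_checkpoint_finish() to wait for.
  int create_checkpoint_start(const std::string &name) {
    return run_("zfs snapshot " + snap(name), nullptr);
  }
  int create_checkpoint_finish() { return 0; }

  int remove_checkpoint(const std::string &name) {
    return run_("zfs destroy " + snap(name), nullptr);
  }

private:
  std::string snap(const std::string &name) const { return dataset_ + "@" + name; }
  std::string dataset_;
  Runner run_;
};
```

In production the runner might wrap `popen()`; a test can pass a lambda that records the commands instead of executing them.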
TODO:

* identify the correct ZFS snapshot interface (ioctls?)
* look at nilfs2?

h3. Work items

h4. Coding tasks

# filestore: generalize the snapshot enumeration, creation hooks, and other btrfs-specific behaviors so that the btrfs hooks fit into a generic interface
# filestore: implement a generic backend (xfs, ext4, etc.)
# filestore: implement a btrfs backend
# filestore: clean out all btrfs_* member cruft
# filestore: implement a zfs backend that triggers zfs snapshots
# ceph-deploy: add zfs to the list of file systems supported by osd create ...

h4. Build / release tasks

# include zfsonlinux in ceph-qa-chef on supported platforms
# teuthology: add support for fs: zfs
# include fs:zfs in the rados test matrix

h4. Documentation tasks

# document the filestore backend interface in the internals section of the docs