Osd - ceph on zfs » History » Version 1
Jessica Mack, 06/21/2015 03:55 AM
h1. Osd - ceph on zfs

h3. Summary

Allow ceph-osd to make better use of ZFS's capabilities.

h3. Owners

* Sage Weil (Inktank)

h3. Interested Parties

* Sage Weil (Inktank)
* Mark Nelson (Inktank)
* Yan, Zheng (Intel)
* Haomai Wang (UnitedStack)
* Wido den Hollander (42on)
* Eric Eastman (Keeper Technology)
* Daniele Stroppa (ZHAW)
* Sam Zaydel (RackTop Systems)
* Sam Just (Inktank)

h3. Current Status

We have worked to identify and fix the xattr bugs in zfsonlinux so that ceph-osd will run on top of ZFS in the normal write-ahead journaling mode, just as it does on ext4 or XFS. We do not take advantage of any special ZFS features.

h3. Detailed Description

At a minimum, ZFS's snapshot support could be used the same way it is used on btrfs: to provide a stable consistency point to journal relative to, allowing us to use the parallel journaling mode (which has much better read/modify/write performance).

Looking further forward, I suspect there are more involved ways we could take advantage of ZFS, such as using the DMU directly instead of going through the POSIX layer. I would like to discuss both the short-term improvements and the long-term possibilities in this session.

To abstract the underlying fs functionality out of FileStore, we need an interface that looks like this:

<pre><code class="cpp">
class BackingFileSystem {
  bool can_checkpoint();     ///< true if we can snapshot to allow parallel journaling, etc.
  int create_base_volume();  ///< used during mkfs: mkdir in the degenerate case, create_subvol for btrfs, ...
  int list_checkpoints(list<string> *ls);    ///< used during mount; list the checkpoints
  int rollback_to_checkpoint(string name);   ///< used during mount to roll back to the last checkpoint before journal replay
  int create_checkpoint_start(string name);  ///< start a snap during sync_entry()
  int create_checkpoint_finish();
  int remove_checkpoint(string name);        ///< trim an old snap

  // other btrfs/fs optimizations
  int clone_range(...);      ///< fall back to copy as necessary
};
</code></pre>

FileStore::_detect_fs() will need to be refactored to instantiate an implementation of the above instead of the current open-coded checks. All references to btrfs_stable_commits will be replaced with can_checkpoint(). Once this refactoring is in place, implementing a zfs backend should be pretty straightforward.

TODO:

* identify correct zfs snap interface (ioctls?)
* look at nilfs2?

h3. Work items

h4. Coding tasks

# filestore: generalize the snapshot enumeration, creation hooks, and other btrfs-specific behaviors so that the btrfs hooks fit into a generic interface
# filestore: implement a generic backend (xfs, ext4, etc.)
# filestore: implement a btrfs backend
# filestore: clean out all btrfs_* member cruft
# filestore: implement a zfs backend that triggers zfs snapshots
# ceph-deploy: add zfs to the list of file systems supported by osd create ...

h4. Build / release tasks

# include zfsonlinux in ceph-qa-chef on supported platforms
# teuthology: add support for fs: zfs
# include fs: zfs in the rados test matrix

h4. Documentation tasks

# document the filestore backend interface in the internals section of the docs