Osd - clone from journal on btrfs » History » Version 1

Jessica Mack, 06/07/2015 01:14 AM

h1. Osd - clone from journal on btrfs

h3. Summary

The OSD normally writes data twice: once to the journal and then again to the backing file system. If we are using btrfs and the journal is itself a btrfs file, we can avoid the second write by cloning large writes from the journal into their final objects.

h3. Owners

* Samuel Just (Inktank)

h3. Interested Parties

* Sage Weil (Inktank)
* Samuel Just (Inktank)
* Mark Nelson (Inktank)
* Haomai Wang (UnitedStack)
* Anip Patel (Arizona State University)

h3. Current Status

Currently the journal events are opaque lumps of data.
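As an illustration of what a less opaque event might look like, the sketch below appends an event to a journal and returns where its data portion landed, so that later stages can address that range directly. The header layout, the @DataLocation@ type, the 4096-byte padding, and the function names are all hypothetical and are not the Ceph journal format. Replay reads the data back in full, because a replayed event still has to perform the complete write.

```python
import io
import struct
from dataclasses import dataclass

BLOCK = 4096  # assumed filesystem block size
# Hypothetical header: sequence number, metadata length, padding length, data length.
HEADER = struct.Struct("<QIII")


@dataclass(frozen=True)
class DataLocation:
    seq: int
    offset: int  # absolute offset of the data portion in the journal file
    length: int  # length of the data portion in bytes


def append_event(journal, seq, metadata, data):
    """Append one event and return where its data portion landed."""
    start = journal.seek(0, io.SEEK_END)
    unpadded = start + HEADER.size + len(metadata)
    pad = -unpadded % BLOCK  # pad so the data portion starts block-aligned
    journal.write(HEADER.pack(seq, len(metadata), pad, len(data)))
    journal.write(metadata)
    journal.write(b"\0" * pad)
    journal.write(data)
    return DataLocation(seq, unpadded + pad, len(data))


def replay(journal):
    """Yield (seq, metadata, data) for every event, always with the full data."""
    journal.seek(0)
    while True:
        raw = journal.read(HEADER.size)
        if len(raw) < HEADER.size:
            return
        seq, meta_len, pad, data_len = HEADER.unpack(raw)
        metadata = journal.read(meta_len)
        journal.seek(pad, io.SEEK_CUR)  # skip the alignment padding
        yield seq, metadata, journal.read(data_len)
```

Padding the data portion to a block boundary is a design assumption here: reflink-style clones generally need block-aligned ranges, so an unaligned data portion could not be cloned even if its location were known.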
Journaling is usually done in 'parallel' mode on btrfs, which means the journal and fs writes are queued at the same time.  Clone from journal probably requires that the journal write complete prior to the actual object write.
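The ordering difference can be sketched as follows. This is only an illustration of the idea that large writes wait for the journal while small writes stay parallel; @MIN_CLONE_SIZE@, @Mode@, @mode_for_write@, and @plan@ are hypothetical names, and the 64 KiB threshold is an arbitrary placeholder rather than a Ceph default.

```python
from enum import Enum

MIN_CLONE_SIZE = 64 * 1024  # hypothetical threshold, not a Ceph default


class Mode(Enum):
    PARALLEL = "parallel"      # journal and fs writes are queued together
    WRITEAHEAD = "writeahead"  # the fs step waits for the journal write


def mode_for_write(length, clone_enabled=True, min_clone_size=MIN_CLONE_SIZE):
    """Pick the journaling mode for a single write of `length` bytes."""
    if clone_enabled and length >= min_clone_size:
        return Mode.WRITEAHEAD
    return Mode.PARALLEL


def plan(length, clone_enabled=True):
    """Return the ordered stages for one write.

    A tuple groups steps that are queued at the same time.
    """
    if mode_for_write(length, clone_enabled) is Mode.WRITEAHEAD:
        # The clone can only run once the data is in the journal file.
        return ["journal_write", "clone_into_object"]
    return [("journal_write", "object_write")]
```

With these placeholder values, @plan(4096)@ keeps the parallel behaviour, while @plan(1 << 20)@ orders the journal write before the clone.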
The journal completion logic cannot currently tell where in the journal file the data portion of the event ended up.

h3. Detailed Description

Rather than writing the data a second time to the object file, we will instead clone it from the journal file into the object.
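A minimal sketch of such a clone, assuming Linux and the generic @FICLONERANGE@ ioctl (the VFS-level form of the btrfs clone-range ioctl). The helper name @clone_or_copy@, the 4096-byte block size, and the fallback to a plain copy are assumptions made for illustration; the numeric ioctl value assumes the common Linux ioctl encoding used on x86 and arm. This is not how the OSD is implemented, only a demonstration of the underlying call.

```python
import fcntl
import os
import struct

# FICLONERANGE from <linux/fs.h>: _IOW(0x94, 13, struct file_clone_range).
FICLONERANGE = 0x4020940D
BLOCK = 4096  # assumed filesystem block size


def clone_or_copy(src_fd, src_off, length, dst_fd, dst_off):
    """Share extents if the filesystem allows it, else copy.

    Returns "clone" or "copy" to say which path was taken.
    """
    if src_off % BLOCK == 0 and dst_off % BLOCK == 0 and length % BLOCK == 0:
        # struct file_clone_range { s64 src_fd; u64 src_offset,
        #                           src_length, dest_offset; }
        arg = struct.pack("=qQQQ", src_fd, src_off, length, dst_off)
        try:
            fcntl.ioctl(dst_fd, FICLONERANGE, arg)
            return "clone"
        except OSError:
            pass  # e.g. EOPNOTSUPP, EXDEV, EINVAL: fall back to a copy
    done = 0
    while done < length:
        chunk = os.pread(src_fd, min(1 << 20, length - done), src_off + done)
        if not chunk:
            raise EOFError("journal is shorter than the recorded data range")
        written = 0
        while written < len(chunk):
            written += os.pwrite(dst_fd, chunk[written:],
                                 dst_off + done + written)
        done += len(chunk)
    return "copy"
```

The source offset and length would have to come from whatever records the final location of the data portion in the journal. On a file system without clone support the helper degrades to the ordinary second write.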
We need to use writeahead journaling when this feature is enabled, either for the entire store or just for the events/writes that we wish to clone.

h3. Work items

h3. Coding tasks

# track the offset and length of the data portion in the journal event metadata
# record final location in journal for the data portion
# pass final location to journal completion handler
# allow the completion handler to do a clone instead of the normal write if certain conditions are met (for example, the write is larger than some minimum size)
# ensure that journal replay still performs the complete write
# consider a hybrid parallel/writeahead approach where large writes go to journal and then fs, while small writes are still done in parallel.
# modify ceph-deploy or other tools to use journal files when the backing fs is btrfs (instead of a separate partition)

h3. Build / release tasks

# do performance tests to confirm this is a significant improvement
# expand rados test matrix to include all journaling modes

h3. Documentation tasks

# document the option
# document the internals in the internals section