Jessica Mack, 07/03/2015 09:21 PM

h1. Quotas vs subtrees

h3. Summary

Generalize and adapt the SnapRealm subtree mechanism into a generic subvolume/subtree concept that is (1) explicitly managed/visible to the admin, (2) used by both snapshots and quotas.

h3. Owners

* Yunchuan Wen (yunchuanwen@ubuntukylin.com)
* Sage Weil

h3. Interested Parties

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status

Snapshots break the namespace into SnapRealms, which are subtree chunks that share the same snapshot context (i.e., have the same set of snapshots applied).
New SnapRealms are created when:
# a snap is created at a new point in the hierarchy.
# a subdir in one snaprealm is renamed into another snaprealm.  The subdir becomes the root of a new snaprealm that is nested inside the target, with a past_parent pointer to the former.

Creating a new realm is a 'split' event.  This is somewhat expensive and involves a message to the client that enumerates all of the inos with client caps that need to be moved into the child realm.  The client thus has a coherent view of which realm any given inode belongs to at all times.
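The split event described above can be sketched roughly as follows. This is an illustrative model only, not the MDS implementation: the @Realm@ class, the @split_realm@ helper, and the use of paths in place of inode numbers are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Realm:
    """Hypothetical stand-in for a SnapRealm: a subtree sharing one snapshot context."""
    root: str                                        # path of the realm's root directory
    inodes: set[str] = field(default_factory=set)    # inodes (as paths) with client caps


def split_realm(parent: Realm, new_root: str) -> tuple[Realm, list[str]]:
    """Create a child realm rooted at new_root.

    Returns the child and the 'split message': the sorted list of inodes
    that must be moved from the parent realm into the child realm.
    """
    prefix = new_root.rstrip("/") + "/"
    moved = sorted(p for p in parent.inodes if p == new_root or p.startswith(prefix))
    child = Realm(root=new_root, inodes=set(moved))
    parent.inodes -= child.inodes    # each inode belongs to exactly one realm
    return child, moved
```

The cost noted above shows up in the size of the returned list: it grows with the number of inodes under the new root that clients hold caps on.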

h3. Detailed Description

There are some challenges with the snaprealm code, particularly when dealing with the past_parents relationship.  The trouble mostly arises when opening an inode in the cache: we need the past_parents in order to generate a valid SnapContext for the realm, but a past parent might be in some other part of the hierarchy and take time to resolve.  Until we have it, we cannot issue caps to clients, and we currently aren't smart enough to avoid doing so.  There is also some very complex code that manages propagation of rstat values to past parents after a snapshot has been taken.
The whole situation would be simplified if we did not allow renaming directories between subvolumes/snaprealms.
If we did that, then there would be no past_parents, and the snap issues would get much simpler.
We could also make the subvolume management explicit, e.g.,
 attr -s mydir ceph.subvolume
or whatever, so that the admin decides where the subvolume boundaries are, and thus when a rename will fail with -EXDEV.
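A minimal sketch of that rename rule, assuming the admin-marked subvolume roots are available as a set of absolute paths. The helper names @subvol_root@ and @check_rename@ are hypothetical, and treating the filesystem root as the implicit outermost subvolume is an assumption of the example.

```python
from __future__ import annotations

import errno
import posixpath


def subvol_root(path: str, subvol_roots: set[str]) -> str:
    """Return the deepest marked subvolume root containing path ('/' if none)."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    cur = posixpath.normpath(path)
    while cur != "/":
        if cur in subvol_roots:
            return cur
        cur = posixpath.dirname(cur)
    return "/"   # assumption: the filesystem root is the outermost subvolume


def check_rename(src: str, dst: str, subvol_roots: set[str]) -> int:
    """Return 0 if src and dst live in the same subvolume, else -EXDEV.

    The containing directories are compared, so renaming an entry within
    one subvolume is always allowed.
    """
    src_vol = subvol_root(posixpath.dirname(src), subvol_roots)
    dst_vol = subvol_root(posixpath.dirname(dst), subvol_roots)
    return 0 if src_vol == dst_vol else -errno.EXDEV
```

With roots @/vol1@ and @/vol2@ marked, a rename from @/vol1/a@ to @/vol1/sub/b@ is allowed, while a rename from @/vol1/a@ to @/vol2/a@ crosses a boundary and returns -EXDEV.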
If there *were* a subvol concept, then quotas would map onto that naturally.
What that buys us:
# clients know what root (inode) every open file belongs to, and thus what rstat value to pay attention to for quota
# same mds/client messages can manage the subvol <-> inode relationship
# when split is implemented in the future, we can piggyback on the split messages.  on the other hand,
## snaprealms are implicitly created when you rename c from realm a to realm b.  for quotas, we only care whether we are beneath b, not that we are inside a c nested inside a and b.
## so maybe we need to distinguish between snaprealm-things that are subvol roots and those that are not
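The first point above could look something like this on the client side. It is a sketch under stated assumptions: @SubvolRoot@, @write_allowed@, the @max_bytes@ field, and the convention that 0 means "no limit" are all made up for illustration and are not taken from the existing client code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SubvolRoot:
    ino: int          # inode number of the subvolume root
    rbytes: int       # recursive byte count (rstat) for the whole subtree
    max_bytes: int    # hypothetical quota limit; 0 means unlimited (assumption)


def write_allowed(root_of: dict[int, int], roots: dict[int, SubvolRoot],
                  ino: int, nbytes: int) -> bool:
    """Decide whether writing nbytes to open file `ino` stays within quota.

    root_of maps each open file's inode to the inode of its subvolume root,
    so the client knows exactly which rstat value to consult.
    """
    root = roots[root_of[ino]]
    if root.max_bytes == 0:
        return True
    return root.rbytes + nbytes <= root.max_bytes
```

Because every open file maps to exactly one root, the check needs a single lookup rather than a walk up the directory hierarchy.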
 
*Option 1*
# rename SnapRealm to SubvolRealm
# rename MClientSnap message to MClientSubvol or similar
# separate new realm creation into an explicit subvol creation op, triggered by a vxattr or new mds op
# only allow quotas to be set on subvol roots
# use existing snapbl (renamed subvolbl) to associate all inodes with the subvol root
# [maybe] allow rename between subvols with no snaps
## add a new MOVE op, distinct from but similar to split, that simply moves inodes to a different realm.  this will be used when you rename a dir between subvols.
# [someday] enable rename between subvols with snaps
## add a SubvolRealm property that indicates whether it is a subvol root or not
## make split work to enable snaps vs renames.
## mds: fix things with opening past_parents
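A rough model of how the quota restriction and the MOVE op in Option 1 might fit together. None of this exists yet: the @SubvolRealm@ fields, the @set_quota@ and @move_inodes@ helpers, the choice of -EINVAL for a quota on a non-root realm, and the rule that a MOVE is refused if either realm has snaps are all assumptions for the sake of the example.

```python
from __future__ import annotations

import errno
from dataclasses import dataclass, field


@dataclass
class SubvolRealm:
    name: str
    is_subvol_root: bool = True                     # the proposed root/non-root property
    snaps: list[str] = field(default_factory=list)  # snapshots taken on this realm
    inodes: set[int] = field(default_factory=set)   # inodes associated with this realm
    quota_bytes: int = 0


def set_quota(realm: SubvolRealm, max_bytes: int) -> int:
    """Only subvol roots may carry a quota (error code is an assumption)."""
    if not realm.is_subvol_root:
        return -errno.EINVAL
    realm.quota_bytes = max_bytes
    return 0


def move_inodes(src: SubvolRealm, dst: SubvolRealm, inos: set[int]) -> int:
    """MOVE op: reassign inodes to dst, allowed only when neither realm has snaps."""
    if src.snaps or dst.snaps:
        return -errno.EXDEV
    moving = inos & src.inodes
    src.inodes -= moving
    dst.inodes |= moving
    return 0
```

Unlike a split, the MOVE here creates no new realm and leaves no past_parent behind, which is why it is limited to the case with no snaps.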
 
*Option 2*
# add a new qtree (or subvol) construct
# instantiate in client cache and mds cache
# chain all inodes to the subvol they belong to
# mark subvol in any inodestat reply to client
# add a new MOVE message used on rename
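The message flow in Option 2 might be modeled as below. This is a toy sketch, not a wire format: @MoveMsg@, @MdsCache@, @ClientCache@, and the dict-shaped inodestat reply are hypothetical names and shapes chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MoveMsg:
    """Hypothetical MOVE message: these inodes now belong to new_subvol."""
    inos: tuple[int, ...]
    new_subvol: int


class MdsCache:
    def __init__(self) -> None:
        self.subvol_of: dict[int, int] = {}    # ino -> subvol it is chained to

    def inodestat(self, ino: int) -> dict[str, int]:
        # Every reply to the client is marked with the inode's subvol.
        return {"ino": ino, "subvol": self.subvol_of[ino]}

    def rename_into(self, inos: set[int], new_subvol: int) -> MoveMsg:
        # On rename, rechain the affected inodes and tell clients about it.
        for ino in inos:
            self.subvol_of[ino] = new_subvol
        return MoveMsg(tuple(sorted(inos)), new_subvol)


class ClientCache:
    def __init__(self) -> None:
        self.subvol_of: dict[int, int] = {}

    def handle_inodestat(self, reply: dict[str, int]) -> None:
        self.subvol_of[reply["ino"]] = reply["subvol"]

    def handle_move(self, msg: MoveMsg) -> None:
        # Only inodes the client already caches need updating.
        for ino in msg.inos:
            if ino in self.subvol_of:
                self.subvol_of[ino] = msg.new_subvol
```

The client learns the subvol of an inode from the stat reply and only needs a MOVE message when that association later changes.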