Quotas vs subtrees » History » Version 1

Jessica Mack, 07/03/2015 09:21 PM

h1. Quotas vs subtrees

h3. Summary

Generalize and adapt the SnapRealm subtree mechanism into a generic subvolume/subtree concept that is (1) explicitly managed/visible to the admin, (2) used by both snapshots and quotas.

h3. Owners

* Yunchuan Wen (yunchuanwen@ubuntukylin.com)
* Sage Weil

h3. Interested Parties

* Name (Affiliation)
* Name (Affiliation)
* Name

h3. Current Status

Snapshots break the namespace into SnapRealms, which are subtree chunks that share the same snapshot context (i.e., have the same set of snapshots applied).
New SnapRealms are created when
# a snap is created at a new point in the hierarchy.
# a subdir in one snaprealm is renamed into another snaprealm. The subdir becomes the root of a new snaprealm that is nested inside the target, with a past_parent pointer to the former realm.

When a new realm is created, it is a 'split' event. This is somewhat expensive and involves a message to the client that enumerates all of the inodes with client caps that need to be moved into the child realm. The client thus has a coherent view of which realm any given inode belongs to at all times.

h3. Detailed Description

There are some challenges with the snaprealm code, particularly when dealing with the past_parents relationship. These mostly arise when opening an inode in the cache: we need the past_parents in order to generate a valid SnapContext for the realm, but a past parent might be in some other part of the hierarchy and take time to resolve. Until we have it, we cannot issue caps to clients, and we currently aren't smart enough to avoid doing so. There is also some very complex code that manages propagation of rstat values to past parents after a snapshot has been taken.
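To make the past_parents dependency concrete, here is a minimal Python sketch of building a snap context for a realm. It is a toy model, not the MDS code: the @SnapRealm@, @PastParent@ and @RealmNotLoaded@ names, the @parent_since@ field, and the idea of a @[first, last]@ snap-id range per past parent are all assumptions made for illustration. The only point is that the context cannot be computed until every realm it depends on is in the cache.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

NO_LIMIT = 2**64 - 1  # hypothetical "no upper bound" snap id for this sketch


@dataclass
class PastParent:
    ino: int    # root inode of a realm we used to live under
    first: int  # first snap id for which that realm applied to us
    last: int   # last snap id for which that realm applied to us


@dataclass
class SnapRealm:
    ino: int                                  # root inode of this realm
    snaps: Set[int] = field(default_factory=set)
    parent: Optional[int] = None              # current enclosing realm, if any
    parent_since: int = 0                     # snap id from which `parent` applies
    past_parents: List[PastParent] = field(default_factory=list)


class RealmNotLoaded(Exception):
    """A realm we depend on is not in the cache yet; caps cannot be issued."""


def collect_snaps(cache: Dict[int, SnapRealm], ino: int,
                  first: int = 0, last: int = NO_LIMIT) -> Set[int]:
    """Gather snap ids in [first, last] from a realm and everything it depends on."""
    if ino not in cache:
        raise RealmNotLoaded(ino)
    realm = cache[ino]
    snaps = {s for s in realm.snaps if first <= s <= last}
    if realm.parent is not None:
        snaps |= collect_snaps(cache, realm.parent,
                               max(first, realm.parent_since), last)
    for link in realm.past_parents:
        snaps |= collect_snaps(cache, link.ino,
                               max(first, link.first), min(last, link.last))
    return snaps


def snap_context(cache: Dict[int, SnapRealm], ino: int) -> List[int]:
    """Return the snap ids that apply to a realm, newest first."""
    return sorted(collect_snaps(cache, ino), reverse=True)
```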
The whole situation would be simplified if we did not allow renaming directories between subvolumes/snaprealms.
If we did that, there would be no past_parents, and the snapshot issues would get much simpler.
We could also make the subvolume management explicit, e.g.,
 attr -s ceph.subvolume -V 1 mydir
or something similar, so that the admin decides where the subvolume boundaries are, and thus when -EXDEV will be returned on rename.
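A minimal sketch of the rename check this implies, assuming each directory inode knows its parent and the admin has flagged some inodes as subvolume roots. The @subvol_root@ and @check_rename@ helpers and the dictionary layout are hypothetical; only the -EXDEV result comes from the text above.

```python
import errno
from typing import Dict, Optional, Set


def subvol_root(parents: Dict[int, Optional[int]], roots: Set[int],
                ino: int) -> Optional[int]:
    """Walk up from ino and return the nearest enclosing subvolume root, if any."""
    cur: Optional[int] = ino
    while cur is not None:
        if cur in roots:
            return cur
        cur = parents.get(cur)
    return None


def check_rename(parents: Dict[int, Optional[int]], roots: Set[int],
                 src_dir: int, dst_dir: int) -> int:
    """Return 0 if moving an entry from src_dir to dst_dir stays inside one
    subvolume, otherwise -EXDEV."""
    if subvol_root(parents, roots, src_dir) != subvol_root(parents, roots, dst_dir):
        return -errno.EXDEV
    return 0
```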
If there *were* a subvol concept, then quotas would map onto that naturally.
What that buys us:
# clients know what root (inode) every open file belongs to, and thus what rstat value to pay attention to for quota
# same mds/client messages can manage the subvol <-> inode relationship
# when split is implemented in the future, we can piggyback on the split messages. On the other hand,
## snaprealms are implicitly created when you rename c from realm a to realm b. For quotas, we only care whether we are beneath b, not that we are inside a c nested inside a and b.
## so maybe we need to distinguish between snaprealm-things that are subvol roots and those that are not
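The first benefit in the list can be sketched as a client-side check: if every inode is tied to a subvol root, a write only needs to consult that root's recursive byte count. This is an illustration under stated assumptions, not the client code. @SubvolRoot@, @write_allowed@, the @max_bytes@ field and the "0 means no quota" convention are hypothetical names and choices.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SubvolRoot:
    ino: int
    rbytes: int         # recursive byte count (rstat) for the whole subvol
    max_bytes: int = 0  # quota limit; 0 means "no quota" in this sketch


def write_allowed(roots: Dict[int, SubvolRoot], subvol_of: Dict[int, int],
                  ino: int, new_bytes: int) -> bool:
    """Check a write against the quota of the subvol root the inode belongs to."""
    root = roots[subvol_of[ino]]
    if root.max_bytes == 0:
        return True
    return root.rbytes + new_bytes <= root.max_bytes
```

In this model a file inside the implicitly created realm c would still map to b in @subvol_of@, which is the distinction between subvol roots and other snaprealms that the last two items ask for.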
 
*Option 1*
# rename SnapRealm to SubvolRealm
# rename MClientSnap message to MClientSubvol or similar
# separate new realm creation into an explicit subvol creation op, triggered by a vxattr or new mds op
# only allow quotas to be set on subvol roots
# use existing snapbl (renamed subvolbl) to associate all inodes with the subvol root
# [maybe] allow rename between subvols with no snaps
## add a new MOVE op, distinct from but similar to split, that simply moves inodes to a different realm. This will be used when you rename a dir between subvols.
# [someday] enable rename between subvols with snaps
## add a SubvolRealm property that indicates whether it is a subvol root or not
## make split work to enable snaps vs renames.
## mds: fix things with opening past_parents
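One way to picture the inode-to-subvol association and the proposed MOVE op is a table that maps each inode to its realm root, with split and MOVE as two ways of updating it. The sketch below is only a model of the idea: @RealmTable@, its methods, and the dictionaries standing in for client messages are hypothetical and do not reflect the actual MClientSnap format.

```python
from typing import Dict, Iterable, List


class RealmTable:
    """Maps each inode to the root inode of the realm (subvol) it belongs to."""

    def __init__(self) -> None:
        self.realm_of: Dict[int, int] = {}

    def add(self, ino: int, realm: int) -> None:
        self.realm_of[ino] = realm

    def split(self, parent: int, new_root: int, inos: Iterable[int]) -> dict:
        """Create a child realm rooted at new_root and pull inodes out of parent."""
        moved = self._reassign(inos, parent, new_root)
        return {"op": "split", "parent": parent, "realm": new_root, "inos": moved}

    def move(self, src: int, dst: int, inos: Iterable[int]) -> dict:
        """Move inodes between two existing realms; no new realm is created."""
        moved = self._reassign(inos, src, dst)
        return {"op": "move", "from": src, "to": dst, "inos": moved}

    def _reassign(self, inos: Iterable[int], src: int, dst: int) -> List[int]:
        inos = list(inos)
        # Validate everything first so a bad request leaves the table untouched.
        for ino in inos:
            if self.realm_of.get(ino) != src:
                raise ValueError(f"inode {ino} is not in realm {src}")
        for ino in inos:
            self.realm_of[ino] = dst
        return inos
```

Both operations return the list of affected inodes, mirroring the way the split message enumerates the inodes a client must re-home.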
 
*Option 2*
# add a new qtree (or subvol) construct
# instantiate in client cache and mds cache
# chain all inodes to the subvol they belong to
# mark subvol in any inodestat reply to client
# add a new MOVE message used on rename
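For comparison, Option 2 could look roughly like the following. This is a sketch under assumptions, not a design: @QTree@, @Cache@, and the dictionaries standing in for the inodestat reply and the MOVE message are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class QTree:
    root_ino: int
    inodes: Set[int] = field(default_factory=set)  # inodes chained to this subvol


class Cache:
    """Stand-in for either the client cache or the MDS cache."""

    def __init__(self) -> None:
        self.qtrees: Dict[int, QTree] = {}
        self.qtree_of: Dict[int, int] = {}  # ino -> qtree root ino

    def create_qtree(self, root_ino: int) -> QTree:
        return self.qtrees.setdefault(root_ino, QTree(root_ino))

    def chain(self, ino: int, root_ino: int) -> None:
        """Attach an inode to a qtree, detaching it from any previous one."""
        old: Optional[int] = self.qtree_of.get(ino)
        if old is not None:
            self.qtrees[old].inodes.discard(ino)
        self.qtrees[root_ino].inodes.add(ino)
        self.qtree_of[ino] = root_ino

    def inodestat(self, ino: int) -> dict:
        """Reply that marks which subvol the inode belongs to."""
        return {"ino": ino, "subvol": self.qtree_of.get(ino)}

    def rename(self, ino: int, dst_root: int) -> dict:
        """Re-chain an inode on rename and return the MOVE message to send."""
        src_root = self.qtree_of.get(ino)
        self.chain(ino, dst_root)
        return {"op": "move", "ino": ino, "from": src_root, "to": dst_root}
```

Compared with Option 1, nothing here depends on SnapRealm: the qtree is a separate structure, so snapshots and quotas would each track their own boundaries.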