Osd - tiering - object redirects » History » Version 1

Jessica Mack, 06/09/2015 07:25 PM

h1. Osd - tiering - object redirects

h3. Summary

Create a RADOS redirect primitive and methods for making use of it.  A redirect should function analogously to a symlink, allowing an object to be moved to a different pool while remaining transparently accessible to clients looking in the old location.  This would be the underlying infrastructure to support tiering.

h3. Owners

* Sage Weil (Inktank)

h3. Interested Parties

* Loic Dachary <loic@dachary.org>
* Sam Just (Inktank)

h3. Current Status
 
h3. Detailed Description

*--- data types ---*
terminology

p((. origin: original object in original location
target: alternative location of object

new fields for object_info_t:

p((. @enum redir_state;                ///< [origin, target]
object_locator_t redir_oloc;     ///< [origin] locator for target object
eversion_t redir_version;        ///< [origin, target] when this redirect was set to this target
u8 flags;                        ///< [origin]
object_locator_t owner_oloc;     ///< [target] locator for the origin
eversion_t owner_user_version;   ///< [target] user_version, not version!@

where the origin states are:

p((. *NONE*
 *REDIRECT*    we are pointing to another object
 *PROMOTING*   we are copying the target object back to the origin location
 *DEMOTING*    we are copying the origin object to the target location
 *CLEANUP*     we have the object, but need to delete the demoted object
 *DELETING*    local object is logically non-existent, but we need to clean up the target location

flags are:

p((. PROMOTE_ON_READ
 PROMOTE_ON_WRITE

- we may want to make PROMOTE_ON_WRITE the only behavior for the initial implementation.

- the demoted (target) object has only two states:

p((. *NONE*
 *TARGET*      we are pointed to by the origin
 
- primary osd will handle object promote, demote operations (copying to/from alternate location)
  - use backend cluster interface to avoid deadlock from throttling ( loic : how can it deadlock from throttling ? sage: hmm, might not be a problem, as long as no recovery operations can block on the redirect state. )

- objecter can also do a SET_REDIRECT operation:
   - will erase local object and set redirect metadata

- return redirect metadata with GET_REDIRECT ( loic : without GET_REDIRECT it would transparently try again when receiving an EAGAIN, in the same way an HTTP client would on a 302 ? sage: yeah this is like lstat().. we want to find out if we are a redirect origin or target )
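The fields and states above can be pictured as a small C++ sketch. It is illustrative only: @object_locator_t@ and @eversion_t@ are reduced stand-ins for the real Ceph types, the names @redir_state_t@, @redir_flags_t@, @redirect_fields_t@ and @target_accepts@ are hypothetical, the flag bit values are assumed, and folding the origin and target states into one enum is a simplification of the notes.

```cpp
#include <cstdint>
#include <string>

// Stand-ins for the Ceph types named in the notes; the real ones carry more fields.
struct object_locator_t {
  std::int64_t pool;
  std::string key;
};

struct eversion_t {
  std::uint32_t epoch;
  std::uint64_t version;
};

inline bool operator==(const eversion_t& a, const eversion_t& b) {
  return a.epoch == b.epoch && a.version == b.version;
}

// Origin states plus the single non-NONE target state, folded into one enum
// for brevity (an assumption of this sketch).
enum redir_state_t : std::uint8_t {
  REDIR_NONE,
  REDIR_REDIRECT,   // origin: pointing to another object
  REDIR_PROMOTING,  // origin: copying the target back to the origin location
  REDIR_DEMOTING,   // origin: copying the origin object to the target location
  REDIR_CLEANUP,    // origin: have the object, demoted copy still to delete
  REDIR_DELETING,   // origin: logically gone, target still to clean up
  REDIR_TARGET      // target: pointed to by the origin
};

// Bit values are assumed; the notes only name the flags.
enum redir_flags_t : std::uint8_t {
  PROMOTE_ON_READ = 1 << 0,
  PROMOTE_ON_WRITE = 1 << 1
};

// Hypothetical grouping of the proposed object_info_t additions.
struct redirect_fields_t {
  redir_state_t redir_state;      // [origin, target]
  object_locator_t redir_oloc;    // [origin] locator for target object
  eversion_t redir_version;       // [origin, target] when the redirect was set
  std::uint8_t flags;             // [origin]
  object_locator_t owner_oloc;    // [target] locator for the origin
  eversion_t owner_user_version;  // [target] user_version, not version
};

// A target accepts an op only if the version sent by the client matches the
// redir_version it recorded; otherwise the caller would reply EAGAIN.
inline bool target_accepts(const redirect_fields_t& target, const eversion_t& sent) {
  return target.redir_state == REDIR_TARGET && target.redir_version == sent;
}
```

Keeping @redir_version@ on both sides is what lets a target detect a stale redirect without contacting the origin.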
 
*--- osd behavior ---*

on read (no flags):
 NONE, DEMOTING, CLEANUP: do the read
 REDIRECT: send EAGAIN with redirect metadata to client
 PROMOTING: block or forward. ( loic : what does "forward" mean in this context ? I would understand "block then do the read" )
 DELETING: ENOENT

on read (PROMOTE_ON_READ):
 NONE, CLEANUP: do the read
 DEMOTING: abort the demotion, move to CLEANUP, and do the read
 REDIRECT: move to PROMOTING, block then do the read
 PROMOTING: block then do the read
 DELETING: ENOENT

on write (no flags):
 DEMOTING: block
 REDIRECT: forward
 PROMOTING: block
 DELETING: move to CLEANUP, proceed.

on write (PROMOTE_ON_WRITE):
 DEMOTING: move to CLEANUP
 REDIRECT: move to PROMOTING, block
 PROMOTING: block
 DELETING: move to CLEANUP, proceed.

on delete:
 DEMOTING, REDIRECT, PROMOTING, CLEANUP: move to DELETING and queue target object for deletion (as with CLEANUP)
 DELETING: no change.

on any op:
 TARGET: verify the redir_version matches, or EAGAIN

- if we are doing the redirect request and the target does not exist or the version does not match what the redirect/primary had, retry

- the CLEANUP and DELETING states mean the osd needs to remove the redirect and then transition to NONE or delete (respectively)
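The read and write tables above amount to a small decision function on the origin. The sketch below is one possible reading, not Ceph code: all type and function names are hypothetical, and the flag bit values are assumed. Three points are assumptions rather than statements from the notes: a read in PROMOTING without flags is treated as "block" (the notes leave "block or forward" open), writes in NONE and CLEANUP are served locally (the write tables do not list them), and a PROMOTE_ON_WRITE write in DEMOTING proceeds once the state moves to CLEANUP.

```cpp
#include <cstdint>

enum class RedirState { NONE, REDIRECT, PROMOTING, DEMOTING, CLEANUP, DELETING };

enum class Action {
  DO_OP,         // serve the op from the local object
  REPLY_EAGAIN,  // reply EAGAIN with redirect metadata
  FORWARD,       // pass the op on to the target
  BLOCK,         // queue the op until the state changes, then redo it
  REPLY_ENOENT   // object is logically gone
};

// Bit values are assumed.
constexpr std::uint8_t PROMOTE_ON_READ = 1 << 0;
constexpr std::uint8_t PROMOTE_ON_WRITE = 1 << 1;

struct Decision {
  Action action;
  RedirState next;  // state the origin is in after handling the op
};

Decision on_read(RedirState s, std::uint8_t flags) {
  const bool promote = (flags & PROMOTE_ON_READ) != 0;
  switch (s) {
    case RedirState::NONE:
    case RedirState::CLEANUP:
      return {Action::DO_OP, s};
    case RedirState::DEMOTING:
      // With PROMOTE_ON_READ the demotion is aborted before the read.
      return {Action::DO_OP, promote ? RedirState::CLEANUP : s};
    case RedirState::REDIRECT:
      if (promote) return {Action::BLOCK, RedirState::PROMOTING};
      return {Action::REPLY_EAGAIN, s};
    case RedirState::PROMOTING:
      return {Action::BLOCK, s};  // assumption: block rather than forward
    case RedirState::DELETING:
      return {Action::REPLY_ENOENT, s};
  }
  return {Action::REPLY_ENOENT, s};  // not reached for valid states
}

Decision on_write(RedirState s, std::uint8_t flags) {
  const bool promote = (flags & PROMOTE_ON_WRITE) != 0;
  switch (s) {
    case RedirState::NONE:
    case RedirState::CLEANUP:
      return {Action::DO_OP, s};  // assumption: not listed in the tables
    case RedirState::DEMOTING:
      // assumption: the write proceeds once the demotion is abandoned
      if (promote) return {Action::DO_OP, RedirState::CLEANUP};
      return {Action::BLOCK, s};
    case RedirState::REDIRECT:
      if (promote) return {Action::BLOCK, RedirState::PROMOTING};
      return {Action::FORWARD, s};
    case RedirState::PROMOTING:
      return {Action::BLOCK, s};
    case RedirState::DELETING:
      return {Action::DO_OP, RedirState::CLEANUP};
  }
  return {Action::BLOCK, s};  // not reached for valid states
}
```

Returning the next state together with the action keeps each table row visible as a single line of code, which makes it easy to compare against the tables.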
 
*--- objecter behavior ---*

- send op to normal location
- on EAGAIN with redirect metadata,

p((.  - note redirect version
  - if this is a retry and version hasn't changed, return error to caller.
  - resend op to alternate location, *including* the primary's eversion_t
  - if we get an error (ENOENT on read), retry from the top
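The client-side steps above can be sketched as a retry loop. This is a minimal illustration under stated assumptions: @Reply@, @SendFn@ and @submit_read@ are hypothetical names, the transport is abstracted as a callback, the redirect version is reduced to a single integer, the @max_rounds@ cap is an addition of the sketch, and returning the last error seen from the target is one possible choice for "return error to caller".

```cpp
#include <cerrno>
#include <cstdint>
#include <functional>

// Hypothetical reply shape: result is 0 or a negative errno; when the origin
// answers -EAGAIN as a redirect, has_redirect is set and redirect_version
// carries the version noted in the redirect metadata.
struct Reply {
  int result;
  bool has_redirect;
  std::uint64_t redirect_version;
};

// Hypothetical transport hook: send the op to the origin (to_target == false)
// or to the alternate location, passing along the origin's redirect version.
using SendFn = std::function<Reply(bool to_target, std::uint64_t origin_version)>;

int submit_read(const SendFn& send, int max_rounds = 8) {
  bool have_last = false;
  std::uint64_t last_version = 0;
  int last_err = -EAGAIN;

  for (int round = 0; round < max_rounds; ++round) {
    // Send the op to the normal location first.
    Reply r = send(false, 0);
    if (r.result != -EAGAIN || !r.has_redirect)
      return r.result;  // served, or failed, at the origin

    // A retry that sees the same redirect version is reported to the caller.
    if (have_last && last_version == r.redirect_version)
      return last_err;
    have_last = true;
    last_version = r.redirect_version;

    // Resend to the alternate location, including the origin's version.
    Reply t = send(true, r.redirect_version);
    if (t.result != -ENOENT && t.result != -EAGAIN)
      return t.result;
    last_err = t.result;  // target missing or stale: retry from the top
  }
  return last_err;
}
```

The version comparison is what bounds the loop in practice: a second redirect with an unchanged version means nothing has moved since the last attempt, so retrying again would not help.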
 
*--- pg log events ---*

redir_demote_start -- we are now allowed to start copying to target pool.  move to DEMOTING
redir_demote_finish -- target is in place; delete local data and set redirect metadata. move to REDIRECT
redir_promote_cleanup -- did copy from target back to origin; still need to clean up old target.  move to CLEANUP
redir_cleanup_finish -- old target is cleaned up.  move to NONE
redir_delete_start -- can remove target, move to DELETING
remove (existing event) -- finished removing target, delete object.
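Since each event names the state it moves to, log replay can be pictured as a plain mapping from event to state. The sketch below assumes hypothetical names (@LogEvent@, @ObjectRedirState@, @replay@) and models only the events listed above; the list has no event for entering PROMOTING, so the sketch has none either.

```cpp
enum class RedirState { NONE, REDIRECT, PROMOTING, DEMOTING, CLEANUP, DELETING };

enum class LogEvent {
  REDIR_DEMOTE_START,
  REDIR_DEMOTE_FINISH,
  REDIR_PROMOTE_CLEANUP,
  REDIR_CLEANUP_FINISH,
  REDIR_DELETE_START,
  REMOVE  // the existing remove event
};

// Hypothetical per-object view rebuilt while replaying the log.
struct ObjectRedirState {
  RedirState state;
  bool exists;
};

ObjectRedirState replay(ObjectRedirState o, LogEvent e) {
  switch (e) {
    case LogEvent::REDIR_DEMOTE_START:    o.state = RedirState::DEMOTING; break;
    case LogEvent::REDIR_DEMOTE_FINISH:   o.state = RedirState::REDIRECT; break;
    case LogEvent::REDIR_PROMOTE_CLEANUP: o.state = RedirState::CLEANUP;  break;
    case LogEvent::REDIR_CLEANUP_FINISH:  o.state = RedirState::NONE;     break;
    case LogEvent::REDIR_DELETE_START:    o.state = RedirState::DELETING; break;
    case LogEvent::REMOVE:
      // Target removal finished; the object itself is deleted.
      o.state = RedirState::NONE;
      o.exists = false;
      break;
  }
  return o;
}
```

Because the resulting state depends only on the event, replaying the same log always rebuilds the same pending-redirect state.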
 
*--- common races ---*

- read vs demote

p((.  - if we hit primary while DEMOTING, we get the result
  - if we get EAGAIN, we read from the demoted copy

- read vs promote (or read vs demote+promote)

p((.  - try primary
  - if REDIRECT:
    - EAGAIN, try alternate location
    - result, or ENOENT and back to primary (and block->success or success)
  - if PROMOTING, block, then success
 
*--- in-memory osd state ---*

For each PG, we maintain:
* set<Demotion*> redir_demoting;   ///< all pending demotions
* set<Promotion*> redir_promotion; ///< all pending promotions
* set<Cleanup*> redir_cleanup;     ///< all pending cleanups/deletions.

These structs will have a ref to the ObjectContext and will need to orchestrate the push/pull to do the promotion/demotion.  They will reuse all of the push/pull helpers used by recovery.
 
*--- snapshots ---*
We can start with a simple approach, and add more complex behavior from there.
# Force promote-on-write if a non-empty SnapContext is specified.  This ensures that all the snap metadata lives in the main pool and makes sense.  Similarly, we refuse to demote anything that is snapped.
# Allow snaps to be demoted.  For the primary pool, recovery needs to be adjusted so that the clone_range stuff falls back to a full copy when the snap is a redirect.  In the target pool, recovery needs to behave when we have a subset of the snapset... i.e. just the snapped object.  It may be simplest if it is not a snap at all: foo @12 -> foo_$version @nosnap with key foo.  And writes/cow never happen in the cold pool.
 
*--- clonerange ---*
If a source item for a clonerange is a redirect, block and promote.
 
h3. Work items

h3. Coding tasks

# osd: add object_info_t fields for redirects
# add redirect metadata to MOSDOp, MOSDOpReply.
# add a feature bit.
# osd, objecter, librados, api tests: SET_REDIRECT, GET_REDIRECT operations
# osd: basic redirect logic: reply with EAGAIN on primary, verify or EAGAIN on target.
# osd: EINVAL or similar if client lacks feature.
# objecter: handle EAGAIN redirects
# osd: pg log entries to indicate state changes (none -> demoting -> redirect -> promoting -> cleanup, deleting, etc.)
# osd: per-PG map of pending redirect states (demoting, promoting, cleanup, tombstone)
# osd: log replay to update pending redirect states
# osd: support deletion.  refactoring to support tombstones.
# osd: promote
# osd: demote
# osd: allow snap

h3. Build / release tasks

# add promote/demote to RadosModel

h3. Documentation tasks

# Task 1