Feature #14036

open

Feature #14031: EC overwrites

EC overwrites: PGBackend needs to be replumbed to support overwrites

Added by Samuel Just over 8 years ago. Updated over 4 years ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 90%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

It's not quite as simple as just using the write PGTransaction call directly. ECBackend will need to write the overwritten stripes into temp objects on each replica. The information about how to actually apply that operation to the object needs to live in the PG log entry, however. Perhaps we pass a pointer to the rollback object and let the backend handle it?

Step one is to replace this description with an explanation of how it will work.

#1

Updated by Samuel Just almost 8 years ago

  • Assignee set to Samuel Just
#2

Updated by Samuel Just almost 8 years ago

The trick on this one will be to subsume the xattr cache and all of the mod_desc logic into the PGBackend::Transaction (sp?) implementation. do_osd_ops will require some creative refactoring to make this work, but it should wind up much cleaner than what we have now. With that done, we can get the transaction to generate the mod_desc value when we generate the log entry at the end of finish_ctx. This'll simplify the OpContext too. I should have done it that way to begin with...

#3

Updated by Samuel Just almost 8 years ago

As always, the issue is the split in responsibility between the backend, which implements the operation, and ReplicatedPG, which controls the object metadata. The backend needs to be in control of generating the new log entries, but the object info needs to be updated with the new version and prior_version before being passed to the backend. I think the backend implementation should probably just control applying the unstable metadata to the ObjectContext.

#4

Updated by Samuel Just almost 8 years ago

There's another wrinkle. PGBackend::append(PGBackend::PGTransaction *) [why, oh why, didn't I name it something else?] makes it...complicated to keep track of the rules for the constructed log entries as entries are added. That's not a problem for ECTransaction -- it uses an intermediate representation on which we can run a reduction pass before generating log entries. For the RPGTransaction, though, we are scribbling directly into an ObjectStore::Transaction, so whatever we do needs to be maintained as the transaction is generated and fixed up during append. Alternatively, we could update both to use an (the same?) intermediate representation. That would have the benefit of letting us share code for eliminating moot operations. Is that even useful for Replicated? It's possibly useful for EC, to reduce an arbitrary operation to an equivalent canonical reduced form (delete? create/clone? writes) and thereby simplify rollback or rollforward generation.

#5

Updated by Samuel Just almost 8 years ago

If PGTransaction has an explicit update_snap_index() method, we can fill in the snaps field on the oi as well in the backend (duplicated?).

#6

Updated by Samuel Just almost 8 years ago

target: hobject_t -> {delete: bool, create: (None|Create|Promote {reqids :<whatever>}|Clone {from :hobject_t}|Rename {from :hobject_t}), operations: [(setattrs | rmattrs | write | clone_range | truncate | zero | set_alloc_hint | set_snap_map | omap_* | append)]}

I think the above reduction works as long as we can assume that clone/clone_range read from the old version, that rename always renames from a temp object to a non-temp object, that writes to the temp source object always precede the rename, and that writes to the destination always follow the rename (or can be mooted out, since a rename could be treated as a delete). In that case, we can reset the target for the sequence to the new object and proceed that way.

{delete: true, create: None, x} implies that x is [].
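
A minimal Python sketch of that reduction, assuming ops arrive as a flat sequence of hypothetical (object name, op name) pairs, with plain strings standing in for hobject_t and for the op types listed above. It groups ops per object, treats a delete as mooting everything queued before it, and enforces the invariant just stated. Rename retargeting and the Promote/Clone/Rename creation variants are not modeled; Target and reduce_ops are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Target:
    delete: bool = False           # object is removed first
    create: Optional[str] = None   # None, or a creation kind such as "create"
    ops: List[str] = field(default_factory=list)


def reduce_ops(seq: List[Tuple[str, str]]) -> Dict[str, Target]:
    """Fold a flat (object, op) sequence into one Target per object."""
    out: Dict[str, Target] = {}
    for oid, op in seq:
        t = out.setdefault(oid, Target())
        if op == "delete":
            # Everything queued so far for this object is moot.
            out[oid] = Target(delete=True)
        elif op == "create":
            # A create must precede every other op on the object.
            if t.create is not None or t.ops:
                raise ValueError(f"{oid}: create on an object that already exists")
            t.create = op
        else:
            t.ops.append(op)
    for oid, t in out.items():
        # {delete: true, create: None, x} implies that x is [].
        if t.delete and t.create is None and t.ops:
            raise ValueError(f"{oid}: ops on a deleted object")
    return out
```

Because each object's Target is folded independently, moot-op elimination happens per object; handling a rename would additionally mean switching which key subsequent ops land on, per the assumptions above.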

#7

Updated by Samuel Just almost 8 years ago

The above reduction makes operations on different objects completely independent, so append operates object-by-object:

// If the second argument is a delete, that's the result (delete is assumed to be false in the second argument below this case)
append(, {delete=true, ...}@x) = Valid x
// If the first operation is delete/None, the second must have a non-None creation value
append({delete=true, type=None, []}, {type=None, ...}) = Invalid
// If the second operation is None, the result is the first with the ops from the second appended
append({delete=d, type=x, ops=y}, {type=None, ops=z}) = Valid {delete=d, type=x, ops=y+z}
// If the second operation is not None, the result is Invalid (object would have to already exist after the first)
append({delete=, type=_, ops=_}, {type=_, ops=_}) = Invalid
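
The rules above can be sketched as a Python function over per-object entries. Entry and Invalid are hypothetical names, and create is either None or a creation-kind string. Two cases are filled in by assumption because the rules as written leave them open: a delete-only first entry followed by an entry with a non-None create yields {delete=true, create=<second's create>, ops=<second's ops>} (which is what the comment on the second rule implies), and every other remaining combination falls through to Invalid, as in the last rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class Invalid(Exception):
    """The two entries cannot be composed."""


@dataclass(frozen=True)
class Entry:
    delete: bool = False
    create: Optional[str] = None  # None means "no creation"
    ops: Tuple[str, ...] = ()


def append(first: Entry, second: Entry) -> Entry:
    # A delete in the second entry moots the first: the result is the second.
    if second.delete:
        return second
    # First entry only deletes (its ops must be empty by the invariant).
    if first.delete and first.create is None:
        if second.create is None:
            raise Invalid("ops on an object that no longer exists")
        # Assumption: delete followed by a (re)creation.
        return Entry(delete=True, create=second.create, ops=second.ops)
    # Second entry creates nothing: concatenate its ops onto the first.
    if second.create is None:
        return Entry(first.delete, first.create, first.ops + second.ops)
    # Second entry creates an object that already exists after the first.
    raise Invalid("object already exists")
```

Since append only ever inspects one object's pair of entries, a whole transaction could be composed by applying it key by key.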

#8

Updated by Samuel Just almost 8 years ago

  • Status changed from New to In Progress
#9

Updated by Samuel Just almost 8 years ago

Hmm, we don't need an op vector, actually. We could instead do:

ObjectOperation : {
  delete         :bool
  create         :(None | Create | Promote {reqids: ...} | Clone {from :hobject_t} | Rename {from :hobject_t})
  truncate       :Maybe off_t
  omap_clear     :bool
  attrs          :Map string (Maybe bufferlist)
  omap           :Map string (Maybe bufferlist)
  omap_header    :Maybe bufferlist
  set_alloc_hint :Maybe (off_t, off_t)
  snap_map       :Maybe (Vector snapid_t) 
  buffer_update  :IntervalMap off_t (Write {data :bufferlist} | CloneRange {from :hobject_t, source_offset: off_t} | Zero)
}
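
To make the buffer_update semantics concrete, here is a small Python stand-in for the IntervalMap. It assumes (this is not stated above) that a later update replaces whatever overlapping part of earlier extents it covers, so the stored extents never overlap. bytes stands in for bufferlist and a string for hobject_t; Write, CloneRange and Zero mirror the variants in the struct, while trim and BufferUpdate are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Write:
    data: bytes  # stand-in for bufferlist


@dataclass(frozen=True)
class CloneRange:
    src: str  # stand-in for hobject_t
    source_offset: int


@dataclass(frozen=True)
class Zero:
    pass


def trim(value, delta: int, length: int):
    """Return the part of `value` starting `delta` bytes into its extent."""
    if isinstance(value, Write):
        return Write(value.data[delta:delta + length])
    if isinstance(value, CloneRange):
        return CloneRange(value.src, value.source_offset + delta)
    return value  # Zero has no payload to slice


class BufferUpdate:
    """Sorted, non-overlapping (offset, length, value) extents."""

    def __init__(self) -> None:
        self.extents: List[Tuple[int, int, object]] = []

    def insert(self, off: int, length: int, value) -> None:
        end = off + length
        kept = []
        for e_off, e_len, e_val in self.extents:
            e_end = e_off + e_len
            if e_end <= off or e_off >= end:
                kept.append((e_off, e_len, e_val))  # no overlap: untouched
                continue
            if e_off < off:  # keep the part left of the new extent
                kept.append((e_off, off - e_off, trim(e_val, 0, off - e_off)))
            if e_end > end:  # keep the part right of the new extent
                kept.append((end, e_end - end, trim(e_val, end - e_off, e_end - end)))
        kept.append((off, length, value))
        kept.sort(key=lambda e: e[0])
        self.extents = kept
```

Trimming a CloneRange shifts its source_offset rather than copying data, which is one reason a single merged map per object could be cheaper to reason about than an op vector.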
#10

Updated by Samuel Just almost 8 years ago

A positive side effect is that we can read out the updated xattrs trivially for updating the cached version -- no need to maintain a separate structure.
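
A sketch of what reading out the updated xattrs could look like, assuming the cached copy is a plain dict of name to bytes and that attrs maps each name either to new bytes (setattr) or to None (rmattr), matching `Map string (Maybe bufferlist)`. apply_attr_updates is a hypothetical helper; its delete flag models ObjectOperation.delete discarding the old attrs first, and the Clone/Rename creation cases are not handled.

```python
from typing import Dict, Optional


def apply_attr_updates(cache: Dict[str, bytes],
                       attrs: Dict[str, Optional[bytes]],
                       delete: bool = False) -> Dict[str, bytes]:
    """Return the post-transaction xattrs without mutating `cache`."""
    updated: Dict[str, bytes] = {} if delete else dict(cache)
    for name, value in attrs.items():
        if value is None:
            updated.pop(name, None)  # Nothing: attribute removed
        else:
            updated[name] = value    # Just bl: attribute set
    return updated
```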

#11

Updated by Samuel Just almost 8 years ago

It also makes it trivial for the current ECBackend to verify that the update is an append without needing an explicit append operation -- much easier that way.
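
Under the ObjectOperation layout, the append check comes down to inspecting a few fields. A Python sketch, assuming "append" means no delete, no truncate, and a buffer_update holding only Write extents that tile a contiguous range starting exactly at the object's current size. The function name, the old_size parameter and the (offset, length, kind) tuple encoding are hypothetical, and attrs/omap changes are ignored.

```python
from typing import List, Optional, Tuple


def is_pure_append(old_size: int,
                   delete: bool,
                   truncate: Optional[int],
                   extents: List[Tuple[int, int, str]]) -> bool:
    """True if the buffer updates only extend the object past old_size."""
    if delete or truncate is not None:
        return False
    expected = old_size
    # Extents from an interval map do not overlap, so sorting by offset
    # and checking that each one starts where the previous ended suffices.
    for off, length, kind in sorted(extents):
        if kind != "write" or off != expected:
            return False
        expected = off + length
    return True
```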

#12

Updated by Samuel Just over 7 years ago

  • Status changed from In Progress to 7
  • % Done changed from 0 to 70
#13

Updated by Samuel Just over 7 years ago

  • % Done changed from 70 to 90
#14

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 7 to Fix Under Review