Feature #17158

open

Feature #14031: EC overwrites

EC Overwrites: work out implications of recovery below min_size pushing the can_rollback_to line forward

Added by Samuel Just over 7 years ago. Updated over 7 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

So, recovery below min_size works by not recording a last_epoch_started value and thus not committing us to the last_update value we ended up with. However, to do recovery, the on-disk representation of the object needs to match the log version. This isn't true until we rollforward, at which point we have committed to the version. Ugh, I think that when the replica gets a read request, it needs to pull extents and attrs from the write-aside objects for any object with un-applied log entries.

Actions #1

Updated by Samuel Just over 7 years ago

  • Tracker changed from Bug to Feature
Actions #2

Updated by Samuel Just over 7 years ago

This is going to be super annoying, but at least it can be handled entirely in the sub_read handler.

Actions #3

Updated by Samuel Just over 7 years ago

Gah, we also need to make sure we can fix up scan results.

Actions #4

Updated by Samuel Just over 7 years ago

Even worse: when we do a backfill recovery, the object might have committed-but-unapplied writes. Suppose we change our interpretation of the local can_rollback_to line to a pg-wide (if lazily updated) line? The upshot would be that there would be an authoritative valid can_rollback_to range (applied_up_to range?) from the max ever issued by a primary to the min recorded by any acting set osd in the most recent active interval. It would be maintained through peering similarly to last_update. The main difference is that when we backfill or recover an object, we also recover the rollback/forward objects back to the newest applied_up_to (the one on the primary). (Question: is there an ordering problem between pushing the applied_up_to line and recovering an object based on that line? What if the line moves while we are recovering the object?)

In a larger sense, on any OSD, applied_up_to and last_update from any OSD from the most recent acting interval would represent conservative lower and upper bounds respectively on the latest write any client has seen a commit for on this PG. As we gain more information during peering, we reduce the last_update bound and increase the applied_up_to bound. Keep in mind, the applied_up_to bound isn't meant to be tight; after all, we might choose to batch dozens of updates before applying the lot of them. Once we actually go active, we have committed to a particular last_update value and will be serving reads on that basis (keep in mind that we block reads on an object while there are uncommitted writes, so we won't normally serve reads until the log entry is persisted across the acting set anyway). So, in the active-below-min-size-for-backfill case, all we have to do is not move the applied_up_to line and let recovery populate the other osds with the rollback information.
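
To make the bound-tightening concrete, here is a rough sketch of how the two bounds might be combined as peer infos arrive during peering. It is an illustration only: PeerInfo and commit_bounds are hypothetical names, not existing Ceph types, and versions are plain integers rather than real (epoch, version) pairs.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class PeerInfo:
    """Hypothetical per-OSD summary from the most recent acting interval."""
    osd: int
    applied_up_to: int  # everything up to here has been rolled forward
    last_update: int    # newest log entry this OSD has


def commit_bounds(infos: Iterable[PeerInfo]) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the newest client-visible commit.

    Each applied_up_to is a conservative lower bound and each last_update
    a conservative upper bound, so more peers can only raise the lower
    bound and reduce the upper bound.
    """
    infos = list(infos)
    if not infos:
        raise ValueError("need at least one peer info")
    lower = max(info.applied_up_to for info in infos)
    upper = min(info.last_update for info in infos)
    return lower, upper


if __name__ == "__main__":
    peers = [PeerInfo(0, 5, 10), PeerInfo(1, 7, 9), PeerInfo(2, 6, 12)]
    print(commit_bounds(peers))
```

The lower bound is deliberately loose: a primary that batches many updates before applying them would report an applied_up_to well behind the true commit point, which is still safe.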

Actions #5

Updated by Samuel Just over 7 years ago

On the other hand, if we leave the can_rollback_to semantics the same and just recover the "current" version by fetching extents as needed from the overlay objects, that would leave us in almost the same situation as the current ECBackend, which is not necessarily a bad place.

Actions #6

Updated by Samuel Just over 7 years ago

Ok, we'll just recover to the "current" last_update and not let the newly backfilled/recovered peer roll back. On read, we'll scan the log from the current rollback_to riter to head and build up an extent->version map. If acting.size >= min_size, assert that we didn't read anything not mapped to HEAD. Same deal with attr read on the primary.
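
A rough sketch of the read-side scan described above, under simplifying assumptions: LogEntry, build_extent_version_map and read_sources are hypothetical names, versions are plain integers, and each log entry is treated as a single contiguous write. Extents that no un-applied entry covers are reported with version None, meaning "read from HEAD".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Extent = Tuple[int, int, Optional[int]]  # (start, end, version); None = HEAD


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log entry: one contiguous write to one object."""
    version: int
    oid: str
    offset: int
    length: int


def build_extent_version_map(log: Iterable[LogEntry], oid: str,
                             rollback_to: int) -> List[Extent]:
    """Scan entries newer than rollback_to, oldest first; newest write wins."""
    extents: List[Extent] = []
    for entry in sorted(log, key=lambda e: e.version):
        if entry.version <= rollback_to or entry.oid != oid:
            continue
        start, end = entry.offset, entry.offset + entry.length
        kept: List[Extent] = []
        for s, e, v in extents:
            if e <= start or s >= end:
                kept.append((s, e, v))          # no overlap, keep whole
                continue
            if s < start:
                kept.append((s, start, v))      # surviving left piece
            if e > end:
                kept.append((end, e, v))        # surviving right piece
        kept.append((start, end, entry.version))
        extents = sorted(kept)
    return extents


def read_sources(extent_map: List[Extent], offset: int,
                 length: int) -> List[Extent]:
    """Split a read into pieces, each tagged with the version serving it."""
    end, pos, out = offset + length, offset, []
    for s, e, v in extent_map:                  # sorted, non-overlapping
        if e <= pos or s >= end:
            continue
        if s > pos:
            out.append((pos, s, None))          # gap: comes from HEAD
        hi = min(e, end)
        out.append((max(s, pos), hi, v))
        pos = hi
    if pos < end:
        out.append((pos, end, None))
    return out
```

With a helper like this, the acting.size >= min_size check would amount to asserting that every piece returned by read_sources has version None.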

Actions #7

Updated by Samuel Just over 7 years ago

This will also be a problem with scrub.
