Feature #17668

closed

Feature #14031: EC overwrites

bug with last_backfill = head when snapdir exists and a write removes snapdir

Added by Samuel Just over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 90%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Strictly speaking, this is not a problem for replicated PGs: the snapdir object
itself might wind up wrong, but it was going to be backfilled later anyway.

It is only a problem for ECBackend because the entry needs to be rollbackable, so
the collection_move_rename can't be allowed to die with an ENOENT (which also
crashes the OSD).
How did this not come up before? First, I think the ranges never end on head or
snapdir because both can't exist at once. Therefore, if either is the last element
present, end would have to be a different object or the range boundary (so
intervals encompass, not contain, either both or neither, by accident).
Second, at most one or the other can exist officially. If head exists, we
don't get the trouble scenario because creating SNAPDIRS works ok by accident.
The other scenario can only happen if we fail to get an rwlock on the snapdir
object itself because if it exists, head can't, so if we passed by head, we
must still have a count to try (and snapdir must be there to try). I think
this bug also exists in the current code, but the race is very tight. The overwrites
logic widens the race because it involves a pipeline flush, and those are costly.

It's possible that there is another element that makes the rwlock conflict
scenario actually impossible with current master, but I haven't found it yet.
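
A toy model may help picture the failure described above. It is only a sketch under stated assumptions, not Ceph code: ToyStore, move_rename, apply_write_removing_snapdir and the integer sort keys are all hypothetical names made up for illustration. It assumes that head sorts before snapdir for the same object name, that the backfill target only applies writes to objects at or before last_backfill, and that a rollbackable removal is implemented as a rename that needs its source to exist.

```python
import errno

# Hypothetical sort keys: for one object name, head sorts before snapdir.
HEAD, SNAPDIR = 1, 2


class ToyStore:
    """Toy object store for one backfill target; not Ceph's ObjectStore API."""

    def __init__(self, objects):
        self.objects = set(objects)

    def move_rename(self, src, dst):
        # Models a rename-based rollbackable removal: the source must exist.
        if src not in self.objects:
            raise OSError(errno.ENOENT, "rename source missing", str(src))
        self.objects.remove(src)
        self.objects.add(dst)


def apply_write_removing_snapdir(store, name, last_backfill):
    """Apply a write to (name, HEAD) that also removes (name, SNAPDIR)."""
    head, snapdir = (name, HEAD), (name, SNAPDIR)
    if head > last_backfill:
        # The target has not backfilled this object yet, so it skips the write.
        return "skipped"
    # head <= last_backfill, so the target applies the write, including the
    # rollbackable removal of snapdir. If last_backfill == head, snapdir sorts
    # after last_backfill and may never have been copied to this target.
    store.move_rename(snapdir, ("rollback", name))
    store.objects.add(head)
    return "applied"
```

With last_backfill = ("foo", HEAD) and an empty ToyStore, apply_write_removing_snapdir raises OSError with errno ENOENT, which is the analogue of the collection_move_rename failure. If ("foo", SNAPDIR) is present on the target, the same call returns "applied".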

Actions #1

Updated by Samuel Just over 7 years ago

  • Tracker changed from Bug to Feature
  • Parent task set to #14031

Marking as a feature so it'll show up in my todo list.

Actions #2

Updated by Samuel Just over 7 years ago

  • Status changed from New to 7
  • % Done changed from 0 to 90

Actions #3

Updated by Samuel Just over 7 years ago

  • Status changed from 7 to Resolved