Bug #17667

closed

Feature #14031: EC overwrites

Feature #14040: ECBackend support for RMW

bug with last_backfill = head when snapdir exists and a write removes snapdir

Added by Samuel Just over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Strictly speaking, this is not a problem for a replicated PG: the snapdir object itself
might wind up wrong, but it was going to be backfilled later anyway.

It is only a problem for ECBackend, because the entry needs to be rollbackable, so the
collection_move_rename can't fail with ENOENT (which also crashes the OSD).

How did this not come up before? First, I think the ranges never end on head or
snapdir: the two can't both exist, so if either is the last element present, end
would have to be a different object or the range boundary (so intervals
encompass, rather than contain, either both or neither, by accident).
Second, at most one of the two can officially exist. If head exists, we
don't hit the trouble scenario, because creating SNAPDIRS works OK by accident.
The other scenario can only happen if we fail to get an rwlock on the snapdir
object itself: if snapdir exists, head can't, so if we passed by head, we
must still have a count to try (and snapdir must be there to try).

I think this bug exists in the existing code, but the race is very tight. The
overwrites logic widens the race because it'll be a pipeline flush, and those
are costly.

It's possible that there is another element that makes the rwlock conflict
scenario actually impossible with current master, but I haven't found it yet.
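
One possible reading of the scenario in the title, sketched as a toy model rather than actual Ceph code. It assumes objects are ordered as simple (name, snap) tuples, that head uses the snap id CEPH_NOSNAP and snapdir uses CEPH_SNAPDIR (so snapdir sorts immediately after head), and that an op is forwarded to a backfill target when its object is at or before last_backfill. The real object ordering is more involved, and all function names here are hypothetical.

```python
# Toy model only: hypothetical names, not Ceph's implementation.
NOSNAP = 2**64 - 2   # snap id of the head object (CEPH_NOSNAP)
SNAPDIR = 2**64 - 1  # snap id of the snapdir object (CEPH_SNAPDIR)


def should_send_op(target, last_backfill):
    """Assumed rule: forward an op only for objects already backfilled."""
    return target <= last_backfill


def apply_write_removing_snapdir(store, name):
    """Apply a head write that also removes snapdir on one shard's store."""
    snapdir = (name, SNAPDIR)
    if snapdir not in store:
        # Stand-in for the ENOENT described in the ticket.
        raise FileNotFoundError(f"ENOENT: {snapdir} not on this shard")
    store.remove(snapdir)
    store.add((name, NOSNAP))


def replay_on_backfill_target(name, last_backfill, store):
    """Return what happens when the write reaches a backfill target."""
    head = (name, NOSNAP)
    if not should_send_op(head, last_backfill):
        return "skipped"
    try:
        apply_write_removing_snapdir(store, name)
    except FileNotFoundError:
        return "ENOENT"
    return "applied"


# last_backfill == head: the op is sent, but snapdir sorts after head and
# was never copied to the target, so removing it fails.
print(replay_on_backfill_target("obj", ("obj", NOSNAP), set()))
```

In this model the failure disappears as soon as last_backfill has advanced past snapdir, which matches the ticket's point that the window is narrow.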

Actions #1

Updated by Samuel Just over 7 years ago

  • Status changed from New to Closed

Opened a duplicate by accident.
