Bug #49939

closed

cephfs-mirror: be resilient to recreated snapshot during synchronization

Added by Venky Shankar about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 100%
Source: Community (user)
Tags:
Backport: pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The mirror daemon works with snapshot paths. It does rely on the snap-id to infer deleted and renamed snapshots, but once a snapshot is picked up for synchronization, access to the snapshot data is path-based.

This can lead to nasty races, such as transferring incorrect snapshot data to the remote file system when a snapshot that has just been picked for synchronization is deleted and a new snapshot (with the same name, but possibly different contents) gets created.

Patrick suggested introducing snapshot paths based on snap-ids, such as /path/to/dir/.snap/<snap-id>/..., and resolving the snap-id to the snapshot name (or maybe /path/to/dir/.snap/_<snap-id>_<snap-name>/..., making use of the fact that snapshot names beginning with "_" are disallowed).
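To make the second naming idea concrete, here is a minimal sketch of how a `_<snap-id>_<snap-name>` directory entry could be built and parsed. The function names `encode_snap_entry` and `decode_snap_entry` are hypothetical and not part of Ceph; the only property taken from the description is that user-visible snapshot names cannot start with "_", so the prefix is unambiguous.

```python
import re

# Hypothetical entry format from the suggestion above: _<snap-id>_<snap-name>
_ENTRY_RE = re.compile(r"_(\d+)_(.+)", re.DOTALL)


def encode_snap_entry(snap_id, snap_name):
    """Build the id-qualified entry name for a snapshot."""
    if snap_name.startswith("_"):
        # Names starting with "_" are disallowed, which is what keeps
        # the id-qualified form distinguishable from a plain name.
        raise ValueError("snapshot names may not begin with '_'")
    return f"_{snap_id}_{snap_name}"


def decode_snap_entry(entry):
    """Return (snap_id, snap_name), or None for a plain snapshot name."""
    m = _ENTRY_RE.fullmatch(entry)
    if m is None:
        return None
    # The id is all digits, so the first "_" after it ends the id even
    # when the snapshot name itself contains underscores.
    return int(m.group(1)), m.group(2)
```

With this scheme a recreated snapshot keeps its name but gets a different snap-id, so the id-qualified entry of the old snapshot would no longer resolve to the new one.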

Alternatively, we could introduce the *at() family of APIs in libcephfs to mitigate this issue.
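The race and the *at() mitigation can be illustrated on a local POSIX file system. This is only an analogy: it uses Python's `dir_fd` support (which maps to `openat()`), not libcephfs, and it simulates "delete and recreate the snapshot" by renaming a directory away and creating a new one with the same name. The helper names are made up for the illustration.

```python
import os
import tempfile


def read_by_path(snap_path, name):
    """Path-based access: resolves whatever currently has that name."""
    with open(os.path.join(snap_path, name), "rb") as f:
        return f.read()


def read_by_dirfd(snap_fd, name):
    """fd-relative access (openat): resolves relative to the pinned directory."""
    fd = os.open(name, os.O_RDONLY, dir_fd=snap_fd)
    with os.fdopen(fd, "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        snap = os.path.join(root, "snap1")
        os.mkdir(snap)
        with open(os.path.join(snap, "data"), "wb") as f:
            f.write(b"old")

        # "Pick" the snapshot for synchronization by pinning it with an fd.
        snap_fd = os.open(snap, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # Simulate the snapshot being replaced by one with the same name.
            os.rename(snap, os.path.join(root, "snap1.replaced"))
            os.mkdir(snap)
            with open(os.path.join(snap, "data"), "wb") as f:
                f.write(b"new")

            return read_by_path(snap, "data"), read_by_dirfd(snap_fd, "data")
        finally:
            os.close(snap_fd)


if __name__ == "__main__":
    by_path, by_fd = demo()
    print("path-based read:", by_path)
    print("fd-relative read:", by_fd)
```

The path-based read silently picks up the contents of the new directory, while the fd-relative read keeps referring to the directory that was originally opened.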


Subtasks 2 (0 open, 2 closed)

Bug #50298: libcephfs: support file descriptor based *at() APIs (Resolved, Venky Shankar)
Bug #50561: cephfs-mirror: incrementally transfer snapshots whenever possible (Resolved, Venky Shankar)

Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #50994: pacific: cephfs-mirror: be resilient to recreated snapshot during synchronization (Resolved, Venky Shankar)
Actions #1

Updated by Venky Shankar about 3 years ago

  • Target version set to v17.0.0
Actions #2

Updated by Patrick Donnelly about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Venky Shankar
Actions #3

Updated by Venky Shankar about 3 years ago

So, I am experimenting with how the MDS handles path traversals when given just an inode number rather than inode number + dname, especially when the inode is deleted by one client while another client has an open fd on it. We already have `fstatx()` implemented in the Client (libcephfs), so I forced a getattr on the deleted inode and the MDS seems to handle it (i.e. it resolves the inode and returns the inode attrs).

It seems pretty straightforward to implement the *at() family of calls. I went ahead and added those that would be required for the mirror daemon to safely walk snapshot contents (fstatxat, openat, readlinkat, etc.). I haven't tested it with the mirror daemon yet, but I'll get to it soon.

However, the library changes should be out as a PR really soon.

@Patrick ^^
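For readers unfamiliar with the pattern, the kind of fd-relative walk described in this comment can be sketched with the POSIX equivalents exposed by Python (`os.stat`, `os.open` and `os.readlink` with `dir_fd`, i.e. fstatat/openat/readlinkat). This is not the mirror daemon's code and does not use libcephfs; `walk_at` is a hypothetical helper operating on a local directory.

```python
import os
import stat


def walk_at(dir_fd, rel=""):
    """Yield (relative_path, kind, payload) for every entry below dir_fd.

    All lookups are relative to an open directory fd, so the walk keeps
    following the directory that was opened, whatever its name maps to now.
    """
    for name in sorted(os.listdir(dir_fd)):
        path = os.path.join(rel, name)
        # fstatat(..., AT_SYMLINK_NOFOLLOW)
        st = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
        if stat.S_ISDIR(st.st_mode):
            # openat() on the subdirectory, then recurse relative to it.
            sub_fd = os.open(
                name, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW, dir_fd=dir_fd
            )
            try:
                yield path, "dir", None
                yield from walk_at(sub_fd, path)
            finally:
                os.close(sub_fd)
        elif stat.S_ISLNK(st.st_mode):
            # readlinkat()
            yield path, "symlink", os.readlink(name, dir_fd=dir_fd)
        elif stat.S_ISREG(st.st_mode):
            # openat() on the file; fdopen takes ownership of the fd.
            fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)
            with os.fdopen(fd, "rb") as f:
                yield path, "file", f.read()
        # Other file types (FIFOs, devices, sockets) are skipped in this sketch.
```

A caller would open the top-level directory once with `os.open(path, os.O_RDONLY | os.O_DIRECTORY)`, pass that fd to `walk_at`, and close it when the walk is done.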

Actions #4

Updated by Venky Shankar about 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 40831
Actions #5

Updated by Venky Shankar almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50994: pacific: cephfs-mirror: be resilient to recreated snapshot during synchronization added
Actions #7

Updated by Loïc Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
