Bug #49939

cephfs-mirror: be resilient to recreated snapshot during synchronization

Added by Venky Shankar 29 days ago. Updated 8 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
pacific
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The mirror daemon works with snapshot paths. It does rely on snap-id to infer deleted and renamed snapshots, but once a snapshot is picked up for synchronization, access to the snapshot's data is path-based.

This can lead to nasty races, such as transferring incorrect snapshot data to the remote file system when a snapshot that has just been picked for synchronization is deleted and a new snapshot with the same name (but possibly different contents) is created.
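To make the race concrete, here is a minimal sketch on a local POSIX file system. It does not use CephFS or the mirror daemon; the directory `snap1` is only a stand-in for a snapshot directory, and the helper names are hypothetical. It shows that a reader holding only the path ends up reading the recreated contents.

```python
import os
import tempfile

def read_snapshot_file(snap_path, name):
    """Path-based read: snap_path is resolved again at access time."""
    with open(os.path.join(snap_path, name)) as f:
        return f.read()

def demo_path_race():
    root = tempfile.mkdtemp()
    snap = os.path.join(root, "snap1")      # stand-in for <dir>/.snap/snap1
    os.mkdir(snap)
    with open(os.path.join(snap, "data"), "w") as f:
        f.write("old contents")

    picked = snap                           # "picked for synchronization" by path only

    # Another actor deletes the snapshot and recreates it under the same name.
    os.remove(os.path.join(snap, "data"))
    os.rmdir(snap)
    os.mkdir(snap)
    with open(os.path.join(snap, "data"), "w") as f:
        f.write("new contents")

    # The reader believes it is still reading the snapshot it picked earlier.
    return read_snapshot_file(picked, "data")
```

In this sketch `demo_path_race()` returns the contents written after the recreate, which is the kind of silent mix-up described above.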

Patrick suggested introducing snapshot paths based on snap-ids, such as /path/to/dir/.snap/<snap-id>/..., and resolving the snap-id to the snapshot name (or maybe /path/to/dir/.snap/_<snap-id>_<snap-name>/..., making use of the fact that snapshot names beginning with "_" are disallowed).
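A rough sketch of how the second form could be built and parsed, purely as string handling. This format is only the proposal from this ticket, not an interface that is known to exist; the function names are hypothetical, and the sketch assumes snap-ids are non-negative decimal integers.

```python
import posixpath

def snap_component(snap_id, snap_name):
    """Build the proposed '_<snap-id>_<snap-name>' path component."""
    if snap_id < 0:
        raise ValueError("snap-id must be non-negative")
    if not snap_name or snap_name.startswith("_") or "/" in snap_name:
        raise ValueError("invalid snapshot name: %r" % snap_name)
    return "_%d_%s" % (snap_id, snap_name)

def snap_path(dir_path, snap_id, snap_name):
    """Full path of the form /path/to/dir/.snap/_<snap-id>_<snap-name>."""
    return posixpath.join(dir_path, ".snap", snap_component(snap_id, snap_name))

def parse_snap_component(component):
    """Split '_<snap-id>_<snap-name>' back into (snap_id, snap_name)."""
    if not component.startswith("_"):
        raise ValueError("not a snap-id based component: %r" % component)
    # The id is all digits, so the first '_' after it ends the id even if
    # the snapshot name itself contains underscores.
    id_part, sep, name = component[1:].partition("_")
    if not sep or not name or not (id_part.isascii() and id_part.isdigit()):
        raise ValueError("malformed component: %r" % component)
    return int(id_part), name
```

Because ordinary snapshot names cannot begin with "_", a component of this shape cannot collide with a user-created snapshot name, which is what makes the leading underscore usable as a marker.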

Alternatively, we could introduce the *at() family of APIs in libcephfs to mitigate this issue.
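For comparison, a sketch of why descriptor-relative calls help, again on a local POSIX file system rather than libcephfs. Python's `dir_fd` argument maps to openat(2). A local file system cannot delete a non-empty directory that is still in use the way a snapshot can be removed, so the "delete" is modelled here (as an assumption of the sketch) by renaming the old directory out of the way before recreating the name.

```python
import os
import tempfile

def read_at(dir_fd, name):
    """Read a file relative to an already-open directory (openat(2))."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd) as f:
        return f.read()

def demo_fd_anchor():
    root = tempfile.mkdtemp()
    snap = os.path.join(root, "snap1")      # stand-in for a snapshot directory
    os.mkdir(snap)
    with open(os.path.join(snap, "data"), "w") as f:
        f.write("old contents")

    # Pin the directory at "pick" time instead of remembering its path.
    dir_fd = os.open(snap, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # The name is taken away and reused for a different directory.
        os.rename(snap, os.path.join(root, "snap1.gone"))
        os.mkdir(snap)
        with open(os.path.join(snap, "data"), "w") as f:
            f.write("new contents")

        # Access relative to the descriptor still reaches the original.
        return read_at(dir_fd, "data")
    finally:
        os.close(dir_fd)
```

Here the descriptor keeps referring to the directory that was opened, regardless of what the name points to later.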


Subtasks

Bug #50298: libcephfs: support file descriptor based *at() APIs | Fix Under Review | Venky Shankar

History

#1 Updated by Venky Shankar 29 days ago

  • Target version set to v17.0.0

#2 Updated by Patrick Donnelly 23 days ago

  • Status changed from New to In Progress
  • Assignee set to Venky Shankar

#3 Updated by Venky Shankar 21 days ago

So, I am experimenting with how the MDS handles path traversals when given just an inode number rather than inode number+dname, especially when the inode is deleted by one client while another client has an open fd on it. We already have `fstatx()` implemented in the Client (libcephfs) -- so I forced a getattr on the (deleted) inode and the MDS seems to handle it (i.e., resolve the inode and return the inode attrs).
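The local POSIX analogue of that experiment looks roughly like this. It says nothing about MDS behaviour; it only illustrates the semantics being checked, namely that attributes remain retrievable through an open fd after the name is gone. The function name is hypothetical.

```python
import os
import tempfile

def stat_after_unlink():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello")
        os.unlink(path)                 # the name is removed...
        st = os.fstat(fd)               # ...but the open fd still resolves the inode
        return st.st_size, os.path.exists(path)
    finally:
        os.close(fd)
```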

Seems like it's pretty straightforward to implement the *at() family of calls. I went ahead and added those that would be required for the mirror daemon to safely walk snapshot contents (fstatxat, openat, readlinkat, etc.). Haven't tested it yet w/ the mirror daemon, but I'll get to it soon.
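As an illustration of what such a walk could look like, here is a sketch using the POSIX counterparts that Python exposes through `dir_fd` (fstatat, openat, readlinkat). It is not the libcephfs API from the PR, and `walk_at` is a hypothetical helper; every lookup is made relative to an open directory descriptor, so no full path is re-resolved during the walk.

```python
import os
import stat

def walk_at(dir_fd, prefix=""):
    """Yield (relative_path, kind, detail) for everything below dir_fd."""
    for name in sorted(os.listdir(dir_fd)):
        rel = os.path.join(prefix, name)
        # fstatat(2) with AT_SYMLINK_NOFOLLOW: inspect the entry itself.
        st = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
        if stat.S_ISLNK(st.st_mode):
            # readlinkat(2): read the link target without following it.
            yield rel, "symlink", os.readlink(name, dir_fd=dir_fd)
        elif stat.S_ISDIR(st.st_mode):
            yield rel, "dir", None
            # openat(2): descend relative to the parent descriptor.
            sub_fd = os.open(
                name,
                os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                dir_fd=dir_fd,
            )
            try:
                yield from walk_at(sub_fd, rel)
            finally:
                os.close(sub_fd)
        elif stat.S_ISREG(st.st_mode):
            yield rel, "file", st.st_size
```

A caller would open the top-level directory once, pass that descriptor to `walk_at()`, and close it when the walk is done.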

However, the library changes should be out as a PR really soon.

@Patrick ^^

#4 Updated by Venky Shankar 8 days ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 40831
