Feature #62083


CephFS multi-client guaranteed-consistent snapshots

Added by Shyamsundar Ranganathan 10 months ago. Updated 6 months ago.

Status:
In Progress
Priority:
High
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
Labels (FS):
snapshots
Pull request ID:

Description

This tracker is to discuss and implement guaranteed-consistent snapshots of subdirectories when using CephFS across multiple clients.

As it currently stands [1], CephFS snapshots are asynchronous, which can result in a snapshot not capturing blocks of IO written just before the snapshot is created. This does not provide a guaranteed-consistent snapshot of the data, and the asynchronous behavior is further exacerbated when multiple clients are using the subtree being snapshotted.

Snapshots themselves are generated in a couple of (or more) contexts: one where the data is consistent as per the application/workload using it, and another where the data is only "crash" consistent. In the former case the applications are quiesced prior to taking the snapshot, so the asynchronous behavior of CephFS is not a concern, but an fsfreeze or similar option would be desired. The latter case, where snapshots are not coordinated with the application (i.e. crash-consistent snapshots), requires that the asynchronous nature of current CephFS snapshots be more tightly controlled in order to provide multi-client guaranteed-consistent snapshots.

Additional requirements or feature desires would be to control the guaranteed-consistent snapshots in phases, for example:
- barrier/freeze a sub-dir across all clients
- create a snapshot
- unbarrier/unfreeze the sub-dir across all clients
The above can help coordinate snapshots with other systems (notably RBD images) that expose a similarly broken-down set of APIs/functionality, so that snapshots can be coordinated across disparate systems.

Also, such barrier/freeze schemes, if adopted, should have timeouts in case the snapshot or all clients cannot be fully coordinated; hitting the timeout would result in a failed snapshot attempt that would have to be retried later.
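The phased flow above, including the timeout behavior, can be sketched as a coordinator that freezes every client, takes the snapshot only if the barrier completes within its deadline, and always releases the barrier. This is a minimal single-process sketch under stated assumptions: `Client`, `freeze`, and the `take_snapshot` callback are hypothetical stand-ins, not existing CephFS interfaces.

```python
import time

class Client:
    """Hypothetical stand-in for a CephFS client that can pause its IO."""
    def __init__(self, name, freeze_delay=0.0):
        self.name = name
        self.freeze_delay = freeze_delay  # simulated time needed to quiesce
        self.frozen = False

    def freeze(self):
        time.sleep(self.freeze_delay)
        self.frozen = True

    def unfreeze(self):
        self.frozen = False

def phased_snapshot(clients, take_snapshot, timeout):
    """Barrier -> snapshot -> unbarrier; fail if the barrier misses its deadline."""
    deadline = time.monotonic() + timeout
    frozen = []
    try:
        for c in clients:
            c.freeze()
            frozen.append(c)
            if time.monotonic() > deadline:
                return False          # failed attempt; caller retries later
        take_snapshot()
        return True
    finally:
        for c in frozen:              # always release the barrier, even on failure
            c.unfreeze()

# Usage: both clients freeze quickly, so the snapshot succeeds.
snaps = []
ok = phased_snapshot([Client("a"), Client("b")],
                     lambda: snaps.append("snap-1"), timeout=1.0)
```

The `finally` block is the important design point: a timed-out barrier must still unfreeze every client it managed to freeze, otherwise application IO stays stalled after a failed attempt.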

Application consistent and crash consistent snapshots summary

Application consistent snapshots
- These are snapshots that are taken at a lower frequency of (say) every 60m
- Typical operation workflow would be:

  • applications are stopped, or asked to quiesce
  • client side caches are flushed (fsync'd or unmounted)
  • snapshot is taken
  • application is restarted or un-quiesced

- Used for:

  • Recovering from local application data corruption (say ransomware attacks, or user errors causing data loss)
  • Serves as a quick local backup to recover from, and as a point from which data can be backed up

- Advantages:

  • Data is consistent as per the application and hence fully reusable for recovery purposes

- Pitfalls:

  • Causes application stalls until all client-side caches are flushed and data is application consistent on disk
  • Applications may need to be stopped if they provide no way to quiesce, or an fsfreeze-like operation needs to be initiated at the clients, which will force-stall application IO
    • Using only the latter (fsfreeze) is strictly not application consistent, but it is better than a crash-consistent snapshot from a storage-backend perspective
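The application-consistent workflow above (quiesce, flush, snapshot, un-quiesce) can be sketched in a few lines. Creating a subdirectory under the special `.snap` directory is the documented way to take a CephFS snapshot; on any other filesystem the same call just creates an ordinary directory, which keeps the sketch runnable anywhere. The `quiesce`/`unquiesce` callables are hypothetical hooks into the application, not an existing API.

```python
import os
import tempfile

def app_consistent_snapshot(subtree, name, quiesce, unquiesce):
    """Quiesce -> flush -> snapshot -> un-quiesce, via the CephFS .snap convention."""
    quiesce()                      # ask the application to pause and flush its own state
    try:
        os.sync()                  # push client-side page cache down to the filesystem
        # On CephFS, mkdir under the special .snap directory snapshots the subtree;
        # on any other filesystem this is just a plain mkdir.
        os.makedirs(os.path.join(subtree, ".snap", name))
    finally:
        unquiesce()                # restart/resume the application either way

# Usage against a scratch directory (a plain mkdir outside CephFS):
d = tempfile.mkdtemp()
events = []
app_consistent_snapshot(d, "hourly-0",
                        lambda: events.append("quiesce"),
                        lambda: events.append("resume"))
```

As the pitfalls above note, the stall lasts for the whole quiesce-flush-snapshot span, which is why this flow suits low-frequency (e.g. hourly) snapshots.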

Snapshots for disaster recovery mirroring
- These are snapshots that are taken at a higher frequency of (say) every 5m
- Require that the snapshot be storage crash consistent for recovery of the application
- Typical workflow does not involve any application IO quiescing or any client side cache flushes

  • Workflow does involve point-in-time barrier generation at the client or storage end, which defines the point in time at which the IO stream was snapshotted

- Used for:

  • Recovering from site loss using a crash consistent copy of the snapshot data at a remote site
  • Technically, for well-behaved applications, these can also be leveraged for the other use cases stated above under application-consistent snapshots

- Advantages:

  • Typically faster than application- or client-quiesced options, hence causes shorter application stalls compared to application-quiesced snapshots; this makes it the preferred option for disaster recovery, where the snapshot frequency is higher

- Pitfalls:

  • Data is only crash consistent from a storage perspective, and applications are expected to be well behaved with their IO requests in order to recover from such a copy
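The DR workflow above reduces to periodically creating timestamped snapshots and pruning old ones. A sketch using the `.snap` directory convention follows; the `dr-` naming and the keep-newest-N retention policy are illustrative assumptions, not an existing tool (CephFS's `snap_schedule` manager module offers comparable scheduling and retention natively).

```python
import datetime
import os

def take_dr_snapshot(subtree, keep=12, now=None):
    """Create a timestamped snapshot under .snap and keep only the newest `keep`."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    snapdir = os.path.join(subtree, ".snap")
    name = "dr-" + now.strftime("%Y%m%dT%H%M%S")
    os.makedirs(os.path.join(snapdir, name))
    # Timestamped names sort chronologically, so pruning is a sorted slice.
    snaps = sorted(n for n in os.listdir(snapdir) if n.startswith("dr-"))
    for old in snaps[:-keep]:
        # On CephFS, rmdir of a .snap subdirectory removes that snapshot;
        # elsewhere it just removes the (empty) placeholder directory.
        os.rmdir(os.path.join(snapdir, old))
    return name
```

A scheduler would call this every 5 minutes; a failed attempt (e.g. a barrier timeout) simply means the next tick produces the next point-in-time copy.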

Broad solution alternatives
- Co-ordinate via the MDS

  • Revoke/Notify caps from clients before taking the snapshot

- Co-ordinate via the clients

  • Send barrier requests to clients to prevent future IOs from being issued to RADOS (and even to the MDS)

- Coordinate with RADOS

  • I am unaware of RADOS snapshot internals and/or any CephFS object hierarchy w.r.t RADOS objects for the sub-tree to suggest or discuss any alternatives
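The "co-ordinate via the clients" alternative amounts to a gate on each client's IO submission path: while the barrier is down, new IOs block instead of reaching RADOS/MDS, and they block with a timeout so a stuck barrier surfaces as a failed attempt. A minimal single-process sketch, where `IOGate` and its `submit()` path are hypothetical stand-ins for the client's real dispatch code:

```python
import threading

class IOGate:
    """Client-side barrier: while closed, submit() blocks before issuing IO."""
    def __init__(self):
        self._open = threading.Event()
        self._open.set()             # gate starts open: IO flows normally
        self.issued = []

    def barrier(self):
        self._open.clear()           # stop new IOs from being issued

    def release(self):
        self._open.set()             # let parked IOs proceed

    def submit(self, op, timeout=None):
        if not self._open.wait(timeout):      # park until the barrier lifts
            raise TimeoutError("IO blocked past barrier timeout")
        self.issued.append(op)       # stand-in for sending the op to RADOS/MDS

# Usage: one IO before the barrier, one parked behind it until release.
gate = IOGate()
gate.submit("write-1")
gate.barrier()
t = threading.Thread(target=gate.submit, args=("write-2",))
t.start()          # this submit blocks at the barrier
gate.release()     # the snapshot would be taken here, then the barrier lifts
t.join()
```

The MDS-driven alternative (revoking caps) achieves the same quiescent point from the server side instead; either way, the snapshot is taken only inside the window where no client can issue new IO.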

[1] CephFS Snapshots: https://docs.ceph.com/en/latest/dev/cephfs-snapshots/

Actions #1

Updated by Patrick Donnelly 6 months ago

  • Status changed from New to In Progress
  • Assignee set to Patrick Donnelly
