cephfs-mirror: support snapshot mirror of subdirectories and/or ancestors of a mirrored directory
mgr/mirroring assigns directory paths to `cephfs-mirror` daemon instances. Right now, only a single mirror daemon is supported, so all directories are assigned to one mirror instance. Snapshot synchronization in the mirror daemon is multithreaded. To support ancestor/subdir snapshot mirroring, mgr/mirroring can accept subdirectories or ancestors of already mirrored directories. However, the synchronization threads in the mirror daemon should guard against picking a conflicting directory. A conflicting directory is either a subdirectory or an ancestor of a directory that is under synchronization. E.g., if `/a/b/c` is under synchronization, then `/a/b` (and every ancestor up to `/`) and `/a/b/c/d` (i.e., anything under `/a/b/c`) are conflicting directories. Note that `/a/b/c/` and `/a/b/cc/` do not conflict with each other.
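The conflict rule above can be sketched as a comparison of path components rather than raw string prefixes, which is what keeps `/a/b/c/` and `/a/b/cc/` apart. This is only an illustration in Python, not the mirror daemon's code: the names `conflicts` and `pick_next` are hypothetical, and treating a directory as conflicting with itself is an assumption.

```python
from pathlib import PurePosixPath


def conflicts(candidate: str, in_sync: str) -> bool:
    """Return True if one path is an ancestor of (or equal to) the other.

    Comparison is done per path component, so "/a/b/c" and "/a/b/cc"
    are not treated as related. Equal paths count as conflicting here,
    which is an assumption of this sketch.
    """
    a = PurePosixPath(candidate).parts
    b = PurePosixPath(in_sync).parts
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def pick_next(queued, under_sync):
    """Hypothetical helper: first queued directory that conflicts with
    nothing currently under synchronization, or None."""
    for d in queued:
        if not any(conflicts(d, s) for s in under_sync):
            return d
    return None
```

With `/a/b/c` under synchronization, `pick_next(["/a/b", "/a/b/c/d", "/a/b/cc"], ["/a/b/c"])` skips the first two entries and returns `/a/b/cc`.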
However, CephFS plans to support multiple mirror daemons for HA and concurrent synchronization. mgr/mirroring would distribute directories among mirror daemon instances, thereby possibly mapping `/a/b` and its subdirectory `/a/b/c/` to different mirror daemon instances. This is a problem, since a check local to one daemon cannot see what another daemon is synchronizing. One way to fix this could be to map an entire subtree to a single mirror daemon and have that daemon avoid choosing conflicting directories. This "pinning" of an entire subtree to a mirror daemon instance may not be optimal. Another possible way would be to let mgr/mirroring assign directories to mirror instances without worrying about the ancestor/subdir relation and have the mirror daemons coordinate among themselves when picking directories. This coordination can be based on acquiring locks at each path component of a directory. Something like acquiring a shared lock on each path component (except the last) and finally an exclusive lock on the last component would ensure that a mirror daemon does not pick a directory that conflicts with one being synchronized by another mirror instance.
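A minimal sketch of the shared/exclusive locking idea, in Python. It models the lock state as an in-process table purely for illustration. How the locks would actually be shared between mirror daemons is not specified here, and the class and method names (`PathLockTable`, `try_acquire`, `release`) are hypothetical.

```python
from pathlib import PurePosixPath


class PathLockTable:
    """Toy lock table: shared locks on ancestors, exclusive on the leaf."""

    def __init__(self):
        self.shared = {}        # path -> number of shared holders
        self.exclusive = set()  # paths held exclusively

    @staticmethod
    def _split(directory):
        p = PurePosixPath(directory)
        ancestors = [str(a) for a in p.parents]  # e.g. /a/b, /a, /
        return ancestors, str(p)

    def try_acquire(self, directory):
        ancestors, leaf = self._split(directory)
        # A shared lock cannot be taken on an exclusively held component.
        if any(a in self.exclusive for a in ancestors):
            return False
        # An exclusive lock needs the leaf to be completely unlocked.
        if leaf in self.exclusive or self.shared.get(leaf, 0) > 0:
            return False
        for a in ancestors:
            self.shared[a] = self.shared.get(a, 0) + 1
        self.exclusive.add(leaf)
        return True

    def release(self, directory):
        ancestors, leaf = self._split(directory)
        self.exclusive.discard(leaf)
        for a in ancestors:
            self.shared[a] -= 1
            if self.shared[a] == 0:
                del self.shared[a]
```

With `/a/b/c` locked, a request for `/a/b` fails because `/a/b` already carries a shared lock, and a request for `/a/b/c/d` fails because `/a/b/c` is held exclusively. A request for `/a/b/cc` succeeds, since it only needs shared locks on the common ancestors.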
#2 Updated by Milind Changire 3 months ago
Replicating both subdirs and their ancestor dirs cannot be done. In my opinion, these are mutually exclusive: we should support either subdir replication or ancestor dir replication, not both at the same time.
The reason for raising this caveat is that a set of mutually exclusive subdirs may be snapshotted and replicated before an ancestor dir is added for replication. The ancestor dir snaps may not be in sync with the subdir snaps. So, although the ancestor dir may already be actively snapshotted, it may be enabled for replication at a later date than the subdirs.
Let's say that versions s1, s2, s3, s4 and s5 of a subdir have been replicated so far. Now, replication for the ancestor dir is turned on. The ancestor dir has completely different snapshot versions of the directory tree, i.e. a1, a2, a3, a4 and a5. If replication of the ancestor dir starts with snap a1, it will overwrite the subdir with its own version of the snapshot files. These may include older versions of files, or old files that have since been removed from the latest subdir snap altogether. If we go ahead and snapshot the ancestor dir on the destination cluster with the a1 version of the file set, it will create a stale tree state in the subdir. If the subdir replication now kicks in, it will assume that the subdir tree is in pristine condition at version s5 and attempt to copy only the differing blocks to the destination cluster, which now has an older state of the tree due to the ancestor dir replication just done. It may also happen that earlier snapshots of the ancestor dir do not have the subdir created at all. In short, this is a mess. We can't have ancestor dir and subdir replication both at the same time.
The moment ancestor dir replication is turned on, all replicated subdirs under it should be identified, and the snaps for all these subdirs, as well as the entire subdir contents, should be purged from the destination cluster. Only then should the ancestor dir be replicated.
When ancestor dir replication is turned on, the user should be prompted with the names of the replicated subdirs and asked the question, "If you want to replicate ancestor dir A then replication for subdirs S1, S2, S3, ... etc. needs to be turned off and the relevant subdir state on the destination cluster removed. Are you okay with this?"
When replication for a subdir is enabled while replication of an ancestor dir is already enabled, the user should be prompted with the question, "Do you want to turn off replication for ancestor dir A and only replicate state for subdir S?"
The implementation should be done accordingly.
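The two prompts described above could be derived from the set of already replicated directories roughly as follows. This is a sketch in Python under the assumption that the replicated set never contains an ancestor/subdir pair. The function names are hypothetical and do not correspond to any existing mgr/mirroring interface.

```python
from pathlib import PurePosixPath


def _is_strict_ancestor(a: str, b: str) -> bool:
    """True if directory a is a proper ancestor of directory b."""
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    return len(pa) < len(pb) and pb[:len(pa)] == pa


def replication_prompt(replicated, new_dir):
    """Return the question to ask before enabling new_dir, or None
    if new_dir is unrelated to every replicated directory."""
    subdirs = sorted(d for d in replicated if _is_strict_ancestor(new_dir, d))
    ancestors = sorted(d for d in replicated if _is_strict_ancestor(d, new_dir))
    if subdirs:
        return (
            f"If you want to replicate ancestor dir {new_dir} then "
            f"replication for subdirs {', '.join(subdirs)} needs to be "
            "turned off and the relevant subdir state on the destination "
            "cluster removed. Are you okay with this?"
        )
    if ancestors:
        return (
            f"Do you want to turn off replication for ancestor dir "
            f"{ancestors[0]} and only replicate state for subdir {new_dir}?"
        )
    return None
```

For example, with `/a/b/c` and `/a/b/d` already replicated, enabling `/a/b` yields the first question naming both subdirs, while enabling `/a/b/cc` yields no prompt at all.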
#3 Updated by Milind Changire 3 months ago
- Status changed from New to Need More Info
Pending on "group consistency snapshot replication" requirements.
Need further info on the above feature from product management.
"Group consistency snapshot" is doable,
but replication of a group consistency snapshot is hairy.