Feature #6261

closed

ceph-filestore-dump use cases for disaster recovery

Added by Alexandre Oliva over 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other

Description

Context: I often take cluster snapshots and compare file hashes and replication factors across all osds in my cluster, so that I have known-good states to roll back to in case of disaster. No osd has much free space (~80% utilization), so a significant change to crushmap rules requires me to drop all snapshots before the change can complete.
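The hash comparison mentioned above could look roughly like the following. This is only a sketch: the function names snapshot_manifest and snapshots_match are made up for illustration, sha256sum is assumed to be available (GNU coreutils), and it compares file contents only, not xattrs or replication factors.

```shell
#!/bin/sh
# Print "hash  ./relative/path" for every regular file under a snapshot
# directory, sorted by path so that two manifests can be compared.
# $1: snapshot directory (hypothetical, e.g. /path/to/osdn/clustersnap_LKG)
snapshot_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Return 0 only if both snapshot directories hold the same files with the
# same contents; return 2 if either directory cannot be read.
snapshots_match() {
    a=$(snapshot_manifest "$1") || return 2
    b=$(snapshot_manifest "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, snapshots_match /path/to/osdn/clustersnap_LKG /path/to/osdn/clustersnap_LKGcopy would succeed only if the copy still matches the original.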

I'd like to avoid losing the ability to go back to a known-good state, which may require being able to rebuild an osd out of replicas of its pgs in case of disaster. ceph-filestore-dump is close to being able to support this kind of use. It can export a pg and import it into another osd tree, but it requires a full osd tree (current/ and journal). If it could also export from old snapshots, and import into (copies of) old snapshots, it could be used to move pgs by hand from one last-known-good osd to another. Something like:

ceph-filestore-dump --snap-path=/path/to/osdn/clustersnap_LKG --pgid=0.0 --type=export --file=- |
[ssh osdm-server] ceph-filestore-dump --snap-path=/path/to/osdm/clustersnap_LKGcopy --pgid=0.0 --type=import --file=-

Note the use of - as filename, to indicate stdout/stdin for piping the data. This is another wishlist item.
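To illustrate the convention being asked for, here is a minimal shell sketch of treating - as stdout/stdin. The helper names write_output and read_input are hypothetical and have nothing to do with how ceph-filestore-dump itself is implemented.

```shell
#!/bin/sh
# Copy stdin to the named file, or to stdout when the name is "-".
write_output() {
    if [ "$1" = "-" ]; then
        cat
    else
        cat > "$1"
    fi
}

# Print the named file, or pass stdin through when the name is "-".
read_input() {
    if [ "$1" = "-" ]; then
        cat
    else
        cat < "$1"
    fi
}
```

With this convention, an export and an import can be chained with a pipe and no intermediate file, which is what the wished-for command line above relies on.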

Come to think of it, in cases of disaster recovery, it might be nicer to be able to copy the data by hand (say rsync -aAX or cp -R --preserve=all, with --reflink if in the same btrfs), and then export|import just the pg metadata that is not in the pg subdir, i.e.:

cp -R --preserve=all --reflink /path/to/osdn/clustersnap_LKG/0.0_head/. /path/to/osdm/clustersnap_LKGcopy/0.0_head &&
ceph-filestore-dump --snap-path=/path/to/osdn/clustersnap_LKG --pgid=0.0 --type=export-meta --file=- |
ceph-filestore-dump --snap-path=/path/to/osdm/clustersnap_LKGcopy --pgid=0.0 --type=import-meta --file=-

The above assumes any xattrs in the source pg subdir will be appropriate for the destination, or that export-meta|import-meta will adjust them as needed. (import-meta might check that the pg data is there, and even check its consistency, but I'd make these checks optional, or allow disabling them with --I-know-what-I'm-doing ;-)
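Until something like that exists, the "is the pg data there" check can be approximated by hand before importing metadata. A rough sketch, assuming the pg subdir was copied as above; pg_copy_ok is a made-up helper name, and it compares file trees and contents only, so xattrs would still need a separate check:

```shell
#!/bin/sh
# Return 0 only if the destination pg directory exists, is non-empty, and
# has the same file tree and file contents as the source pg directory.
# $1: source pg dir, $2: destination pg dir (both hypothetical paths)
pg_copy_ok() {
    src=$1
    dst=$2
    [ -d "$src" ] && [ -d "$dst" ] || return 1
    # An empty destination means the data was never copied.
    [ -n "$(ls -A "$dst")" ] || return 1
    # diff -r walks both trees; -q only reports whether files differ.
    diff -rq "$src" "$dst" > /dev/null
}
```

For example: pg_copy_ok /path/to/osdn/clustersnap_LKG/0.0_head /path/to/osdm/clustersnap_LKGcopy/0.0_head && echo "pg data present".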

I realize these are all “under the hood” operations, but they may be useful for disaster recovery when, say, there's only one remaining consistent replica, and just letting the cluster re-replicate from it would generate enough activity to risk corrupting that last replica.

#1 Updated by Samuel Just over 10 years ago

  • Tracker changed from Bug to Feature

#2 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved