rbd-mirror replay fails on attempting to reclaim data to local site (LS) from distant-end after DE promotion.
in the event that the primary pool (on the local-site) becomes disconnected and empty without being recreated (say, `rbd rm` all images in the pool) on reconnection of the mirror, under normal operation, the distant-end will notice the now empty pool and then clear itself.
this can be prevented by promoting the images on the distant-end before the local-site pool is brought back on-line. the desire now is to recover the data from the distant-end back to the LS. when the LS comes back on-line and the pool mirroring is established, the LS pool will begin the process of replaying all images marked primary on the distant-end. however, any image that originally belonged to the primary pool in the first place will not complete the mirror process due to a "split-brain" situation; an empty image "placeholder" will be created in the pool but no data will be transferred. this is caused by something in the local-site pool (it was emptied, not re-created) still thinking that those images belong to it and are (in my estimation) still treated as primary.
it is possible to trigger the mirror playback in one of three ways:
1) by issuing `rbd mirror image resync` on the desired images in the LS pool,
2) restarting the `rbd-mirror` on the local-site,
3) if the LS pool is re-created and then re-linked via `rbd-mirror` all images marked primary in the distant-end will be replayed as normal (full copy of all data) on the initial mirror connection.
the issue, then, is that the primary pool does not completely forget about the images when they are deleted. though the above work-arounds are capable of returning the data back to the original primary pool:
- why does the local-site pool still think it owns the data?
- is it the journal in the pool or some other piece of meta-data or image configuration that lies hidden?
- could Ceph / RBD be extended to completely remove this left-over data so that the "emptied" scenario behaves similarly to the "re-created" scenario?"
1) establish a mirror between LS (local-site) and DE (distant-end) with LS being the primary pool.
2) let the mirror do it's initial work replaying LS to DE
3) once the journal replaying is complete stop the rbd-mirror service on DE
4) delete target (or all) images in the LS pool
5) promote target (or all) images in the DE to primary
6) reconnect the two pools by starting rbd-mirror on the LS.
- that the LS will fully copy all of the images marked primary in the DE.
a) starting rbd-mirror on LS will complain that it cannot open files it thinks that it knows about.
b) "rbd mirror pool status --verbose" for LS will show images reappear as if they were either supposed to be there originally or attempted to be copied from the DE.
c) these images will be empty (all zeroes)
1) restart the rbd-mirror daemon on LS. this will trigger a resync.
2) manually trigger a resync using "rbd mirror image resync"
3) completely re-creating the LS pool / ceph cluster.
#3 Updated by Eric M Gonzalez about 2 months ago
- File ls.rbd-mirror.issue-19811 added
Sure -- same results.
I've attached a log of operations that you can follow along and see what's happening. I hope that it shows what's going on better than my (too) verbose description above. Let me know if there's anything else I can get for you.