Bug #52485
Updated by Deepika Upadhyay over 2 years ago
Following discussion in "PR #41696":https://github.com/ceph/ceph/pull/41696.

When the secondary image has cloned snapshots, removing the primary image leaves only one option: remove the secondary clones, because rbd-mirror leaves the secondary image in the REMOVING state. If the secondary image were in the NORMAL state, the user could also choose to restore it and make it primary.

Steps to reproduce:

<pre>
# ceph version 16.2.5 (9b9dd76e12f1907fe5dcc0c1fadadbb784022a42) pacific (stable)
pool=
image=
snapshot=
primary=
secondary=

rbd snap create $pool/$image@$snapshot --cluster=$primary
rbd snap protect $pool/$image@$snapshot --cluster=$primary
rbd mirror image enable $pool/$image snapshot --cluster=$primary
# wait sync
rbd clone $pool/$image@$snapshot $pool/$image-$snapshot --cluster=$secondary
rbd snap unprotect $pool/$image@$snapshot --cluster=$primary
# in secondary log :
# rbd::mirror::image_replayer::snapshot::ApplyImageStateRequest: 0x55aeea733b00 handle_unprotect_snapshot: failed to unprotect snapshot shot: (16) Device or resource busy
# rbd::mirror::image_replayer::snapshot::Replayer: 0x55aef051d800 handle_apply_image_state: failed to apply remote image state to local image: (16) Device or resource busy

rbd snap purge $pool/$image --cluster=$primary
rbd rm $pool/$image --cluster=$primary
# in secondary log :
# rbd::mirror::ImageReplayer: 0x55aeec032500 [6/80cdd255-5be3-4362-ba48-d39842ee0d80] handle_shut_down: remote image no longer exists: scheduling deletion
# rbd::mirror::ImageReplayer: 0x55aeec032500 [6/80cdd255-5be3-4362-ba48-d39842ee0d80] handle_shut_down: mirror image no longer exists

rbd info $pool/$image-$snapshot --cluster=$secondary
# parent: $pool/$image@$snapshot (trash $trash)
trash=

rbd trash ls -p $pool -l --all --cluster=$secondary
# ID      NAME   SOURCE     DELETED_AT               STATUS                              PARENT
# $trash  image  MIRRORING  Wed Sep 1 12:54:24 2021  expired at Wed Sep 1 12:54:24 2021

rbd trash rm $pool/$trash --force --cluster=$secondary
# rbd: image has snapshots - these must be deleted with 'rbd snap purge' before the image can be removed.

rbd trash restore $pool/$trash --cluster=$secondary
# rbd: restore error: (16) Device or resource busy
# -1 librbd::api::Trash: restore: error restoring image id $trash, which is pending deletion

rados -p $pool listomapvals rbd_trash --cluster=$secondary
#id_$trash
#value (33 bytes) :
#00000000 02 01 1b 00 00 00 01 05 00 00 00 69 6d 61 67 65 |...........image|
#00000010 e0 5b 2f 61 c3 7b 49 1b e0 5b 2f 61 c3 7b 49 1b |.[/a.{I..[/a.{I.|
#00000020 02 |.|
</pre>

It is possible to restore the image by changing the last byte of this value from 02 to 00, but this isn't user-friendly.

From a different perspective, if the removal performed by rbd-mirror on the secondary were replaced by equivalent user actions, the image would end up in the NORMAL state rather than REMOVING, and restore would be possible:

<pre>
rbd snap create $pool/$image@$snapshot
rbd snap protect $pool/$image@$snapshot
rbd clone $pool/$image@$snapshot $pool/$image-$snapshot
# rbd-mirror actions simulated by user on secondary cluster :
rbd snap unprotect $pool/$image@$snapshot # fails as there is a clone
rbd trash mv $pool/$image
rbd trash rm $pool/$trash # fails as there are snapshots but image is in NORMAL state and restore is possible
</pre>
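
For reference, the byte change mentioned above (last byte of the @rbd_trash@ omap value, 02 to 00) can be sketched offline. The following is only an illustration on a local scratch file rebuilt from the hex dump in this report; it does not contact a cluster. The grouping of the bytes in the comments (header, source, name, timestamps, final byte) is my reading of the dump and should be treated as an assumption, as is the scratch file name @val@.

```shell
#!/bin/sh
# Offline sketch: rebuild the 33-byte rbd_trash omap value from the hex dump
# in this report and patch its last byte from 02 to 00 in a local file.
# Nothing here talks to a cluster.
set -eu

val=$(mktemp)   # hypothetical scratch file holding a copy of the omap value

# Bytes copied from the dump, written with octal escapes for portability.
# Assumed grouping: 6-byte header, 1 source byte, length-prefixed name
# "image", two 8-byte timestamps, then the final byte the report changes.
printf '\002\001\033\000\000\000' > "$val"
printf '\001\005\000\000\000image' >> "$val"
printf '\340\133\057\141\303\173\111\033' >> "$val"
printf '\340\133\057\141\303\173\111\033' >> "$val"
printf '\002' >> "$val"

size=$(wc -c < "$val" | tr -d ' ')
before=$(tail -c 1 "$val" | od -An -tx1 | tr -d ' \n')

# Overwrite only the byte at offset 32 (the last one) without truncating.
printf '\000' | dd of="$val" bs=1 seek=32 conv=notrunc 2>/dev/null

after=$(tail -c 1 "$val" | od -An -tx1 | tr -d ' \n')
echo "size=$size last_byte_before=$before last_byte_after=$after"
```

Writing the patched value back into the @rbd_trash@ object is deliberately left out: it is exactly the manual step this report describes as not user-friendly, and it should not be needed if the secondary image ends up in the NORMAL state.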