Inconsistency when performing multiple "rbd rename" operations in parallel.
It appears that, possibly due to a weak internal locking mechanism, the image header object can persist on the OSDs after an image is renamed repeatedly in parallel, even though the image no longer exists in the pool. Because of this stale header, a new image cannot be given the name under which the leftover header is still present.
The image header should be deleted reliably when an image is renamed, even under concurrent renames, so that this kind of inconsistency cannot occur.
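The failure mode can be sketched with a toy model (my illustration, not Ceph's actual code or object layout): if rename is modeled as two non-atomic steps, "write the header under the new name" followed by "delete the old header", two racing clients can both copy the same old header before either deletes it, leaving a header object behind with no corresponding image.

```python
# Toy model of a non-atomic rename race (an assumption for illustration,
# not the real librbd implementation). An image is represented only by a
# header object keyed "rbd_header.<name>".

def rename_step1(objects, old, new):
    """First half of a non-atomic rename: copy header to the new name."""
    objects["rbd_header." + new] = objects["rbd_header." + old]

def rename_step2(objects, old):
    """Second half: remove the old header."""
    del objects["rbd_header." + old]

objects = {"rbd_header.image": b"metadata"}

rename_step1(objects, "image", "imageA")  # client A: image -> imageA
rename_step1(objects, "image", "imageB")  # client B races: image -> imageB
rename_step2(objects, "image")            # client A deletes the old header
# Client B's delete of "rbd_header.image" would now fail (already gone),
# but the header it created under "imageB" is never cleaned up.

print(sorted(objects))  # -> ['rbd_header.imageA', 'rbd_header.imageB']
```

Both headers survive even though only one image should exist, which matches the "name already present in the headers" symptom described above.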
Step-1 : Create an image in a pool: # rbd create mypool/image --size 1024
[ Note: For testing purposes, make sure the pool does not contain any other images ]
Step-2 : Execute the scripts below in two different shells to perform rename operations repeatedly in parallel
<Shell-1> # ./rbd_rename_program1.sh mypool
<Shell-2> # ./rbd_rename_program2.sh mypool
[ Note: Both scripts rename the image by fetching the current image name from the pool; the two scripts change the name in two different ways ]
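The scripts themselves are not attached to this report; the following is a minimal sketch of what such a rename loop might look like. The file name `rbd_rename_program1.sh`, the candidate names `imageA`/`imageB`, and the iteration count are assumptions; the second script would simply use a different pair of names.

```shell
#!/bin/sh
# Hypothetical sketch of rbd_rename_program1.sh (the real scripts are not
# shown in the report). Repeatedly renames the single image in the given
# pool back and forth between two names. RBD is overridable for testing.

POOL="${1:-mypool}"
RBD="${RBD:-rbd}"
ITERATIONS="${ITERATIONS:-100}"

rename_loop() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        # Fetch the current (single) image name from the pool.
        img=$("$RBD" ls "$POOL" | head -n 1)
        if [ -z "$img" ]; then
            echo "no image found in pool $POOL" >&2
            return 1
        fi
        # Flip between two candidate names (program2.sh would use others).
        new="imageA"
        [ "$img" = "imageA" ] && new="imageB"
        if ! "$RBD" rename "$POOL/$img" "$POOL/$new"; then
            echo "rename failed at iteration $i (image: $img)" >&2
            return 1
        fi
        i=$((i + 1))
    done
}

# Usage: ./rbd_rename_program1.sh mypool   (then call rename_loop)
```

Running two such loops concurrently maximizes the window in which both shells fetch the same old image name before either rename completes.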
Step-3 : Repeat Step-2 until an "image already exists" error occurs.
Step-4 : Identify the image name for which the CLI throws the error, then check whether that image exists in the pool: # rbd ls mypool
- ceph osd map mypool <image name for which the error is shown>
- ls /var/lib/ceph/osd/ceph-<osd_id>/current/<PG_num>_head
#2 Updated by Jason Dillaman 6 days ago
- Status changed from New to Need More Info
- Priority changed from Normal to Low
While this is in fact a valid issue, I am hesitant to add the complexity of two-phase commit logic for this edge condition. Is there a valid use-case where you are seeing this hit (besides just trying to reproduce it)?