Bug #22660

Inconsistency raised while performing multiple "image rename" in parallel.

Added by Debashis Mondal 4 months ago. Updated 3 months ago.

Need More Info
3 - minor

Observation :-

It is observed that, possibly because of weak internal locking, an image header can persist on the OSDs after the image is renamed repeatedly in parallel, even though the image no longer exists in the pool. As a result, an image cannot be given a new name that is still present in one of these leftover headers.
Image headers should be deleted properly when an image is renamed repeatedly in parallel, so that this kind of consistency problem cannot occur.
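
The observation can be illustrated with a toy model. The sketch below is not librbd's actual rename code. It assumes, purely for illustration, that a rename consists of three separate steps (create a per-name header object for the new name, update the pool directory, delete the old header object) and that a client which fails at the directory step does not clean up after itself. The names `Pool`, `rename_steps` and `run_race` are hypothetical.

```python
class Pool:
    """Toy pool: per-name header objects plus the directory a listing would read."""

    def __init__(self, image):
        self.headers = {image}    # per-name objects stored on the OSDs
        self.directory = {image}  # names visible when listing the pool


def rename_steps(pool, src, dst):
    """One rename split into three steps; yields between steps."""
    if src not in pool.headers:
        raise FileNotFoundError(src)
    if dst in pool.headers:
        raise FileExistsError(dst)
    pool.headers.add(dst)             # step 1: create the header for the new name
    yield
    if src not in pool.directory:     # step 2: directory update fails if src is gone
        raise FileNotFoundError(src)  # assumed: the header from step 1 is not removed
    pool.directory.remove(src)
    pool.directory.add(dst)
    yield
    pool.headers.discard(src)         # step 3: delete the old header


def run_race():
    """Interleave two renames of the same image in a fixed order."""
    pool = Pool("image")
    a = rename_steps(pool, "image", "image_a")
    b = rename_steps(pool, "image", "image_b")
    next(a)          # client A creates header image_a
    next(b)          # client B creates header image_b
    next(a)          # A wins the directory update
    try:
        next(b)      # B loses: "image" is no longer in the directory
    except FileNotFoundError:
        pass
    for _ in a:      # A finishes by deleting the old header
        pass
    return pool
```

Under these assumptions, `run_race()` returns a pool whose directory contains only `image_a` while its headers contain both `image_a` and `image_b`. A later rename to `image_b` then fails with an "already exists" error although the listing does not show that name.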

Execution Steps:-

Step-1: Create an image in a pool.
          # rbd create mypool/image --size 1024
[Note: For testing purposes, make sure that the pool does not contain any other images.]
Step-2: Run the two attached scripts in two different shells so that the rename operations run repeatedly in parallel.
<Shell-1> # ./ mypool
<Shell-2> # ./ mypool
[Note: Both scripts fetch the current image name from the pool and rename the image. Each script changes the name in a different way.]
Step-3: Repeat Step-2 until the "image already exist" error appears.
Step-4: Find the image name for which the CLI reports the error, then check whether that image exists in the pool.
          # rbd ls mypool
          # ceph osd map mypool <image name for which error is showing>
          # ls /var/lib/ceph/osd/ceph-<osd_id>/current/<PG_num>_head

Attachments:
Rename script1 (300 Bytes) Debashis Mondal, 01/11/2018 08:25 AM
Rename script2 (274 Bytes) Debashis Mondal, 01/11/2018 08:25 AM
rbd_rename_consistencyIssue_console_log.txt (10 KB) Debashis Mondal, 01/11/2018 08:25 AM
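
The attached scripts are not reproduced in this report. As a rough idea of what Step-2 involves, the sketch below loops over `rbd ls` and `rbd rename`, which are the only CLI calls it relies on. The function names, the `suffix` parameter and the target naming scheme are assumptions made for illustration, not the contents of the attachments.

```python
import subprocess


def rbd(*args):
    """Run an rbd CLI command and return its stdout (raises on a non-zero exit)."""
    result = subprocess.run(["rbd", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def rename_loop(pool, suffix, rounds, run=rbd):
    """Repeatedly rename the first image listed in `pool`.

    Returns (source, target, stderr) triples for the renames that failed.
    """
    failures = []
    for i in range(rounds):
        names = run("ls", pool).split()
        if not names:
            break
        src = names[0]
        dst = f"image{suffix}{i}"  # hypothetical naming scheme
        try:
            run("rename", f"{pool}/{src}", f"{pool}/{dst}")
        except subprocess.CalledProcessError as err:
            # Expected under contention: the other shell renamed src first,
            # or a leftover header already occupies dst.
            failures.append((src, dst, err.stderr))
    return failures
```

Running `rename_loop("mypool", "_a", 100)` in one shell and `rename_loop("mypool", "_b", 100)` in another would approximate the two parallel scripts. Because the target names repeat on each run, repeating the run as in Step-3 gives a leftover header a chance to collide with a later rename.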


#1 Updated by Greg Farnum 3 months ago

  • Project changed from Ceph to rbd
  • Category deleted (common)

#2 Updated by Jason Dillaman 3 months ago

  • Status changed from New to Need More Info
  • Priority changed from Normal to Low

While this is in fact a valid issue, I am hesitant to add the complexity of 2 phase commit logic for this edge condition. Is there a valid use-case where you are seeing this hit (besides just trying)?

#3 Updated by Debashis Mondal 3 months ago

We observed this unexpected Ceph behavior when several clients tried to rename the same image simultaneously. If the header becomes corrupted, it can cause a larger problem in production, where the image is accessed by different clients.

#4 Updated by Jason Dillaman 3 months ago

But why do you have multiple clients attempting to rename the same image concurrently?

#5 Updated by Debashis Mondal 3 months ago

We tested this scenario based on the features that Ceph provides.
Because Ceph allows different clients to access the same image in parallel, there should be proper locking and cleanup mechanisms to handle that parallelism and keep the state consistent.

When I examined this behavior from a development point of view, I found this improper handling, which causes an inconsistency on the Ceph server that should not be accepted.

I simulated this behavior to find coding bugs. As I understand it, this bug should be fixed to maintain consistency.

#6 Updated by Jason Dillaman 3 months ago

Sure, not disagreeing that this is undesirable -- but since this is an arbitrary use case that doesn't affect data consistency, I have a feeling that it will never be fixed.

#7 Updated by Debashis Mondal 3 months ago

In my opinion, however, it must be fixed. Because multiple clients have access rights to the image, any user can rename the image as they require. If 100 users perform the same operation at the same time, this situation will occur because of the improper lock handling. If a client then tries to use a name that another client has already used, and a header entry for that name is still present on the Ceph server because of the error above, the requesting user cannot use the name of their choice, because multiple header entries (including the same header entry) are present in the Ceph directory. From the user's perspective this is clearly a bug and mismanagement by the Ceph server.

This situation can occur at any time, depending on the users. It should therefore be fixed to rule out this possibility in the future, and the fix should be reflected in the current releases as well. The Ceph server should be reliable at all times, regardless of how users use it.
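
One way to find such leftover entries is to compare the image names from `rbd ls <pool>` with the object names from `rados -p <pool> ls`. The helper below is a minimal sketch, assuming the per-name objects are called `rbd_id.<name>` (format 2 images) or `<name>.rbd` (format 1 images). The function name `stale_image_names` is hypothetical.

```python
def stale_image_names(listed, objects):
    """Return names that still own a per-name object but are absent from the listing.

    listed  -- image names as printed by `rbd ls <pool>`
    objects -- object names as printed by `rados -p <pool> ls`
    """
    prefix, suffix = "rbd_id.", ".rbd"  # assumed naming: format 2 / format 1
    listed = set(listed)
    owners = set()
    for obj in objects:
        if obj.startswith(prefix):
            owners.add(obj[len(prefix):])
        elif obj.endswith(suffix):
            owners.add(obj[:-len(suffix)])
    return sorted(owners - listed)
```

Any name it returns would be rejected as a rename target even though it does not appear in the listing.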

#8 Updated by Debashis Mondal 3 months ago

I also checked this behavior for other Ceph operations and found that everything works as expected there. The header-related problem arises only for rename, so I think it should not exist for rename either.

#9 Updated by Jason Dillaman 3 months ago

As this is an open source project, you are more than welcome to post a proposed fix for the issue yourself.
