Bug #22660: Inconsistency raised while performing multiple "image rename" in parallel. - rbd - Ceph

Bug #22660

closed

Inconsistency raised while performing multiple "image rename" in parallel.

Added by Debashis Mondal over 6 years ago. Updated about 5 years ago.

Status: Won't Fix
Priority: Low
Assignee:
Target version:
% Done:
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v10.2.7
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Observation :-
--------------------

It appears that, possibly because of insufficient internal locking, an image header can persist on the OSDs after the image has been renamed repeatedly and in parallel, even though the image no longer exists in the pool. As a result, a name that is still present in a stale header cannot be assigned to another image.
  Image headers should be removed properly after such parallel renames so that no consistency problems can arise later.
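
One way to check for the state described above is sketched below. It assumes format 2 images, where each image name is backed by an `rbd_id.<name>` object in the pool; with format 1 images the object naming differs and this check would not apply. `find_orphans` is a hypothetical helper name, not part of any Ceph tool.

```shell
#!/bin/sh
# Sketch only: list image names that still have an rbd_id.<name> object
# but no longer appear in `rbd ls` (assumes format 2 images).
find_orphans() {
    pool="$1"
    listed=$(mktemp)
    objs=$(mktemp)
    # Names the pool directory reports as existing images.
    rbd ls "$pool" | sort > "$listed"
    # Names derived from the rbd_id.<name> objects stored in the pool.
    rados -p "$pool" ls | sed -n 's/^rbd_id\.//p' | sort > "$objs"
    # Print names present only in the object listing.
    comm -13 "$listed" "$objs"
    rm -f "$listed" "$objs"
}
```

Calling `find_orphans mypool` after the parallel renames should print nothing on a consistent pool; any name it prints would be a candidate for the stale entry that blocks reuse of that name.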

Execution Steps:-

Step-1 : Create an image in a pool
                # rbd create mypool/image --size 1024
      [ Note: For testing purposes, make sure that the pool does not contain any other images ]

Step-2 : Execute the scripts below in two different shells to perform rename operations repeatedly in parallel
         <Shell-1>  # ./rbd_rename_program1.sh mypool
         <Shell-2>  # ./rbd_rename_program2.sh mypool
    [ Note: Both scripts rename the image by fetching its name from the pool. The two scripts change the name in two different ways ]

Step-3 : Repeat Step-2 until an "image already exist" error appears.

Step-4 : Find out for which image name the CLI reports the error, then check whether that image exists in the pool
          # rbd ls mypool
          # ceph osd map mypool <image name for which error is showing>
          # ls /var/lib/ceph/osd/ceph-<osd_id>/current/<PG_num>_head
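
The attached scripts are not reproduced on this page. A hypothetical stand-in for one of them might look like the sketch below; the function name `rename_once`, the `img_<tag>_<n>` naming scheme, and the default of 100 iterations are assumptions made for illustration, not the reporter's actual code. Running it in two shells with different tags would approximate Step-2.

```shell
#!/bin/sh
# Sketch only: repeatedly rename whatever image the pool currently lists.
# Usage: ./rename_loop.sh <pool> [tag] [iterations]
rename_once() {
    pool="$1"
    tag="$2"
    n="$3"
    # Take the first image name currently listed in the pool.
    src=$(rbd ls "$pool" | head -n 1)
    [ -n "$src" ] || return 0
    # A concurrent renamer may already have moved the image, so a failure
    # here is expected from time to time and is only reported.
    rbd rename "$pool/$src" "$pool/img_${tag}_${n}" \
        || echo "rename of $src failed" >&2
}

# Run the loop only when a pool name is given on the command line.
if [ "$#" -ge 1 ]; then
    i=0
    while [ "$i" -lt "${3:-100}" ]; do
        rename_once "$1" "${2:-a}" "$i"
        i=$((i + 1))
    done
fi
```

For example, `./rename_loop.sh mypool a` in one shell and `./rename_loop.sh mypool b` in another would have both loops racing to rename the same image.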

Files

rbd_rename_program1.sh (300 Bytes) - Rename script1 - Debashis Mondal, 01/11/2018 08:25 AM
rbd_rename_program2.sh (274 Bytes) - Rename script2 - Debashis Mondal, 01/11/2018 08:25 AM
rbd_rename_consistencyIssue_console_log.txt (10 KB) - Debashis Mondal, 01/11/2018 08:25 AM

Updated by Greg Farnum over 6 years ago

Project changed from Ceph to rbd
Category deleted (~~common~~)

Updated by Jason Dillaman over 6 years ago

Status changed from New to Need More Info
Priority changed from Normal to Low

While this is in fact a valid issue, I am hesitant to add the complexity of two-phase commit logic for this edge condition. Is there a valid use case where you are seeing this hit (besides just trying)?

Updated by Debashis Mondal about 6 years ago

We observed this unexpected behavior of Ceph when multiple clients tried to rename the same image in parallel. If a header becomes corrupted, it can cause a larger problem in production when the image is accessed by different clients.

Updated by Jason Dillaman about 6 years ago

But why do you have multiple clients attempting to rename the same image concurrently?

Updated by Debashis Mondal about 6 years ago

Actually, we tested this scenario based on the features that Ceph provides.
Since Ceph allows parallel access to the same image from different clients, there should be proper locking and cleanup mechanisms to handle parallelism and consistency.

When I examined this behavior from a development point of view, I found this improper handling, which causes an inconsistency on the Ceph server that should not be acceptable.

I simulated this behavior to find coding bugs. As I understand it, this bug should be fixed to maintain consistency.

Updated by Jason Dillaman about 6 years ago

Sure, not disagreeing that this is undesirable -- but since this is an arbitrary use case that doesn't affect data consistency, I have a feeling that it will never be fixed.

Updated by Debashis Mondal about 6 years ago

In my opinion it must be fixed. Since multiple clients have access rights to the image, any user can rename it as required. If 100 users perform the same operation at the same time, this situation will occur because of the improper lock handling. If a client then tries to assign a name that another client has already used, and a stale header entry for that name is still present on the Ceph server because of the error above, the requesting user cannot use the name of their choice. Multiple header entries (including duplicates of the same entry) remain in the Ceph directory, which from the user's perspective is clearly a bug and mismanagement on the part of the Ceph server.

This situation can occur at any time, depending on how users behave. It should therefore be fixed to rule out such cases in the future, and the fix should also be applied to current releases. The Ceph server should be reliable at all times, regardless of how users use it.

Updated by Debashis Mondal about 6 years ago

I also checked this behavior for other Ceph operations and found that everything works as expected there. Only rename shows the header-related problem, so I think rename should not have it either.

Updated by Jason Dillaman about 6 years ago

As this is an open source project, you are more than welcome to post a proposed fix for the issue yourself.

Updated by Jason Dillaman about 5 years ago

Status changed from Need More Info to Won't Fix
