Bug #63560: retry_raced_bucket_write considerations - rgw - Ceph

Actions

Copy link

Bug #63560

closed

retry_raced_bucket_write considerations

Added by Giuseppe Baccini 6 months ago. Updated 5 months ago.

Status:

Can't reproduce

Priority:

Normal

Assignee:

Target version:

% Done:

Source:

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

Updating bucket's metadata concurrently by two or more threads is allowed in radosgw.
There is a retry mechanism: retry_raced_bucket_write(), that expects the bucket references to fetch the latest bucket's state from the persistent store and check that it is equals to the cached bucket's state.
In the case the cached and the stored states are not equal the updating operation wrapped by the retry_raced_bucket_write() is retried.
the wrapped operation returns -ECANCELED to inform retry_raced_bucket_write() to retry.

the two methods used by this mechanism are:

- merge_and_store_attrs
- put_info

merge_and_store_attrs() can confront the cached state of the bucket with the stored one because the modifications are passed with the second parameter: rgw::sal::Attrs& attrs.
So the method can make a confrontation of the current state in memory before applying the delta (attrs) with the stored state.

put_info() on the contrary, can't, because it hasn't a RGWBucketInfo param with an associated "delta".
The get_info() method is used to access the current RGWBucketInfo property of the bucket and that is used to make modifications over that property.
In my opinion the put_info() can never say if its image in memory is good to be flushed into the persistent storage because that comparison will always result different.

I'd guess that put_info() should have an interface like merge_and_store_attrs() with a parameter containing the delta (the new RGWBucketInfo).

What do you think about this?

Actions

Copy link

Updated by Daniel Gryniewicz 5 months ago

In principle, you are probably correct. However, all the cases where put_info() uses retry_raced_bucket_write(), the lambda does all the modifications to obj->info before calling put_info(), so it works correctly. In addition, none of the caller have (or have ever had) a separate place to store the bucket_info, and I'm not sure adding such storage will gain us much. So, I think this is a fine way of doing this. Is there a case where this doesn't work?

Actions

Copy link

Updated by Daniel Gryniewicz 5 months ago

If you'd prefer to discuss this in person, feel free to come to one of the Wed. refactoring meetings (on the community calendar) and we can discuss it on the call.

Actions

Copy link

Updated by Giuseppe Baccini 5 months ago

Thanks Daniel to have looked into this; unfortunately I'm currently in the position to not be able to follow this issue.
I cannot say now, if I will have time in the future to follow this.
Up to you, we can leave this opened or we can close.

Actions

Copy link

Updated by Casey Bodley 5 months ago

Status changed from New to Can't reproduce

thanks Giuseppe

Daniel Gryniewicz wrote:

In principle, you are probably correct. However, all the cases where put_info() uses retry_raced_bucket_write(), the lambda does all the modifications to obj->info before calling put_info(), so it works correctly. In addition, none of the caller have (or have ever had) a separate place to store the bucket_info, and I'm not sure adding such storage will gain us much. So, I think this is a fine way of doing this. Is there a case where this doesn't work?

based on Dan's summary, it sounds like retry_raced_bucket_write() is being used correctly and there's nothing to fix

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » rgw

Custom queries

Bug #63560

retry_raced_bucket_write considerations

Updated by Daniel Gryniewicz 5 months ago

Updated by Daniel Gryniewicz 5 months ago

Updated by Giuseppe Baccini 5 months ago

Updated by Casey Bodley 5 months ago