Bug #1059

RGW consistency issues

Added by Colin McCabe almost 13 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

RGW is supposed to implement read-after-write consistency, but it often does not.

(03:19:57 PM) Tv: cmccabe: i think it goes like this; most of the AWS docs were written before the read-after-write stuff came up; the spec is mostly silent about this
(03:20:22 PM) Tv: cmccabe: aws implementation routed write actions to master copy of data, others saw possibly laggy read-only replicas
(03:21:00 PM) Tv: cmccabe: huge customer demand made them add "read-after-write", that is cache invalidation, so you never see stale reads
(03:21:08 PM) Tv: cmccabe: that transition is complete for only some regions
(03:21:31 PM) Tv: cmccabe: they are definitely moving toward read-after-write, and that's what we should implement (that's already been agreed on)
(03:21:50 PM) Tv: cmccabe: but in this case, it seems our request routing is 100% random, that is, worse than aws ever was
(03:21:53 PM) Tv: cmccabe: or something
(03:22:07 PM) Tv: i don't actually know where if anywhere rgw has caching
(03:22:13 PM) Tv: or whether it touches non-primary replicas ever
(03:22:45 PM) Tv: cmccabe: let me be explicit about this, one more time: we've already decided to go for "read-after-write"
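
The behaviour under discussion can be probed with a few S3 calls. A minimal sketch, assuming a boto3 client, an RGW endpoint at http://localhost:7480, credentials in the environment, and a made-up bucket name (none of this is part of s3-tests):

    import boto3
    from botocore.exceptions import ClientError

    # Illustrative endpoint and bucket name; credentials come from the
    # environment (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
    s3 = boto3.client("s3", endpoint_url="http://localhost:7480")
    bucket = "raw-consistency-probe"

    s3.create_bucket(Bucket=bucket)
    s3.put_object(Bucket=bucket, Key="probe", Body=b"hello")

    # Read-after-write consistency means this GET must see the object that
    # was just written; with stale routing or caching it can fail instead.
    try:
        body = s3.get_object(Bucket=bucket, Key="probe")["Body"].read()
        print("read-after-write OK" if body == b"hello" else "stale data")
    except ClientError as e:
        print("stale read:", e.response["Error"]["Code"])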

Attachment: cmccabe-rgw-tests-2011-05-05.txt (69.8 KB), Colin McCabe, 05/05/2011 10:36 AM


History

#1 Updated by Colin McCabe almost 13 years ago

I think this issue is causing a lot of the s3-tests failures I'm seeing.

#2 Updated by Yehuda Sadeh almost 13 years ago

Can you specify which tests are failing? I haven't seen this happening, but I was probably running it against a different cluster.

#3 Updated by Colin McCabe almost 13 years ago

Here is my run of s3-tests. Looks like these tests failed:

test_s3.test_bucket_list_empty
test_s3.test_bucket_create_delete
test_s3.test_bucket_list_long_name
test_s3.test_bucket_acl_grant_email

#4 Updated by Yehuda Sadeh almost 13 years ago

Other than one test that failed due to misconfiguration, the rest failed due to the async bucket creation. We should just make bucket creation synchronous again to avoid that.
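
To make the failure mode concrete, here is a sketch of the create-then-use race that asynchronous bucket creation exposes, again assuming boto3 and an illustrative RGW endpoint and bucket name:

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3", endpoint_url="http://localhost:7480")

    s3.create_bucket(Bucket="async-create-probe")
    try:
        # If bucket creation completes asynchronously, the radosgw serving
        # this request may not yet see the new bucket, so the listing can
        # fail with NoSuchBucket even though create_bucket returned success.
        resp = s3.list_objects_v2(Bucket="async-create-probe")
        print("bucket visible, keys:", resp.get("KeyCount", 0))
    except ClientError as e:
        print("bucket not visible yet:", e.response["Error"]["Code"])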

#5 Updated by Sage Weil almost 13 years ago

  • Category set to 22
  • Story points set to 2

IIRC we settled on:

- if the pool is not in the osdmap, check for the bucket object; if that also doesn't exist, return NoSuchBucket (or whatever)

with the other option being:

- query the monitor if the pool is not in the osdmap.

Making creation itself sync won't help because you can still get another radosgw with a stale osdmap.
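
A sketch of that lookup order, written as a Python model rather than RGW's actual C++ code; all names here are hypothetical, and the sets stand in for the local osdmap, the monitor's view, and the existing bucket metadata objects:

    class NoSuchBucket(Exception):
        pass

    def resolve_bucket_pool(bucket, local_osdmap_pools, monitor_pools, bucket_objects):
        """Return the pool backing `bucket`, or raise NoSuchBucket."""
        pool = ".rgw." + bucket  # illustrative per-bucket pool naming
        if pool in local_osdmap_pools:
            return pool
        # The pool is missing from our (possibly stale) osdmap; before
        # returning an error, check whether the bucket object itself exists.
        if bucket in bucket_objects:
            # The bucket is real, so the local map is stale: take the
            # monitor's view (i.e. refresh the osdmap) and look again.
            local_osdmap_pools = set(monitor_pools)
            if pool in local_osdmap_pools:
                return pool
        raise NoSuchBucket(bucket)

Either variant avoids the spurious NoSuchBucket for a freshly created bucket; the extra existence check or monitor round-trip on every miss is the performance cost noted in the next comment.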

#6 Updated by Yehuda Sadeh almost 13 years ago

  • Status changed from New to Resolved

Fixed now: commit:f00edf73284fc0f6e32973d16f58eb81f7b96bf8. However, this might have an impact on performance.

#7 Updated by John Spray about 6 years ago

  • Project changed from Ceph to rgw
  • Category deleted (22)
  • Target version deleted (v0.28)

Bulk reassign of radosgw category to RGW project.
