Bug #64366

open

rgw/multisite: objects named "." or ".." are not replicated

Added by Oguzhan Ozmen 3 months ago. Updated 2 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: rgw multisite backport_processed
Backport: quincy reef squid
Regression: No
Severity: 4 - irritation
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Perhaps a known issue, but I couldn't find it in the tracker. As the title suggests, if a user uploads an object with the key "." (i.e., a single dot/period character) or "..", it is not replicated to the secondary zone. Other cases, such as a key of three or more dots or any other key that starts with a dot (e.g., ".file"), are fine.

It's easily reproducible:

$ aws s3api put-object --key=. --bucket=<bucket> --body <file>
$ aws s3api put-object --key=.. --bucket=<bucket> --body <file>

These commands create the objects properly on the primary site, but the objects are not replicated to the secondary site. The user can download the objects from the primary site using the aws CLI with no issues.

The secondary site emits events like

...RGW-SYNC:data:sync:shard[..] ... entry[.]: ERROR: failed to sync object: <bucket_instance>:<datalog_shard>/.

and sync status shows the impacted shard as recovering:

$ sudo radosgw-admin sync status
  ...
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: ...
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        1 shards are recovering
                        recovering shards: [90]

After some further investigation, I believe this is an issue with the curl library that Ceph uses, so not directly a Ceph issue. It can be reproduced using the curl tool as well:

# curl strips off a single trailing dot in the URI
$ curl http://<rgw_ip>/.
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

# curl strips off trailing double dots in the URI
$ curl http://127.0.0.1:8101/..
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

3 dots are not removed, though:

$ curl http://<rgw_ip>/...
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message></Message><BucketName>...</BucketName><RequestId>tx000000e4e811a4ec2317c-0065c53177-4183-zg1-1</RequestId><HostId>4183-zg1-1-zg1</HostId></Error>

https://github.com/curl/curl/issues/716 seems to be describing the ~same issue.

I was initially suspicious that Ceph was trimming off the "dot" character (either at the sender/client side or at the receiver/master side), but the URI is correctly formed before it is handed off to the curl library for delivery, and tcpdump shows that the "dot" character is trimmed off on the sender side.

(gdb) n
409       r = (*req)->send(nullptr);
(gdb) list 409
404
405       if (!send) {
406         return 0;
407       }
408
409       r = (*req)->send(nullptr);
410       if (r < 0) {
411         goto done_err;
412       }
413       return 0;

(gdb) p (*req)->url
$172 = "http://localhost:8101/u1b1/.?rgwx-zonegroup=73adcd35-2eb3-4189-aabb-e54e03e14376&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=4afef25"...

tcpdump shows that "." is dropped on delivery:

GET /u1b1/?rgwx-zonegroup=73adcd35-2eb3-4189-aabb-e54e03e14376&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=4afef25f-55b0-4e1e-af93-88bc18eae9f3%3Au1b1%3A44ada735-7336-4e43-b03d-2d0840e947de.4183.1 HTTP/1.1
Host: localhost:8101
Accept: */*
Authorization: AWS4-HMAC-SHA256 Credential=1234567890/20240208/zg1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=9111ecebfdb73a110b3d55446323dc326d96f0f62c3e4d8609662d239964e02e
Date: Thu, 08 Feb 2024 17:16:10 +0000
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240208T171610Z
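
To illustrate why the trimmed request cannot fetch the object: with path-style addressing, /u1b1/. names the key "." in bucket u1b1, whereas /u1b1/ carries no key at all and so would address the bucket itself. Below is a minimal sketch of that distinction. The helper split_path_style is hypothetical and written only for illustration; it is not RGW's actual request parser.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (an assumption, not RGW code): split a path-style
// request path "/<bucket>/<key>" into {bucket, key}.
// An empty key means the path addresses the bucket, not an object.
std::pair<std::string, std::string> split_path_style(const std::string& path) {
  // Drop the leading '/', if present.
  const std::string p =
      (!path.empty() && path[0] == '/') ? path.substr(1) : path;

  // Everything up to the first '/' is the bucket; the rest is the key.
  const auto slash = p.find('/');
  if (slash == std::string::npos) {
    return {p, std::string()};
  }
  return {p.substr(0, slash), p.substr(slash + 1)};
}
```

With this sketch, split_path_style("/u1b1/.") yields bucket "u1b1" and key ".", while split_path_style("/u1b1/"), the path seen in the tcpdump output, yields bucket "u1b1" and an empty key.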

Related issues 3 (2 open, 1 closed)

Copied to rgw - Backport #64550: quincy: rgw/multisite: objects named "." or ".." are not replicated (New)
Copied to rgw - Backport #64551: reef: rgw/multisite: objects named "." or ".." are not replicated (New)
Copied to rgw - Backport #64552: squid: rgw/multisite: objects named "." or ".." are not replicated (Resolved, Casey Bodley)
Actions #1

Updated by Casey Bodley 3 months ago

  • Backport set to quincy reef squid

Oguzhan Ozmen wrote:

https://github.com/curl/curl/issues/716 seems to be describing the ~same issue.

that doesn't quite look the same, since it's removing dots from http host header. here it's removing dots from the url's path

i found https://github.com/curl/curl/issues/3901 which looks more like our bug. bagder recommends adding --path-as-is there to disable path normalization, which corresponds to the libcurl option https://curl.se/libcurl/c/CURLOPT_PATH_AS_IS.html. adding that should fix the issue?

this path normalization would be fine for most rest APIs, but the S3 api does not normalize paths. there's a note saying "You do not normalize URI paths for requests to Amazon S3." in https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html#canonical-request, so it's important that curl sends the exact same path that we use to sign the request
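
For reference, the path normalization discussed here resembles the "remove_dot_segments" procedure from RFC 3986 section 5.2.4. The following is a minimal sketch of that procedure, written for illustration under the assumption that curl's behavior follows the RFC; it is not curl's actual code, and curl may differ in details. It shows why a trailing "." or ".." segment disappears while "..." and ".file" survive.

```cpp
#include <string>

// Sketch of RFC 3986 section 5.2.4 (remove_dot_segments).
// Illustration only; not taken from curl or Ceph.
std::string remove_dot_segments(std::string in) {
  std::string out;

  // True if the remaining input begins with the given prefix.
  auto starts_with = [&in](const std::string& p) {
    return in.compare(0, p.size(), p) == 0;
  };
  // Remove the last segment of the output, including its leading '/'.
  auto pop_last = [&out]() {
    const auto pos = out.rfind('/');
    out.erase(pos == std::string::npos ? 0 : pos);
  };

  while (!in.empty()) {
    if (starts_with("../")) {            // rule A
      in.erase(0, 3);
    } else if (starts_with("./")) {      // rule A
      in.erase(0, 2);
    } else if (starts_with("/./")) {     // rule B: "/./x" -> "/x"
      in.erase(0, 2);
    } else if (in == "/.") {             // rule B: "/." -> "/"
      in = "/";
    } else if (starts_with("/../")) {    // rule C: "/../x" -> "/x"
      in.erase(0, 3);
      pop_last();
    } else if (in == "/..") {            // rule C: "/.." -> "/"
      in = "/";
      pop_last();
    } else if (in == "." || in == "..") { // rule D
      in.clear();
    } else {                             // rule E: move one segment to output
      const auto next = in.find('/', 1);
      out.append(in, 0, next);
      in.erase(0, next);
    }
  }
  return out;
}
```

Under these rules, remove_dot_segments("/u1b1/.") returns "/u1b1/", which matches the request line seen in the tcpdump output, while "/..." and "/u1b1/.file" are returned unchanged because "..." and ".file" are ordinary segments rather than dot segments.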

Actions #2

Updated by Oguzhan Ozmen 3 months ago

Good catch!

Yes, with the `--path-as-is` option, the curl tool won't normalize the path:

## normalizes
$ curl http://127.0.0.1:8101/..
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

## doesn't trim the dots
$ curl http://127.0.0.1:8101/.. --path-as-is
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message></Message><BucketName>..</BucketName><RequestId>tx0000096f30fcd835b2047-0065cb8664-4183-zg1-1</RequestId><HostId>4183-zg1-1-zg1</HostId></Error>

I think the fix would be as easy as the change below. I just tested it, and the "." and ".." objects are replicated properly:

@@ -591,6 +591,8 @@ int RGWHTTPClient::init_request(rgw_http_req_data *_req_data)
   curl_easy_setopt(easy_handle, CURLOPT_READFUNCTION, send_http_data);
   curl_easy_setopt(easy_handle, CURLOPT_READDATA, (void *)req_data);
   curl_easy_setopt(easy_handle, CURLOPT_BUFFERSIZE, cct->_conf->rgw_curl_buffersize);
+  curl_easy_setopt(easy_handle, CURLOPT_PATH_AS_IS, 1L);
+

This is where we set up the curl handle for the request to be sent.

Actions #3

Updated by Oguzhan Ozmen 3 months ago

Added https://github.com/ceph/ceph/pull/55565 as WIP to further discuss potential solutions.
Perhaps a multisite test case could be added as well.

Actions #4

Updated by Casey Bodley 3 months ago

  • Status changed from New to In Progress
  • Assignee set to Oguzhan Ozmen
  • Pull request ID set to 55565
Actions #5

Updated by Oguzhan Ozmen 2 months ago

The existing integration test case test_multi.py:test_object_sync has been updated to reproduce the issue. Objects whose keys include the dot character were added to the test, including the keys "." and "..". Without the proposed fix, the objects "." and ".." are not replicated and the test fails (times out). After adding CURLOPT_PATH_AS_IS to the client HTTP request, these objects are replicated and the test passes.

Actions #6

Updated by Casey Bodley 2 months ago

  • Status changed from In Progress to Fix Under Review
Actions #7

Updated by Casey Bodley 2 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #8

Updated by Backport Bot 2 months ago

  • Copied to Backport #64550: quincy: rgw/multisite: objects named "." or ".." are not replicated added
Actions #9

Updated by Backport Bot 2 months ago

  • Copied to Backport #64551: reef: rgw/multisite: objects named "." or ".." are not replicated added
Actions #10

Updated by Backport Bot 2 months ago

  • Copied to Backport #64552: squid: rgw/multisite: objects named "." or ".." are not replicated added
Actions #11

Updated by Backport Bot 2 months ago

  • Tags changed from rgw multisite to rgw multisite backport_processed