Bug #64366
rgw/multisite: objects named "." or ".." are not replicated
Description
Perhaps a known issue, but I couldn't find it in the tracker. As the title suggests, if a user uploads an object with the key "." (i.e., a single dot/period character) or "..", it won't get replicated to the secondary zone. Other cases, like a key with more than 2 dots or any other key that starts with a dot (e.g., ".file"), are fine.
It's easily reproducible:
$ aws s3api put-object --key=. --bucket=<bucket> --body <file>
$ aws s3api put-object --key=.. --bucket=<bucket> --body <file>
These commands create the objects properly on the primary site, but the objects won't get replicated to the secondary site. Users can download the objects from the primary site using the aws cli with no issues.
Secondary site would emit events like
...RGW-SYNC:data:sync:shard[..] ... entry[.]: ERROR: failed to sync object: <bucket_instance>:<datalog_shard>/.
and sync status would show the impacted shard as recovering
$ sudo radosgw-admin sync status
...
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: ...
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        1 shards are recovering
                        recovering shards:
                        [90]
After some further investigation, I believe this is an issue with the curl library that ceph uses, so it is not directly a ceph issue. It can be reproduced using the curl tool as well:
# curl strips off a single trailing dot in the uri
$ curl http://<rgw_ip>/.
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

# curl strips off trailing double dots in the uri
$ curl http://127.0.0.1:8101/..
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

# 3 dots are not removed, though
$ curl http://<rgw_ip>/...
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message></Message><BucketName>...</BucketName><RequestId>tx000000e4e811a4ec2317c-0065c53177-4183-zg1-1</RequestId><HostId>4183-zg1-1-zg1</HostId></Error>
https://github.com/curl/curl/issues/716 seems to be describing the ~same issue.
I initially suspected that ceph was trimming off the "dot" character (either at the sender/client side or at the receiver/master side), but the uri is correctly formed before it is handed off to the curl library for delivery, and tcpdump shows that the "dot" char is trimmed off on the sender side.
(gdb) n
409	  r = (*req)->send(nullptr);
(gdb) list 409
404
405	  if (!send) {
406	    return 0;
407	  }
408
409	  r = (*req)->send(nullptr);
410	  if (r < 0) {
411	    goto done_err;
412	  }
413	  return 0;
(gdb) p (*req)->url
$172 = "http://localhost:8101/u1b1/.?rgwx-zonegroup=73adcd35-2eb3-4189-aabb-e54e03e14376&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=4afef25"...
tcpdump shows that "." is dropped at the delivery:
GET /u1b1/?rgwx-zonegroup=73adcd35-2eb3-4189-aabb-e54e03e14376&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=4afef25f-55b0-4e1e-af93-88bc18eae9f3%3Au1b1%3A44ada735-7336-4e43-b03d-2d0840e947de.4183.1 HTTP/1.1
Host: localhost:8101
Accept: */*
Authorization: AWS4-HMAC-SHA256 Credential=1234567890/20240208/zg1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=9111ecebfdb73a110b3d55446323dc326d96f0f62c3e4d8609662d239964e02e
Date: Thu, 08 Feb 2024 17:16:10 +0000
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240208T171610Z
Updated by Casey Bodley 3 months ago
- Backport set to quincy reef squid
Oguzhan Ozmen wrote:
https://github.com/curl/curl/issues/716 seems to be describing the ~same issue.
that doesn't quite look the same, since it's removing dots from the http host header. here it's removing dots from the url's path

i found https://github.com/curl/curl/issues/3901 which looks more like our bug. bagder recommends adding `--path-as-is` to disable path normalization there, which would correspond to the libcurl option https://curl.se/libcurl/c/CURLOPT_PATH_AS_IS.html. adding that should fix the issue?

this path normalization would be fine for most rest APIs, but the S3 api does not normalize paths. there's a note about this ("You do not normalize URI paths for requests to Amazon S3.") in https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html#canonical-request, so it's important that curl sends the exact same path that we use to sign the request
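To illustrate why "." and ".." disappear while "..." and ".file" survive, here is a minimal sketch of the dot-segment removal procedure from RFC 3986 section 5.2.4. It assumes curl's path normalization follows that procedure. The function name `remove_dot_segments` is taken from the RFC and is not curl's or ceph's actual code.

```cpp
#include <string>

// Sketch of RFC 3986 section 5.2.4 (remove_dot_segments).
// Illustrative only: an assumption about what the normalization does,
// not the implementation used by curl or ceph.
std::string remove_dot_segments(std::string in) {
  std::string out;
  auto starts_with = [&](const char* prefix) {
    return in.rfind(prefix, 0) == 0;
  };
  while (!in.empty()) {
    if (starts_with("../")) {
      in.erase(0, 3);                       // A: drop leading "../"
    } else if (starts_with("./")) {
      in.erase(0, 2);                       // A: drop leading "./"
    } else if (starts_with("/./")) {
      in.erase(0, 2);                       // B: "/./x" becomes "/x"
    } else if (in == "/.") {
      in = "/";                             // B: trailing "/." becomes "/"
    } else if (starts_with("/../") || in == "/..") {
      in = (in == "/..") ? "/" : in.substr(3);  // C: "/../x" becomes "/x"
      auto pos = out.rfind('/');            // ...and the last output
      out.erase(pos == std::string::npos ? 0 : pos);  // segment is removed
    } else if (in == "." || in == "..") {
      in.clear();                           // D: lone "." or ".."
    } else {
      auto next = in.find('/', 1);          // E: move one whole segment
      out += in.substr(0, next);            // (with its leading '/') to
      in.erase(0, next);                    // the output buffer
    }
  }
  return out;
}
```

Under this procedure, `/u1b1/.` collapses to `/u1b1/`, which matches the request line seen in the tcpdump capture, while `/...` and `/u1b1/.file` are left unchanged because "..." and ".file" are ordinary segments rather than dot segments.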
Updated by Oguzhan Ozmen 3 months ago
Good catch!
Yes, with `--path-as-is` option, curl tool won't normalize the path:
## normalizes
$ curl http://127.0.0.1:8101/..
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

## doesn't trim the dots
$ curl http://127.0.0.1:8101/.. --path-as-is
<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message></Message><BucketName>..</BucketName><RequestId>tx0000096f30fcd835b2047-0065cb8664-4183-zg1-1</RequestId><HostId>4183-zg1-1-zg1</HostId></Error>
I think the fix would be as easy as the change below. I just tested it, and the "." and ".." objects are replicated properly:
@@ -591,6 +591,8 @@ int RGWHTTPClient::init_request(rgw_http_req_data *_req_data)
curl_easy_setopt(easy_handle, CURLOPT_READFUNCTION, send_http_data);
curl_easy_setopt(easy_handle, CURLOPT_READDATA, (void *)req_data);
curl_easy_setopt(easy_handle, CURLOPT_BUFFERSIZE, cct->_conf->rgw_curl_buffersize);
+ curl_easy_setopt(easy_handle, CURLOPT_PATH_AS_IS, 1L);
+
This is where we craft the curl object to be sent.
Updated by Oguzhan Ozmen 3 months ago
Added https://github.com/ceph/ceph/pull/55565 as WIP to further discuss potential solutions.
Perhaps a multisite test case should be added as well.
Updated by Casey Bodley 3 months ago
- Status changed from New to In Progress
- Assignee set to Oguzhan Ozmen
- Pull request ID set to 55565
Updated by Oguzhan Ozmen 2 months ago
The existing integration test case test_multi.py:test_object_sync has been updated to reproduce the issue. Objects whose keys include the dot character are added to the test, including the keys "." and "..". Without the proposed fix, objects "." and ".." are not replicated and the test fails (times out). After adding CURLOPT_PATH_AS_IS to the client http request, these objects are replicated and the test passes.
Updated by Casey Bodley 2 months ago
- Status changed from In Progress to Fix Under Review
Updated by Casey Bodley 2 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 2 months ago
- Copied to Backport #64550: quincy: rgw/multisite: objects named "." or ".." are not replicated added
Updated by Backport Bot 2 months ago
- Copied to Backport #64551: reef: rgw/multisite: objects named "." or ".." are not replicated added
Updated by Backport Bot 2 months ago
- Copied to Backport #64552: squid: rgw/multisite: objects named "." or ".." are not replicated added
Updated by Backport Bot 2 months ago
- Tags changed from rgw multisite to rgw multisite backport_processed