Bug #63973
x-amz-expiration HTTP header: expiry-date sometimes broken
Description
We see a strange rgw issue: one of our custom applications (written in .NET, using the MinIO S3 client library) sometimes receives a broken expiry-date within the x-amz-expiration HTTP header. The bucket in question has a bucket lifecycle policy to expire objects after 7 days.
Here's an example HTTP request and reply:
HEAD https://s3.XXX.XXX.XXX/theBucket/b6755460-6e48-47a5-9695-98f2b12651e7/expiration.txt HTTP/1.1
Host: s3.XXX.XXX.XXX
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240108T171707Z
Authorization: AWS4-HMAC-SHA256 Credential=XXXXXXXXXXXXXXXXXXXX/20240108/europe/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=XXXXX
Accept-Encoding: gzip,deflate,br
traceparent: 00-XXXXXXXXXXXXXXXXXXXX-00

HTTP/1.1 200 OK
Content-Length: 10
Last-Modified: Mon, 08 Jan 2024 17:17:05 GMT
Content-Type: text/plain
Accept-Ranges: bytes
x-amz-expiration: expiry-date="Tue, 09 Jan 2023 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"
x-rgw-object-type: Normal
ETag: "XXX"
x-amz-request-id: XXX
Date: Mon, 08 Jan 2024 17:17:07 GMT
Connection: Keep-Alive

Here we have multiple issues:
- 09 Jan 2023 was not a Tuesday but a Monday
- An expiry date one year in the past doesn't make any sense in this case
Some further broken examples from other requests:

x-amz-expiration: expiry-date="Tue, 31 Jul 2023 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"
x-amz-expiration: expiry-date="Fri, 16 Jun 2023 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"
x-amz-expiration: expiry-date="Tue, 21 Jun 2024 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"
The last of these neither makes sense (an expiry date months in the future for a 7-day policy) nor is it a valid date: 21 Jun 2024 falls on a Friday, not a Tuesday. The issue came up because the MinIO S3 client library actually validates those dates.
The problem is not really reproducible for the same object. For example, I checked multiple objects using aws s3api head-object and the x-amz-expiration header looked just fine. But it always comes up once our application issues enough requests (hundreds or maybe even thousands), possibly in parallel. So right now I'd assume some multithreading issue: maybe the date function used is not thread-safe?
Tested on Ceph 16.2.9 and 16.2.14
Right now the workaround is to disable the bucket lifecycle policy so that the x-amz-expiration header is missing from the HTTP reply.
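For spot-checking, aws s3api head-object surfaces the x-amz-expiration header as the Expiration field of its JSON output, and since the bad values only appear under load, a loop helps. The endpoint, bucket, and key below are the placeholders from the report, so this is only a sketch:

```shell
# Print just the Expiration value (the x-amz-expiration header) for one
# object; endpoint/bucket/key are placeholders from the report above.
aws s3api head-object \
  --endpoint-url https://s3.XXX.XXX.XXX \
  --bucket theBucket \
  --key b6755460-6e48-47a5-9695-98f2b12651e7/expiration.txt \
  --query Expiration --output text

# The broken dates only show up after many requests, so repeat and look
# for outliers among the distinct values returned:
for i in $(seq 1 200); do
  aws s3api head-object \
    --endpoint-url https://s3.XXX.XXX.XXX \
    --bucket theBucket \
    --key b6755460-6e48-47a5-9695-98f2b12651e7/expiration.txt \
    --query Expiration --output text
done | sort | uniq -c
```

With a 7-day rule and a healthy gateway, every line should report the same near-future expiry date; any deviating line is an instance of this bug.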
Files
Updated by Casey Bodley 4 months ago
- Assignee set to Matt Benjamin
- Tags set to lifecycle
- Backport set to quincy reef
Updated by Matt Benjamin 4 months ago
Hi Markus,
Do you have the ability to produce debug output (--debug-rgw=10), or possibly run a test fix?
regards,
Matt
Updated by Markus Schuster 4 months ago
Hi Matt,
I forgot to mention: The Ceph clusters in question are cephadm-based (so fully containerized).
Looks like I can get debug output by setting the cluster config "debug_rgw", so that will be possible. Have to coordinate that with my colleagues.
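For reference, on a cephadm-managed cluster the rgw debug level can be raised cluster-wide through the config store; a sketch, assuming the default debug_rgw level of 1/5 (debug_rgw=10 is quite verbose, so it should be restored afterwards):

```shell
# Raise rgw debug logging for all rgw daemons, reproduce the issue,
# then restore the default level:
ceph config set client.rgw debug_rgw 10
# ... reproduce and collect logs from the rgw container ...
ceph config set client.rgw debug_rgw 1/5
```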
Running a test fix should be possible as we have a live/prod and a disaster recovery cluster - I could run the test fix in the latter. If you can provide a ready-made container image it'll be very easy; otherwise I'll find a way :)
I'm on a training course next week, so please don't expect a reply during that time.
Regards,
Markus
Updated by Markus Schuster 3 months ago
- File rgw-debug.log rgw-debug.log added
Hi Matt,
sorry it took so long, but this week I've finally been able to generate the needed debug log (attached).
I found it very hard to filter the rgw logs, as the system in question is part of a production environment, so there's quite some replication traffic and no clear request identifier I could just grep for. I simply took everything between "starting new request" and "req done", but it looks like there is replication traffic in between.
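For what it's worth, that marker-based filtering can be scripted; a rough sketch, demonstrated on an inline sample (the sample lines are illustrative, not real rgw output, and real logs interleave replication traffic between the two markers):

```shell
# Build a tiny sample log to demonstrate on (illustrative lines only):
printf '%s\n' \
  'unrelated line' \
  'starting new request req=0x55aa' \
  'req 1234: HEAD /theBucket/expiration.txt' \
  'req done req=0x55aa http_status=200' \
  'another unrelated line' > /tmp/sample-rgw.log

# Print everything from each "starting new request" line through the
# next "req done" line; a coarse filter, since unrelated traffic can
# still interleave inside that span.
awk '/starting new request/{p=1} p{print} /req done/{p=0}' /tmp/sample-rgw.log
```

On the sample above this keeps only the three lines between and including the two markers.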
And here's the HTTP request/response as seen by the application. No need to obfuscate anything as the debug log is pretty extensive…
- - - - - - - - - - BEGIN REQUEST - - - - - - - - - -
HEAD https://s3.ams3.srv.xxx/transcodeit-tus-temp/58f14c41-1425-4606-8ca7-c6b7f1423945/uploadlength.txt
Host: s3.ams3.srv.xxx
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240125T154926Z
Authorization: AWS4-HMAC-SHA256 Credential=AM9QC9KV786UCK6PV4DX/20240125/europe/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=b29a3ca43d40a227c6a742bd8af8af3313af49d53f1fee768a74558d8c0e8285
Accept-Encoding: gzip,deflate,br
traceparent: 00-1906ad3ebbbce61615708cad1712a13b-f4fda21120485a3d-00
- - - - - - - - - - END REQUEST - - - - - - - - - -

- - - - - - - - - - BEGIN RESPONSE - - - - - - - - - -
HTTP/1.1 200 OK
Content-Length: 9
Last-Modified: Thu, 25 Jan 2024 12:51:32 GMT
Content-Type: text/plain
Accept-Ranges: bytes
x-amz-expiration: expiry-date="Thu, 02 Feb 2024 00:49:26 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"
x-rgw-object-type: Normal
ETag: "e009368ea33a4e5e32d2d8ac7aabb8a3"
x-amz-request-id: tx0000074b032c664ec00d0-0065b28306-288cb-ams-3
Date: Thu, 25 Jan 2024 15:49:26 GMT
Connection: Keep-Alive
- - - - - - - - - - END RESPONSE - - - - - - - - - -

Request completed in 231.9546 ms
Updated by Matt Benjamin 3 months ago
Thanks, Markus.
I think your theory of the root cause is plausible; we/I will propose a fix.
Matt
Updated by Matt Benjamin 3 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 55644
Updated by Casey Bodley about 2 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from quincy reef to quincy reef squid
Updated by Backport Bot about 2 months ago
- Copied to Backport #64876: squid: x-amz-expiration HTTP header: expiry-date sometimes broken added
Updated by Backport Bot about 2 months ago
- Copied to Backport #64877: quincy: x-amz-expiration HTTP header: expiry-date sometimes broken added
Updated by Backport Bot about 2 months ago
- Copied to Backport #64878: reef: x-amz-expiration HTTP header: expiry-date sometimes broken added
Updated by Backport Bot about 2 months ago
- Tags changed from lifecycle to lifecycle backport_processed