Bug #63973

open

x-amz-expiration HTTP header: expiry-date sometimes broken

Added by Markus Schuster 4 months ago. Updated about 2 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: lifecycle backport_processed
Backport: quincy reef squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We see a strange rgw issue: one of our custom applications (written in .NET, using the MinIO S3 client library) sometimes receives a broken expiry-date within the x-amz-expiration HTTP header. The bucket in question has a bucket lifecycle policy that expires objects after 7 days.

Here's an example HTTP request and reply:

HEAD https://s3.XXX.XXX.XXX/theBucket/b6755460-6e48-47a5-9695-98f2b12651e7/expiration.txt HTTP/1.1
Host: s3.XXX.XXX.XXX
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240108T171707Z
Authorization: AWS4-HMAC-SHA256 Credential=XXXXXXXXXXXXXXXXXXXX/20240108/europe/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=XXXXX
Accept-Encoding: gzip,deflate,br
traceparent: 00-XXXXXXXXXXXXXXXXXXXX-00

HTTP/1.1 200 OK
Content-Length: 10
Last-Modified: Mon, 08 Jan 2024 17:17:05 GMT
Content-Type: text/plain
Accept-Ranges: bytes
x-amz-expiration: expiry-date="Tue, 09 Jan 2023 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads" 
x-rgw-object-type: Normal
ETag: "XXX" 
x-amz-request-id: XXX
Date: Mon, 08 Jan 2024 17:17:07 GMT
Connection: Keep-Alive

Here we have multiple issues:
  • 09 Jan 2023 was not a Tuesday but a Monday
  • An expiry date one year in the past doesn't make any sense in this case
The expiry date can basically be anything, some examples from the same bucket:
  • x-amz-expiration: expiry-date="Tue, 31 Jul 2023 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"
  • x-amz-expiration: expiry-date="Fri, 16 Jun 2023 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"
  • x-amz-expiration: expiry-date="Tue, 21 Jun 2024 00:00:00 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads"

None of these dates make sense for a 7-day expiry rule, and the weekday often doesn't match the date (21 Jun 2024, for instance, was a Friday, not a Tuesday). The issue came up because the MinIO S3 client library validates those dates.
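
For anyone who wants to double-check the weekday mismatch, a few lines of C++ against the standard C library are enough (purely an illustration, nothing to do with rgw internals):

#include <ctime>
#include <iostream>

int main() {
  std::tm tm{};               // all fields zero-initialized
  tm.tm_year = 2023 - 1900;   // tm_year counts years since 1900
  tm.tm_mon  = 0;             // January (months are 0-based)
  tm.tm_mday = 9;
  tm.tm_isdst = -1;
  std::mktime(&tm);           // normalizes the struct and fills in tm_wday
  static const char* days[] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
  std::cout << days[tm.tm_wday] << '\n';   // prints "Mon": 09 Jan 2023 was a Monday
}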

The problem is not really reproducible for the same object: I checked multiple objects using aws s3api head-object and the x-amz-expiration header looked fine every time. But the broken dates always show up once our application issues enough requests (hundreds or maybe even thousands), possibly in parallel. So right now I'd assume some kind of multithreading issue - maybe the date function used is not thread-safe?
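
To illustrate what I mean (just a sketch under that assumption - I don't know which function rgw actually uses to format the date): std::gmtime returns a pointer to a single shared static struct tm, so two threads converting different timestamps at the same time can each read a mix of both results, which would explain arbitrary dates with weekdays that don't match. The reentrant gmtime_r, which writes into a caller-supplied struct, avoids this:

#include <ctime>
#include <thread>
#include <vector>

int main() {
  auto worker = [](std::time_t base) {
    char buf[64];
    for (int i = 0; i < 100000; ++i) {
      std::time_t t = base + i;

      // Not thread-safe: all callers share gmtime's one static buffer,
      // so *tm can be overwritten by another thread while we format it.
      std::tm* tm = std::gmtime(&t);
      std::strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S GMT", tm);
      // Under contention, buf can end up mixing fields from two timestamps.

      // Thread-safe variant: each caller provides its own struct tm.
      std::tm local{};
      gmtime_r(&t, &local);
      std::strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S GMT", &local);
    }
  };
  std::vector<std::thread> threads;
  threads.emplace_back(worker, std::time_t{0});           // dates around 1970
  threads.emplace_back(worker, std::time_t{1704067200});  // dates around Jan 2024
  for (auto& th : threads) th.join();
}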

Tested on Ceph 16.2.9 and 16.2.14

Right now the workaround is to disable the bucket lifecycle policy, so that the x-amz-expiration header is missing from the HTTP reply.


Files

rgw-debug.log (24.9 KB) - radosgw debug log (level 10) - Markus Schuster, 01/26/2024 04:35 PM

Related issues 3 (2 open, 1 closed)

Copied to rgw - Backport #64876: squid: x-amz-expiration HTTP header: expiry-date sometimes broken (Resolved, Casey Bodley)
Copied to rgw - Backport #64877: quincy: x-amz-expiration HTTP header: expiry-date sometimes broken (New, Matt Benjamin)
Copied to rgw - Backport #64878: reef: x-amz-expiration HTTP header: expiry-date sometimes broken (New, Matt Benjamin)
Actions #1

Updated by Casey Bodley 4 months ago

  • Assignee set to Matt Benjamin
  • Tags set to lifecycle
  • Backport set to quincy reef
Actions #2

Updated by Matt Benjamin 4 months ago

Hi Markus,

Do you have the ability to produce debug output (--debug-rgw=10), or possibly run a test fix?

regards,

Matt

Actions #3

Updated by Markus Schuster 4 months ago

Hi Matt,

I forgot to mention: The Ceph clusters in question are cephadm-based (so fully containerized).

Looks like I can get debug output by setting the cluster config option "debug_rgw", so that will be possible. I'll have to coordinate that with my colleagues.

Running a test fix should be possible as we have a live/prod and a disaster recovery cluster - I could run the test fix in the latter. If you can provide a ready-made container image it'll be very easy, otherwise I'll find a way :)

I'm on a training course next week, so please don't expect a reply during that time.

Regards,
Markus

Actions #4

Updated by Markus Schuster 3 months ago

Hi Matt,

sorry it took so long, but this week I was finally able to generate the needed debug log (attached).
Filtering the rgw logs turned out to be hard: the system in question is part of a production environment, so there is quite a bit of replication traffic and no clear request identifier I could simply grep for. I took everything between "starting new request" and "req done" - but it looks like there is replication traffic in between.

And here's the HTTP request/response as seen by the application. No need to obfuscate anything as the debug log is pretty extensive…

- - - - - - - - - - BEGIN REQUEST - - - - - - - - - -
HEAD https://s3.ams3.srv.xxx/transcodeit-tus-temp/58f14c41-1425-4606-8ca7-c6b7f1423945/uploadlength.txt
Host: s3.ams3.srv.xxx
x-amz-content-sha256: UNSIGNED-PAYLOAD
x-amz-date: 20240125T154926Z
Authorization: AWS4-HMAC-SHA256 Credential=AM9QC9KV786UCK6PV4DX/20240125/europe/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=b29a3ca43d40a227c6a742bd8af8af3313af49d53f1fee768a74558d8c0e8285
Accept-Encoding: gzip,deflate,br
traceparent: 00-1906ad3ebbbce61615708cad1712a13b-f4fda21120485a3d-00
- - - - - - - - - - END REQUEST - - - - - - - - - -

- - - - - - - - - - BEGIN RESPONSE - - - - - - - - - -
HTTP/1.1 200 OK
Content-Length: 9
Last-Modified: Thu, 25 Jan 2024 12:51:32 GMT
Content-Type: text/plain
Accept-Ranges: bytes
x-amz-expiration: expiry-date="Thu, 02 Feb 2024 00:49:26 GMT", rule-id="Delete TUS upload fragments after 7 days and clean-up incomplete uploads" 
x-rgw-object-type: Normal
ETag: "e009368ea33a4e5e32d2d8ac7aabb8a3" 
x-amz-request-id: tx0000074b032c664ec00d0-0065b28306-288cb-ams-3
Date: Thu, 25 Jan 2024 15:49:26 GMT
Connection: Keep-Alive
- - - - - - - - - - END RESPONSE - - - - - - - - - -
Request completed in 231.9546 ms
Actions #5

Updated by Matt Benjamin 3 months ago

Thanks, Markus.

I think your theory of the root cause is plausible; we/I will propose a fix.

Matt

Actions #6

Updated by Matt Benjamin 3 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 55644
Actions #7

Updated by Casey Bodley about 2 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from quincy reef to quincy reef squid
Actions #8

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64876: squid: x-amz-expiration HTTP header: expiry-date sometimes broken added
Actions #9

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64877: quincy: x-amz-expiration HTTP header: expiry-date sometimes broken added
Actions #10

Updated by Backport Bot about 2 months ago

  • Copied to Backport #64878: reef: x-amz-expiration HTTP header: expiry-date sometimes broken added
Actions #11

Updated by Backport Bot about 2 months ago

  • Tags changed from lifecycle to lifecycle backport_processed