Bug #64340

invalid olh attributes on the target object after copy_object in a versioning suspended bucket

Added by Jane Zhu 3 months ago. Updated 2 months ago.

Status:
Pending Backport
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
versioning copy backport_processed
Backport:
quincy reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During copy_object in a versioning-suspended bucket, the OLH attributes (user.rgw.olh.idtag, user.rgw.olh.info, user.rgw.olh.ver) of the source object are copied to the target object, which causes problems on subsequent GET requests for the target object.

An example of the problem in our case:
We deleted the source object after the copy, then sent a GET request for the target object. The GET request crashed the RGW in our Ceph version. Ceph main behaves slightly differently: the GET request doesn't crash the RGW but fails silently instead, returning no error code and no content. It looks like follow_olh follows the wrong/invalid OLH.

The steps to reproduce the issue in vstart:

$ aws --endpoint-url http://localhost:8000 s3 ls
$ aws --endpoint-url http://localhost:8000 s3 mb s3://bucket1

$ aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bucket1 --versioning-configuration Status=Enabled
$ aws --endpoint-url http://localhost:8000 s3api get-bucket-versioning --bucket bucket1

$ head -c 4K < /dev/urandom > file_4k

$ aws --endpoint-url http://localhost:8000 s3 cp file_4k s3://bucket1/file_4k
$ aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bucket1 --versioning-configuration Status=Suspended
$ aws --endpoint-url http://localhost:8000 s3 cp file_4k s3://bucket1/file_4k

$ python3.8 copy_from_other.py  <-- a script to copy the object from s3://bucket1/file_4k to s3://bucket1/broom/file_4k (a minimal sketch of this script follows the steps below)

$ aws --endpoint-url http://localhost:8000 s3 rm s3://bucket1/file_4k

$ aws --endpoint-url http://localhost:8000 s3 cp s3://bucket1/broom/file_4k bbb

$ ls -lrt bbb
-rw-r--r--. 1 root root 0 Feb  7 05:15 bbb
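
Note: copy_from_other.py is not attached to this ticket. Below is a minimal boto3 sketch of what such a server-side copy script might look like; the endpoint matches the vstart steps above, and the credential values are placeholders, not the actual keys used.

#!/usr/bin/env python3
# Hypothetical reconstruction of copy_from_other.py (not the original script):
# server-side copy of s3://bucket1/file_4k to s3://bucket1/broom/file_4k.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="ACCESS_KEY",       # placeholder: use your vstart S3 user's key
    aws_secret_access_key="SECRET_KEY",   # placeholder
)

s3.copy_object(
    Bucket="bucket1",
    Key="broom/file_4k",
    CopySource={"Bucket": "bucket1", "Key": "file_4k"},
)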

Proposed fix:
In my opinion, copy_object shouldn't copy the OLH attributes in a versioning-suspended bucket. I'll put up a PR shortly.


Related issues 2 (1 open, 1 closed)

Copied to rgw - Backport #64447: quincy: invalid olh attributes on the target object after copy_object in a versioning suspended bucket (In Progress, Jane Zhu)
Copied to rgw - Backport #64448: reef: invalid olh attributes on the target object after copy_object in a versioning suspended bucket (Resolved, Jane Zhu)
#1

Updated by Jane Zhu 3 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Jane Zhu
  • Pull request ID set to 55486
#2

Updated by Satoru Takeuchi 3 months ago

The GET request crashed the RGW in our Ceph version.

In my understanding, this silent error only happens in main, but the stable releases have a consistent crash problem when reading problematic objects (in your example, file_4k). If so, it's better to backport your PR to reef and quincy. Is my understanding correct?

#3

Updated by Jane Zhu 3 months ago

Satoru Takeuchi wrote:

The GET request crashed the RGW in our Ceph version.

In my understanding, this silent error only happens in main, but the stable releases have a consistent crash problem when reading problematic objects (in your example, file_4k). If so, it's better to backport your PR to reef and quincy. Is my understanding correct?

The Ceph version we are running, where the crash happens, is not one of the stable releases; it's a pre-reef main branch. So I'm not sure how it behaves in a stable quincy/reef release, and I haven't had time to track down when the crash was introduced and fixed.
However, no matter which behavior it shows, I think we should backport the fix to quincy and reef.

#4

Updated by Satoru Takeuchi 3 months ago

Thank you for your reply. I verified that the consistent GET error happened with my v17.2.5 cluster.

#5

Updated by Casey Bodley 3 months ago

  • Tags set to versioning copy
  • Backport set to quincy reef
#6

Updated by Casey Bodley 2 months ago

  • Status changed from Fix Under Review to Pending Backport
#7

Updated by Backport Bot 2 months ago

  • Copied to Backport #64447: quincy: invalid olh attributes on the target object after copy_object in a versioning suspended bucket added
#8

Updated by Backport Bot 2 months ago

  • Copied to Backport #64448: reef: invalid olh attributes on the target object after copy_object in a versioning suspended bucket added
#9

Updated by Backport Bot 2 months ago

  • Tags changed from versioning copy to versioning copy backport_processed
#10

Updated by Robin Johnson 2 months ago

Jane:
1.

$ python3.8 copy_from_other.py <-- a script to copy the object from s3://bucket1/file_4k to s3://bucket1/broom/file_4k

How did this script differ from the awscli command?
aws --endpoint-url http://localhost:8000 s3api copy-versioning --bucket bucket1 --key broom/file_4k --copy-source bucket1/file_4k

2. Did any s3-tests get created for this?

3. Does "radosgw-admin bucket check" detect this error state on the object?

#11

Updated by Jane Zhu 2 months ago

Robin Johnson wrote:

Jane:
1.
How did this script differ from the awscli command?
aws --endpoint-url http://localhost:8000 s3api copy-versioning --bucket bucket1 --key broom/file_4k --copy-source bucket1/file_4k

Do you mean "copy-object"? There is no difference from my copy script in this context. We use the boto3 script just to exactly mimic our client behavior. Either way can reproduce the issue.

2. Did any s3-tests get created for this?

s3-tests PR: https://github.com/ceph/s3-tests/pull/546 (a rough sketch of the scenario covered is included at the end of this comment)

3. Does "radosgw-admin bucket check" detect this error state on the object?

No
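
For readers following along: below is a rough boto3 sketch of the scenario such a test needs to cover. It is not taken from the linked s3-tests PR; the endpoint and credentials are placeholders, and the bucket/key names simply mirror the reproduction steps in the description.

#!/usr/bin/env python3
# Sketch of the reproduction scenario (not the actual s3-tests code):
# copy an object within a versioning-suspended bucket, delete the source,
# then verify the copy still returns its full content.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="ACCESS_KEY",       # placeholder
    aws_secret_access_key="SECRET_KEY",   # placeholder
)

bucket = "bucket1"
s3.create_bucket(Bucket=bucket)
s3.put_bucket_versioning(Bucket=bucket,
                         VersioningConfiguration={"Status": "Enabled"})
s3.put_object(Bucket=bucket, Key="file_4k", Body=b"x" * 4096)

s3.put_bucket_versioning(Bucket=bucket,
                         VersioningConfiguration={"Status": "Suspended"})
s3.put_object(Bucket=bucket, Key="file_4k", Body=b"x" * 4096)

# Server-side copy inside the suspended bucket, then remove the source.
s3.copy_object(Bucket=bucket, Key="broom/file_4k",
               CopySource={"Bucket": bucket, "Key": "file_4k"})
s3.delete_object(Bucket=bucket, Key="file_4k")

# With the bug, this GET fails silently (or crashes older RGWs);
# with the fix, the full 4 KiB body should come back.
body = s3.get_object(Bucket=bucket, Key="broom/file_4k")["Body"].read()
assert len(body) == 4096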
