Bug #64340

invalid olh attributes on the target object after copy_object in a versioning suspended bucket

Added by Jane Zhu 3 months ago. Updated 2 months ago.

Status:
Pending Backport
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
versioning copy backport_processed
Backport:
quincy reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During copy_object in a versioning-suspended bucket, the OLH attributes (user.rgw.olh.idtag, user.rgw.olh.info, user.rgw.olh.ver) of the source object are copied to the target object, which causes problems on subsequent GET requests for the target object.

An example of the problem in our case:
We deleted the source object after the copy, then sent a GET request for the target object. The GET request crashed the RGW in our Ceph version. Ceph main behaves slightly differently: the GET request doesn't crash the RGW but fails silently instead, returning no error code and no content. It looks like follow_olh follows the wrong/invalid OLH.

The steps to reproduce the issue in vstart:

$ aws --endpoint-url http://localhost:8000 s3 ls
$ aws --endpoint-url http://localhost:8000 s3 mb s3://bucket1

$ aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bucket1 --versioning-configuration Status=Enabled
$ aws --endpoint-url http://localhost:8000 s3api get-bucket-versioning --bucket bucket1

$ head -c 4K < /dev/urandom > file_4k

$ aws --endpoint-url http://localhost:8000 s3 cp file_4k s3://bucket1/file_4k
$ aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bucket1 --versioning-configuration Status=Suspended
$ aws --endpoint-url http://localhost:8000 s3 cp file_4k s3://bucket1/file_4k

$ python3.8 copy_from_other.py  <-- a script to copy the object from s3://bucket1/file_4k to s3://bucket1/broom/file_4k (a minimal sketch of this script follows the steps below)

$ aws --endpoint-url http://localhost:8000 s3 rm s3://bucket1/file_4k

$ aws --endpoint-url http://localhost:8000 s3 cp s3://bucket1/broom/file_4k bbb

$ ls -lrt bbb
-rw-r--r--. 1 root root 0 Feb  7 05:15 bbb
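
Note: copy_from_other.py is not attached to this ticket. Below is a minimal boto3 sketch of what such a server-side copy script might look like; the endpoint matches the vstart steps above, and the credential values are placeholders, not the actual keys used.

#!/usr/bin/env python3
# Hypothetical reconstruction of copy_from_other.py (not the original script):
# server-side copy of s3://bucket1/file_4k to s3://bucket1/broom/file_4k.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="ACCESS_KEY",       # placeholder: use your vstart S3 user's key
    aws_secret_access_key="SECRET_KEY",   # placeholder
)

s3.copy_object(
    Bucket="bucket1",
    Key="broom/file_4k",
    CopySource={"Bucket": "bucket1", "Key": "file_4k"},
)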

Proposed fix:
In my opinion, copy_object shouldn't copy the OLH attributes in a versioning-suspended bucket. I'll put up a PR shortly.


Related issues 2 (1 open, 1 closed)

Copied to rgw - Backport #64447: quincy: invalid olh attributes on the target object after copy_object in a versioning suspended bucket (In Progress, Jane Zhu)
Copied to rgw - Backport #64448: reef: invalid olh attributes on the target object after copy_object in a versioning suspended bucket (Resolved, Jane Zhu)
#1

Updated by Jane Zhu 3 months ago

  • Status changed from New to Fix Under Review
  • Assignee set to Jane Zhu
  • Pull request ID set to 55486
#2

Updated by Satoru Takeuchi 3 months ago

The GET request crashed the RGW in our Ceph version.

In my understanding, this silent error only happens in main, but the stable releases have a consistent crash problem when reading problematic objects (in your example, file_4k). If so, it's better to backport your PR to reef and quincy. Is my understanding correct?

#3

Updated by Jane Zhu 3 months ago

Satoru Takeuchi wrote:

The GET request crashed the RGW in our Ceph version.

In my understanding, this silent error only happens in main, but the stable releases have a consistent crash problem when reading problematic objects (in your example, file_4k). If so, it's better to backport your PR to reef and quincy. Is my understanding correct?

The Ceph version we are running, where the crash happens, is not one of the stable releases; it's a pre-reef main branch. So I'm not sure how it behaves in a stable quincy/reef release, and I haven't had time to track down when the crash was introduced and fixed.
However, no matter which behavior it shows, I think we should backport the fix to quincy and reef.

#4

Updated by Satoru Takeuchi 3 months ago

Thank you for your reply. I verified that the consistent GET error happened with my v17.2.5 cluster.

#5

Updated by Casey Bodley 3 months ago

  • Tags set to versioning copy
  • Backport set to quincy reef
#6

Updated by Casey Bodley 2 months ago

  • Status changed from Fix Under Review to Pending Backport
#7

Updated by Backport Bot 2 months ago

  • Copied to Backport #64447: quincy: invalid olh attributes on the target object after copy_object in a versioning suspended bucket added
#8

Updated by Backport Bot 2 months ago

  • Copied to Backport #64448: reef: invalid olh attributes on the target object after copy_object in a versioning suspended bucket added
#9

Updated by Backport Bot 2 months ago

  • Tags changed from versioning copy to versioning copy backport_processed
#10

Updated by Robin Johnson 2 months ago

Jane:
1.

$ python3.8 copy_from_other.py <-- a script to copy the object from s3://bucket1/file_4k to s3://bucket1/broom/file_4k

How did this script differ from the awscli command?
aws --endpoint-url http://localhost:8000 s3api copy-versioning --bucket bucket1 --key broom/file_4k --copy-source bucket1/file_4k

2. Did any s3-tests get created for this?

3. Does "radosgw-admin bucket check" detect this error state on the object?

#11

Updated by Jane Zhu 2 months ago

Robin Johnson wrote:

Jane:
1.
How did this script differ from the awscli command?
aws --endpoint-url http://localhost:8000 s3api copy-versioning --bucket bucket1 --key broom/file_4k --copy-source bucket1/file_4k

Do you mean "copy-object"? There is no difference from my copy script in this context. We use the boto3 script just to exactly mimic our client behavior. Either way can reproduce the issue.

2. Did any s3-tests get created for this?

s3-tests PR: https://github.com/ceph/s3-tests/pull/546 (a rough sketch of the scenario covered is included at the end of this comment)

3. Does "radosgw-admin bucket check" detect this error state on the object?

No
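
For readers following along: below is a rough boto3 sketch of the scenario such a test needs to cover. It is not taken from the linked s3-tests PR; the endpoint and credentials are placeholders, and the bucket/key names simply mirror the reproduction steps in the description.

#!/usr/bin/env python3
# Sketch of the reproduction scenario (not the actual s3-tests code):
# copy an object within a versioning-suspended bucket, delete the source,
# then verify the copy still returns its full content.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",
    aws_access_key_id="ACCESS_KEY",       # placeholder
    aws_secret_access_key="SECRET_KEY",   # placeholder
)

bucket = "bucket1"
s3.create_bucket(Bucket=bucket)
s3.put_bucket_versioning(Bucket=bucket,
                         VersioningConfiguration={"Status": "Enabled"})
s3.put_object(Bucket=bucket, Key="file_4k", Body=b"x" * 4096)

s3.put_bucket_versioning(Bucket=bucket,
                         VersioningConfiguration={"Status": "Suspended"})
s3.put_object(Bucket=bucket, Key="file_4k", Body=b"x" * 4096)

# Server-side copy inside the suspended bucket, then remove the source.
s3.copy_object(Bucket=bucket, Key="broom/file_4k",
               CopySource={"Bucket": bucket, "Key": "file_4k"})
s3.delete_object(Bucket=bucket, Key="file_4k")

# With the bug, this GET fails silently (or crashes older RGWs);
# with the fix, the full 4 KiB body should come back.
body = s3.get_object(Bucket=bucket, Key="broom/file_4k")["Body"].read()
assert len(body) == 4096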
