Bug #64340
invalid olh attributes on the target object after copy_object in a versioning suspended bucket
Status: Open
Description
During copy_object in a versioning suspended bucket, the olh attributes (user.rgw.olh.idtag, user.rgw.olh.info, user.rgw.olh.ver) of the source object are copied to the target object, which causes problems on subsequent GET requests for the target object.
An example problem in our case:
We deleted the source object after the copy, then sent a GET request for the target object. The GET request crashed the rgw in our ceph version. Ceph main behaves slightly differently: the GET request doesn't crash the rgw; instead, it fails silently. No error code is returned, but no content is returned either. It looks like follow_olh follows the wrong/invalid olh.
The steps to reproduce the issue in vstart:
$ aws --endpoint-url http://localhost:8000 s3 ls
$ aws --endpoint-url http://localhost:8000 s3 mb s3://bucket1
$ aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bucket1 --versioning-configuration Status=Enabled
$ aws --endpoint-url http://localhost:8000 s3api get-bucket-versioning --bucket bucket1
$ head -c 4K < /dev/urandom > file_4k
$ aws --endpoint-url http://localhost:8000 s3 cp file_4k s3://bucket1/file_4k
$ aws --endpoint-url http://localhost:8000 s3api put-bucket-versioning --bucket bucket1 --versioning-configuration Status=Suspended
$ aws --endpoint-url http://localhost:8000 s3 cp file_4k s3://bucket1/file_4k
$ python3.8 copy_from_other.py  <-- a script to copy the object from s3://bucket1/file_4k to s3://bucket1/broom/file_4k
$ aws --endpoint-url http://localhost:8000 s3 rm s3://bucket1/file_4k
$ aws --endpoint-url http://localhost:8000 s3 cp s3://bucket1/broom/file_4k bbb
$ ls -lrt bbb
-rw-r--r--. 1 root root 0 Feb 7 05:15 bbb
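The copy_from_other.py script itself is not shown in the report. A minimal boto3 sketch of what such a script would likely look like is below; the endpoint and key names follow the repro steps, and the structure of the script is an assumption:

```python
# Hypothetical reconstruction of copy_from_other.py (the original script is not
# shown in the report): a server-side CopyObject from s3://bucket1/file_4k
# to s3://bucket1/broom/file_4k via boto3.

ENDPOINT = "http://localhost:8000"  # vstart radosgw, as in the repro steps


def make_copy_source(bucket: str, key: str) -> dict:
    """Build the CopySource argument that boto3's copy_object expects."""
    return {"Bucket": bucket, "Key": key}


def copy_from_other() -> None:
    # boto3 is imported lazily so the pure helper above works without it.
    import boto3

    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    s3.copy_object(
        Bucket="bucket1",
        Key="broom/file_4k",
        CopySource=make_copy_source("bucket1", "file_4k"),
    )

# copy_from_other() must be invoked manually against a running vstart cluster.
```

Either this script or the equivalent `aws s3api copy-object` call should trigger the same server-side copy path.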
Proposed fix:
In my opinion, copy_object shouldn't copy the olh attributes in a versioning suspended bucket. I'll put up a PR shortly.
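The actual fix would land in the RGW C++ copy path, but the idea can be sketched in a few lines: when building the target object's attribute set, drop anything under the OLH bookkeeping prefix instead of carrying it over. The helper name here is illustrative, not the real RGW code:

```python
# Illustrative sketch of the proposed fix (not the actual RGW C++ change):
# copy_object should drop OLH bookkeeping xattrs rather than copy them
# onto the target object.

# Prefix shared by user.rgw.olh.idtag, user.rgw.olh.info, user.rgw.olh.ver
OLH_ATTR_PREFIX = "user.rgw.olh."


def strip_olh_attrs(attrs: dict) -> dict:
    """Return the source object's attrs minus any OLH bookkeeping entries."""
    return {k: v for k, v in attrs.items() if not k.startswith(OLH_ATTR_PREFIX)}
```

With this filtering in place, the copy target never carries an olh idtag that points at the source object's version chain, so follow_olh on the target has nothing stale to follow.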
Updated by Satoru Takeuchi 3 months ago
The Get request crashed the rgw in our ceph version.
In my understanding, this silent error only happens in main, but the stable releases have a consistent crash problem when reading problematic objects (in your example, file_4k). If so, it would be better to backport your PR to reef and quincy. Is my understanding correct?
Updated by Jane Zhu 3 months ago
Satoru Takeuchi wrote:
The Get request crashed the rgw in our ceph version.
In my understanding, this silent error only happens in main, but the stable releases have a consistent crash problem when reading problematic objects (in your example, file_4k). If so, it would be better to backport your PR to reef and quincy. Is my understanding correct?
The Ceph version we are running, where the crash happens, is not one of the stable releases; it's from the main branch, pre-reef. So I'm not sure how it behaves in a stable quincy/reef release. I haven't had time to track down when the crash was introduced and fixed.
However, no matter which behavior it shows, I think we should backport the fix to quincy and reef.
Updated by Satoru Takeuchi 3 months ago
Thank you for your reply. I verified that the consistent GET error happens on my v17.2.5 cluster.
Updated by Casey Bodley 3 months ago
- Tags set to versioning copy
- Backport set to quincy reef
Updated by Casey Bodley 2 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 2 months ago
- Copied to Backport #64447: quincy: invalid olh attributes on the target object after copy_object in a versioning suspended bucket added
Updated by Backport Bot 2 months ago
- Copied to Backport #64448: reef: invalid olh attributes on the target object after copy_object in a versioning suspended bucket added
Updated by Backport Bot 2 months ago
- Tags changed from versioning copy to versioning copy backport_processed
Updated by Robin Johnson 2 months ago
Jane:
1.
$ python3.8 copy_from_other.py <-- a script to copy the object from s3://bucket1/file_4k to s3://bucket1/broom/file_4k
How did this script differ from the awscli command?
aws --endpoint-url http://localhost:8000 s3api copy-versioning --bucket bucket1 --key broom/file_4k --copy-source bucket1/file_4k
2. Did any s3-tests get created for this?
3. Does "radosgw-admin bucket check" detect this error state on the object?
Updated by Jane Zhu 2 months ago
Robin Johnson wrote:
Jane:
1.
How did this script differ from the awscli command?
aws --endpoint-url http://localhost:8000 s3api copy-versioning --bucket bucket1 --key broom/file_4k --copy-source bucket1/file_4k
Do you mean "copy-object"? There is no functional difference from my copy script in this context. We used the boto3 script just to exactly mimic our client's behavior. Either way reproduces the issue.
2. Did any s3-tests get created for this?
s3-tests PR: https://github.com/ceph/s3-tests/pull/546
3. Does "radosgw-admin bucket check" detect this error state on the object?
No