Bug #59663

closed

rgw: expired delete markers created by deleting non-existent object multiple times are not being removed from data pool after deletion from bucket

Added by Dmitry Bobarykin 12 months ago. Updated 9 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
rgw lifecycle versioning
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As per https://github.com/ceph/ceph/pull/48400
Deleting a non-existent object generates a delete marker for that object, which is then removed either by an ExpiredObjectDeleteMarker lifecycle rule or by bucket deletion.
However, if the delete request is sent two or more times, then after the marker is deleted from the bucket (or the bucket itself is deleted), the marker object persists in the data pool.

1)
s3cmd del s3://testbucket/1.file
s3cmd del s3://testbucket/1.file

2)
aws --endpoint-url http://192.168.1.25 s3api list-object-versions --bucket=testbucket
{
"DeleteMarkers": [ {
"Owner": {
"DisplayName": "test1",
"ID": "test1"
},
"Key": "1.file",
"VersionId": "null",
"IsLatest": true,
"LastModified": "2023-05-06T17:41:28.967Z"

3)
radosgw-admin bucket stats --bucket=testbucket
...
"id": "c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1"

4)
rados -p default.rgw.buckets.data ls | grep '544742.1'
c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1_1.file

5)
After 'radosgw-admin lc process' with an ExpiredObjectDeleteMarker lifecycle rule set up (or after bucket removal), the marker is removed from the bucket:
aws --endpoint-url http://192.168.1.25 s3api list-object-versions --bucket=testbucket
[]

6) The marker persists in the bucket index and data pool
radosgw-admin bi list --bucket='testbucket'
...
"idx": "1.file"
...
rados -p default.rgw.buckets.data ls | grep '544742.1'
c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1_1.file

7) no objects listed for garbage collection
radosgw-admin gc list --include-all
[]
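
The check in steps 4 and 6 can be scripted. The following is a minimal sketch, not part of the original report: it assumes that head objects in the data pool are named with the bucket id followed by an underscore and the key, as in the listing above. The helper names `leftover_objects` and `list_pool` are hypothetical; `list_pool` simply shells out to the same `rados -p <pool> ls` command used above and needs a live cluster.

```python
import subprocess


def leftover_objects(rados_ls_lines, bucket_id):
    """Return the RADOS object names that start with '<bucket_id>_'.

    Assumption: data pool head objects are named '<bucket id>_<key>',
    matching the output shown in the reproduction steps.
    """
    prefix = bucket_id + "_"
    names = (line.strip() for line in rados_ls_lines)
    return [name for name in names if name.startswith(prefix)]


def list_pool(pool="default.rgw.buckets.data"):
    """Run `rados -p <pool> ls` and return its output lines (requires a cluster)."""
    result = subprocess.run(
        ["rados", "-p", pool, "ls"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Offline example using the object name from the report.
bucket_id = "c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1"
sample_listing = [
    "c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1_1.file",
    "some-other-bucket-id.1_2.file",
]
print(leftover_objects(sample_listing, bucket_id))
```

On a cluster, `leftover_objects(list_pool(), bucket_id)` returning a non-empty list after the lifecycle run would correspond to the situation in step 6.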


Related issues 2 (0 open, 2 closed)

Related to rgw - Bug #61359: Consistency bugs with OLH objects (Resolved, Cory Snyder)
Related to rgw - Bug #59164: LC rules cause latency spikes (Can't reproduce)

Actions #1

Updated by Cory Snyder 11 months ago

  • Assignee set to Cory Snyder

I've noticed this issue as well. What is happening is that the second delete op intentionally exits early from rgw_bucket_link_olh [1]. This error condition causes the pending xattr on the OLH object to not get cleaned up. When the LC processor deletes the actual delete marker instance, we fail to remove the plain index entry, the OLH index entry, and the OLH RADOS object due to this outstanding pending xattr [2]. Note that it isn't the actual delete marker that remains after this scenario plays out, but just extra book-keeping baggage.

[1] https://github.com/ceph/ceph/blob/e0a40880cc0542197ec240090064f70becef918d/src/cls/rgw/cls_rgw.cc#L1778-L1786
[2] https://github.com/ceph/ceph/blob/e0a40880cc0542197ec240090064f70becef918d/src/rgw/driver/rados/rgw_rados.cc#L7486-L7507
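
To make the sequence described above easier to follow, here is a toy model in Python. It is only an illustration under simplifying assumptions and does not reflect Ceph's actual data structures or code paths: the class `ToyOLH`, its fields, and its methods are all hypothetical. It models only the three points from the comment: each delete op records a pending entry, a repeated delete exits early without clearing its pending entry, and removing the delete marker instance cleans up the book-keeping only when nothing is pending.

```python
class ToyOLH:
    """Toy stand-in for one object's OLH state (hypothetical, not Ceph code)."""

    def __init__(self):
        self.pending = set()        # models pending xattrs on the OLH object
        self.delete_marker = False  # True while a delete marker instance exists
        self.bookkeeping = False    # models index entries plus the OLH RADOS object

    def delete_object(self, op_tag):
        """Model a delete op on a key with no real object versions."""
        self.pending.add(op_tag)
        self.bookkeeping = True
        if self.delete_marker:
            # Early exit: a delete marker is already current.
            # The pending entry for this op is left behind.
            return False
        self.delete_marker = True
        self.pending.discard(op_tag)  # normal path clears its pending entry
        return True

    def remove_delete_marker(self):
        """Model LC removing the marker; return True if everything was cleaned up."""
        self.delete_marker = False
        if not self.pending:
            self.bookkeeping = False
        return not self.bookkeeping


once = ToyOLH()
once.delete_object("op1")
print(once.remove_delete_marker())   # single delete: cleaned up

twice = ToyOLH()
twice.delete_object("op1")
twice.delete_object("op2")
print(twice.remove_delete_marker())  # repeated delete: leftovers remain
```

In this model the single-delete case cleans up fully, while the double-delete case leaves the book-keeping in place because `op2` is still pending, which mirrors the leftovers seen in the bucket index and data pool.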

Actions #2

Updated by Cory Snyder 11 months ago

  • Related to Bug #61359: Consistency bugs with OLH objects added
Actions #3

Updated by Cory Snyder 11 months ago

The linked issue adds a radosgw-admin command to clean up these leftover entries and objects.

https://tracker.ceph.com/issues/61359

Actions #4

Updated by Cory Snyder 11 months ago

  • Related to Bug #59164: LC rules cause latency spikes added
Actions #5

Updated by Cory Snyder 11 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 51700
Actions #6

Updated by Casey Bodley 9 months ago

  • Status changed from Fix Under Review to Resolved

fix merged in https://github.com/ceph/ceph/pull/51700 as part of https://tracker.ceph.com/issues/61359; we'll track the backports there
