Bug #59663
rgw: expired delete markers created by deleting a non-existent object multiple times are not removed from the data pool after deletion from the bucket (Closed)
Description
As per https://github.com/ceph/ceph/pull/48400
Deleting a non-existent object generates a delete marker for that object, which is then removed either by an ExpiredObjectDeleteMarker lifecycle rule or by bucket deletion.
However, if the delete request is sent two or more times, the marker object persists in the data pool after the marker is deleted from the bucket or the bucket itself is deleted.
1)
s3cmd del s3://testbucket/1.file
s3cmd del s3://testbucket/1.file
2)
aws --endpoint-url http://192.168.1.25 s3api list-object-versions --bucket=testbucket
{
  "DeleteMarkers": [
    {
      "Owner": {
        "DisplayName": "test1",
        "ID": "test1"
      },
      "Key": "1.file",
      "VersionId": "null",
      "IsLatest": true,
      "LastModified": "2023-05-06T17:41:28.967Z"
    }
  ]
}
3)
radosgw-admin bucket stats --bucket=testbucket
...
"id": "c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1"
4)
rados -p default.rgw.buckets.data ls | grep '544742.1'
c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1_1.file
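The `grep '544742.1'` used in steps 4 and 6 is a substring regex match, so it would also catch objects of a bucket whose id happens to be, for example, `...544742.11`. A minimal POSIX sh sketch of a stricter filter, assuming data-pool object names start with `<bucket marker>_` as in the listing above (the marker normally equals the id shown by `bucket stats` unless the bucket has been resharded); the function name `bucket_data_objects` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print only the data-pool objects of one bucket.
# Reads `rados ls` output on stdin; $1 is the bucket marker/id.
# Assumption: object names are "<bucket marker>_<rest>", as in the listing above.
bucket_data_objects() {
  bucket_id=$1
  while IFS= read -r name; do
    # Literal prefix match on "<id>_" (no regex, no substring match), so
    # "...544742.1" does not also match "...544742.11".
    case $name in
      "${bucket_id}"_*) printf '%s\n' "$name" ;;
    esac
  done
}
```

Used as `rados -p default.rgw.buckets.data ls | bucket_data_objects c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1`, it prints only the objects whose names begin with that bucket's prefix.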
5)
After running 'radosgw-admin lc process' with an ExpiredObjectDeleteMarker lifecycle rule configured (or after removing the bucket), the marker is removed from the bucket:
aws --endpoint-url http://192.168.1.25 s3api list-object-versions --bucket=testbucket
[]
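The report does not include the lifecycle configuration used in step 5. One minimal configuration that enables ExpiredObjectDeleteMarker for every key is sketched below; the rule ID, the empty prefix filter and the helper name `write_lc_config` are illustrative assumptions, not taken from the report:

```shell
#!/bin/sh
# Hypothetical helper: write an S3 lifecycle configuration to file $1.
# The single rule expires delete markers (ExpiredObjectDeleteMarker);
# the empty prefix makes it apply to every key in the bucket.
write_lc_config() {
  cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-delete-markers",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "ExpiredObjectDeleteMarker": true }
    }
  ]
}
EOF
}
```

After `write_lc_config lc.json`, the rule could be applied with `aws --endpoint-url http://192.168.1.25 s3api put-bucket-lifecycle-configuration --bucket=testbucket --lifecycle-configuration file://lc.json` before running `radosgw-admin lc process`.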
6) However, the marker persists in the bucket index and in the data pool:
radosgw-admin bi list --bucket='testbucket'
...
"idx": "1.file"
...
rados -p default.rgw.buckets.data ls | grep '544742.1'
c4316f2e-bb76-4e37-b65a-6e050cee258e.544742.1_1.file
7) No objects are listed for garbage collection:
radosgw-admin gc list --include-all
[]
Updated by Cory Snyder 12 months ago
- Assignee set to Cory Snyder
I've noticed this issue as well. What happens is that the second delete op intentionally exits early from rgw_bucket_link_olh [1]. This error condition leaves the pending xattr on the OLH object in place. When the LC processor later deletes the actual delete marker instance, we fail to remove the plain index entry, the OLH index entry, and the OLH RADOS object because of this outstanding pending xattr [2]. Note that it isn't the actual delete marker that remains after this scenario plays out, but just extra bookkeeping baggage.
[1] https://github.com/ceph/ceph/blob/e0a40880cc0542197ec240090064f70becef918d/src/cls/rgw/cls_rgw.cc#L1778-L1786
[2] https://github.com/ceph/ceph/blob/e0a40880cc0542197ec240090064f70becef918d/src/rgw/driver/rados/rgw_rados.cc#L7486-L7507
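A deliberately simplified toy model of the sequence described in the comment above, written as POSIX sh. It is not Ceph code: a single key is modelled, the OLH pending xattrs are reduced to a counter, and every function and variable name is hypothetical. It only illustrates why one extra delete leaves the OLH object behind:

```shell
#!/bin/sh
# Toy model of one key's OLH state; NOT Ceph code, names are hypothetical.
olh_object=absent   # OLH head object in the data pool
olh_pending=0       # outstanding "pending" xattrs on the OLH object
marker=absent       # current null delete-marker instance

reset_model() {
  olh_object=absent
  olh_pending=0
  marker=absent
}

# DELETE of a key that has no object: prepare, then link the OLH.
s3_delete() {
  olh_object=present
  olh_pending=$((olh_pending + 1))   # prepare step records a pending xattr
  if [ "$marker" = present ]; then
    # A null delete marker is already current, so the link step exits early
    # and the pending xattr recorded above is never cleared.
    return 0
  fi
  marker=present
  olh_pending=$((olh_pending - 1))   # successful link clears its pending xattr
}

# LC (or bucket removal) deletes the delete-marker instance.
expire_marker() {
  marker=absent
  # OLH cleanup (index entries and OLH object) is skipped while any
  # pending xattr is outstanding.
  if [ "$olh_pending" -eq 0 ]; then
    olh_object=absent
  fi
}
```

In this model, `reset_model; s3_delete; expire_marker` ends with `olh_object=absent`, while `reset_model; s3_delete; s3_delete; expire_marker` ends with `olh_object=present` and one pending entry, mirroring steps 5 and 6 of the report.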
Updated by Cory Snyder 12 months ago
- Related to Bug #61359: Consistency bugs with OLH objects added
Updated by Cory Snyder 12 months ago
The linked issue adds a radosgw-admin command to clean up these leftover entries and objects.
Updated by Cory Snyder 12 months ago
- Related to Bug #59164: LC rules cause latency spikes added
Updated by Cory Snyder 12 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 51700
Updated by Casey Bodley 10 months ago
- Status changed from Fix Under Review to Resolved
Fix merged in https://github.com/ceph/ceph/pull/51700 as part of https://tracker.ceph.com/issues/61359; we'll track the backports there.