Project

General

Profile

Actions

Bug #63542

closed

Delete-Marker deletion inconsistencies

Added by Giuseppe Baccini 6 months ago. Updated about 2 months ago.

Status:
Won't Fix
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We stumbled into an issue trying to make the following s3test pass for the downstream rgw/sfs driver implementation:

 test_object_lock_delete_object_with_retention_and_marker

Here is the downstream issue: https://github.com/aquarist-labs/s3gw/issues/690
And here the associated PR: https://github.com/aquarist-labs/ceph/pull/241

To summarize the problem we are facing, here a brief descritption:

We guess that RGWDeleteObj::execute() relies on the return code from rgw::sal::Object::get_obj_state() to ascertain the object being deleted is a regular object rather than a delete-marker.
We guess that the rule is to return -ENOENT to signal a delete-marker to the caller.

Reasons to say this is mainly the following code:

rgw_op.cc::void RGWDeleteObj::execute(optional_yield y)
...
        op_ret = s->object->get_obj_state(this, &astate, s->yield, true);
...
        if (check_obj_lock) {
          /* check if obj exists, read orig attrs */
          if (op_ret == -ENOENT) {
            /* object maybe delete_marker, skip check_obj_lock*/
            check_obj_lock = false;
          } else {
            return;
          }
        }
...

Because before the patch we are trying, the get_obj_state() implementation for the rgw/sfs driver was always returning 0 also a delete-marker fell into a regular object in RGWDeleteObj::execute().
This is wrong because, when dealing with object lock checks, a delete-marker is always allowed to be deleted regardless of the object's retention mode.
Returning 0 was incorrectly preventing a delete-marker to be deleted for object-lock protected objects.

So, after we patched the code, we returned -ENOENT for delete-markers.
This fixed the test_object_lock_delete_object_with_retention_and_marker s3test, but at the same time broke also the following s3tests:

test_lifecycle_deletemarker_expiration
test_multi_object_delete
test_multi_objectv2_delete
test_versioning_concurrent_multi_object_delete
test_versioning_multi_object_delete
test_versioning_multi_object_delete_with_marker
test_versioning_multi_object_delete_with_marker_create

We guess that this is because in the rgw code there are places where:

  ret = obj->get_obj_state(dpp, &obj_state, null_yield, true);
  if (ret < 0) {
    return ret;
  }

So, basically, -ENOENT is interpreted as an error instead of a special case meaning "delete-marker".

Are there other considerations to do here?
Or this can be considered something must be fixed in the abstract part of the radosgw?

This issue: https://tracker.ceph.com/issues/55766 could be related to this.

Actions #1

Updated by Casey Bodley 6 months ago

  • Assignee set to Daniel Gryniewicz
Actions #2

Updated by Daniel Gryniewicz 5 months ago

RGWRados::get_obj_state() indeed returns -ENOENT if the object is a delete marker, and it's been that way for a long time. There are some callsites that handle this, but most don't check it explicitly. I'm not sure if these cases are because a delete marker is actually a failure, or if it's broken and needs to be fixed. This does seem to me to be a bad API, although I don't think you'd ever get to the point where you're calling get_obj_state() on a non-existent object, so ENOENT may not be overloaded, but it's certainly not clear.

It's probably best if this is discussed on one of the Wed. refactoring calls. Can you join one of these so we can all discuss as a group?

Actions #3

Updated by Giuseppe Baccini 5 months ago

Thanks Daniel to have looked into this; unfortunately I'm currently in the position to not be able to follow this issue.
I cannot say now, if I will have time in the future to follow this.
Up to you, we can leave this opened or we can close.

Actions #4

Updated by Casey Bodley about 2 months ago

  • Status changed from New to Won't Fix
Actions

Also available in: Atom PDF