Project

General

Profile

Actions

Bug #63542

closed

Delete-Marker deletion inconsistencies

Added by Giuseppe Baccini 6 months ago. Updated 2 months ago.

Status:
Won't Fix
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We stumbled into an issue trying to make the following s3test pass for the downstream rgw/sfs driver implementation:

 test_object_lock_delete_object_with_retention_and_marker

Here is the downstream issue: https://github.com/aquarist-labs/s3gw/issues/690
And here the associated PR: https://github.com/aquarist-labs/ceph/pull/241

To summarize the problem we are facing, here a brief descritption:

We guess that RGWDeleteObj::execute() relies on the return code from rgw::sal::Object::get_obj_state() to ascertain the object being deleted is a regular object rather than a delete-marker.
We guess that the rule is to return -ENOENT to signal a delete-marker to the caller.

Reasons to say this is mainly the following code:

rgw_op.cc::void RGWDeleteObj::execute(optional_yield y)
...
        op_ret = s->object->get_obj_state(this, &astate, s->yield, true);
...
        if (check_obj_lock) {
          /* check if obj exists, read orig attrs */
          if (op_ret == -ENOENT) {
            /* object maybe delete_marker, skip check_obj_lock*/
            check_obj_lock = false;
          } else {
            return;
          }
        }
...

Because before the patch we are trying, the get_obj_state() implementation for the rgw/sfs driver was always returning 0 also a delete-marker fell into a regular object in RGWDeleteObj::execute().
This is wrong because, when dealing with object lock checks, a delete-marker is always allowed to be deleted regardless of the object's retention mode.
Returning 0 was incorrectly preventing a delete-marker to be deleted for object-lock protected objects.

So, after we patched the code, we returned -ENOENT for delete-markers.
This fixed the test_object_lock_delete_object_with_retention_and_marker s3test, but at the same time broke also the following s3tests:

test_lifecycle_deletemarker_expiration
test_multi_object_delete
test_multi_objectv2_delete
test_versioning_concurrent_multi_object_delete
test_versioning_multi_object_delete
test_versioning_multi_object_delete_with_marker
test_versioning_multi_object_delete_with_marker_create

We guess that this is because in the rgw code there are places where:

  ret = obj->get_obj_state(dpp, &obj_state, null_yield, true);
  if (ret < 0) {
    return ret;
  }

So, basically, -ENOENT is interpreted as an error instead of a special case meaning "delete-marker".

Are there other considerations to do here?
Or this can be considered something must be fixed in the abstract part of the radosgw?

This issue: https://tracker.ceph.com/issues/55766 could be related to this.

Actions

Also available in: Atom PDF