Bug #63799: multisite: lc expiration action on versioned buckets generates delete-marker with different version ids on different zones - rgw - Ceph

Bug #63799

open

multisite: lc expiration action on versioned buckets generates delete-marker with different version ids on different zones

Added by Jane Zhu 5 months ago. Updated 5 months ago.

Status:

Fix Under Review

Priority:

Normal

Assignee:

Jane Zhu

Target version:

% Done:

Source:

Tags:

multisite, lifecycle

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

54957

Crash signature (v1):

Crash signature (v2):

Description

In multisite deployments, lifecycle (LC) processing on each zone generates a delete marker with its own version ID if LC runs on that zone before the delete marker from the other zone has been replicated.

This can cause problems if either zone later deletes its delete marker. When another zone tries to replicate that deletion, it fails to find that version and so leaves its own delete marker intact. At this point, the zones can respond differently to GET requests for the same object name. And if the source zone goes on to delete its empty bucket, the other zones end up orphaning the corresponding rados object.
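
The sequence above can be sketched with a toy model. This is only an illustration of the described behaviour, not RGW code: the `Zone` class, its methods, and the version ID format are all hypothetical.

```python
import uuid


class Zone:
    """Toy model of one zone's view of a single versioned object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.delete_markers = set()  # version IDs of delete markers on this zone

    def lc_expire(self):
        # Each zone's LC run mints its own version ID for the delete marker.
        version_id = f"{self.name}-{uuid.uuid4().hex[:8]}"
        self.delete_markers.add(version_id)
        return version_id

    def delete_version(self, version_id):
        # Returns False when this zone has never seen the version ID.
        if version_id not in self.delete_markers:
            return False
        self.delete_markers.remove(version_id)
        return True

    def get_object(self):
        # While a delete marker exists, GET by name behaves as "not found".
        return 404 if self.delete_markers else 200


a, b = Zone("a"), Zone("b")
vid_a = a.lc_expire()  # LC runs on zone a
vid_b = b.lc_expire()  # LC runs on zone b before vid_a has replicated

a.delete_version(vid_a)               # a user removes the marker on zone a
replicated = b.delete_version(vid_a)  # zone b cannot find that version

print(replicated, a.get_object(), b.get_object())  # False 200 404
```

In this model zone a serves the object again while zone b still hides it behind its own delete marker, which is the divergence described above.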

Updated by Matt Benjamin 5 months ago

I think this issue is at least partially related to one being worked on by Kalpesh and Shilpa. It makes sense to discuss it in the refactoring meeting; I'll alert them.

Updated by Jane Zhu 5 months ago

Some discussion from a PR: https://github.com/ceph/ceph/pull/54759#discussion_r1424894258

smanjara
In this scenario, I'd have expected two delete marker versions on each zone: one from its own delete op, and a second synced from the other zone. But I don't think we allow multiple delete markers for an object.

smanjara
The second delete marker creation will fail because we return -ENOENT in rgw.bucket_link_olh() if we already have a delete marker; see
https://github.com/ceph/ceph/blob/main/src/cls/rgw/cls_rgw.cc#L1695-L1702

jzhu116-bloomberg
Yes, this is exactly what I observed in my testing. The replication failed to create the delete marker, with the following error:
2023-12-13T01:57:17.297-0500 7f393fa49700  0 rgw async rados processor: ERROR: bucket shard callback failed. obj=file_4k[WFpr4YYECQ6z9qtSFkOdbS6v.1Cjg6.]. ret=(2) No such file or directory
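
A minimal sketch of the guard as described in this discussion, assuming it rejects a new delete marker whenever the newest entry is already a delete marker. The function `link_olh` and the list-of-tuples layout are hypothetical and do not reflect the actual code in cls_rgw.cc.

```python
import errno


def link_olh(versions, version_id, is_delete_marker):
    """Append a version unless it would stack a second delete marker.

    versions: list of (version_id, is_delete_marker) tuples, newest last.
    Assumption: the guard only looks at the newest entry.
    """
    if is_delete_marker and versions and versions[-1][1]:
        return -errno.ENOENT  # second delete marker is refused
    versions.append((version_id, is_delete_marker))
    return 0


versions = [("v1", False)]
first = link_olh(versions, "dm-local", True)    # marker created by local LC
second = link_olh(versions, "dm-remote", True)  # marker arriving via sync
print(first, second)
```

Under this assumption the replicated marker is rejected with ENOENT, consistent with the `No such file or directory` error in the log line above, and the remote version ID never lands on the local zone.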

Updated by Shilpa MJ 5 months ago

Hi Jane,

Following up on our conversation: this block (https://github.com/ceph/ceph/blob/main/src/cls/rgw/cls_rgw.cc#L1695-L1702), which prevents rgw from creating multiple delete markers, was introduced as a fix for an LC expiration issue described in https://tracker.ceph.com/issues/51249.
But that was more of an LC bug than a delete marker one, and changing the delete marker behaviour was unnecessary. There is more conversation about this in https://github.com/ceph/ceph/pull/45754.

For this PR about multisite in particular: if we allow multiple delete markers to exist, then we could let the zones take care of syncing their creation and deletion without any special handling, because the same versions would be maintained on both zones. So I'm proposing that we revert the changes made in https://github.com/ceph/ceph/pull/41897, then test scenarios involving multiple delete markers and see how they behave with LC and multisite in the picture.
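
As a rough illustration of that proposal, here is a toy model in which both delete markers are allowed to coexist. The helper names and the `dm-a` / `dm-b` version IDs are hypothetical, and real sync behaviour would still need the testing described here.

```python
def sync_markers(src, dst):
    """Replicate delete markers from src to dst, preserving version IDs."""
    dst.update(src)


def replicate_delete(dst, version_id):
    """Apply a replicated marker deletion; report whether the version existed."""
    if version_id in dst:
        dst.remove(version_id)
        return True
    return False


zone_a = {"dm-a"}  # marker created by LC on zone a
zone_b = {"dm-b"}  # marker created by LC on zone b

# With multiple delete markers allowed, each zone accepts the other's marker.
sync_markers(zone_a, zone_b)
sync_markers(zone_b, zone_a)
converged = zone_a == zone_b

# Deleting a marker on zone a can now be replayed on zone b.
zone_a.remove("dm-a")
found = replicate_delete(zone_b, "dm-a")
print(converged, found, zone_a == zone_b)  # True True True
```

Because both zones end up holding the same set of version IDs, a later deletion of either marker finds its target on the peer zone.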

Would you be interested in helping with testing this?

Updated by Jane Zhu 5 months ago

Sure thing. I can test this.
Just to clarify: the plan is that if this works well with both LC and multisite, we will go this route instead of overwriting the delete marker, during multisite replication, with the one that has the later timestamp, as we talked about in the refactoring meeting, right?

Thanks! Yes, that's right. If this works, then we don't need any of the changes we talked about in the refactoring meeting.
