Bug #58939


multisite: Versioned objects left after a deletion on secondary site

Added by Tom Coldrick about 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: multisite
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While testing a versioned warp benchmark on a Reef multisite cluster, we noticed that the master side was missing object versions that were present on the secondary. The sync had apparently completed as reported by radosgw-admin sync status and radosgw-admin bucket sync status. Reproducing the issue is unreliable, but it usually crops up after a few runs of

warp versioned --host="${endpoint}" \
--objects 25000 \
--access-key="${access_key}" \
--secret-key="${secret_key}" \
--noclear --concurrent 10 \
--duration 600s --obj.size=1M \
--bucket="${bucket_name}" \
--put-distrib 50 \
--get-distrib 0 \
--stat-distrib 0 \
--delete-distrib 50

Note - the behaviour in this case is that the secondary side has object versions that are not present on the primary side. It can also happen that the primary side has objects not present on the secondary, as reported in #58911.
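
For reference, the kind of checks used to conclude that sync had completed, plus a rough cross-zone comparison, look like this. This is a sketch rather than a transcript from the failing run; the bucket variable is reused from the warp invocation above, and each command is run against the respective zone's cluster:

# Confirm that data and bucket sync report as caught up on each zone.
radosgw-admin sync status
radosgw-admin bucket sync status --bucket="${bucket_name}"

# Compare per-zone index stats for the test bucket; num_objects should
# include every stored version, so a persistent mismatch once sync is
# caught up points at leaked versions (treat this as a rough check only).
radosgw-admin bucket stats --bucket="${bucket_name}"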

I've been looking at the logs from runs that hit this issue, but I haven't tracked it down yet. From what I have gathered so far, the mechanics of the issue are as follows:

1. Object is created on primary side
2. Object is replicated primary -> secondary during full sync
3. Object is deleted on primary side during full sync
4. Secondary switches to incremental sync, but misses the DELETE from the bilog

At the moment I'm not sure of the mechanism by which the DELETE is missed during replication. The logs assert that incremental sync starts from the markers assigned during InitBucketFullSyncStatusCR, which to my understanding should mean the DELETE entry gets replayed, but that doesn't seem to happen. I have noticed that some bilog shards don't seem to be syncing incrementally, but I haven't yet followed this track in depth.
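
To check whether the DELETE is actually present in the bilog and where the incremental markers ended up, the inspection I have in mind is roughly the following (a sketch; "<object-name>" and the source zone name P are placeholders):

# On the primary zone: dump the bucket index log and find the entries for
# an object version that is orphaned on the secondary; note the marker of
# its delete entry.
radosgw-admin bilog list --bucket="${bucket_name}" > bilog.json
grep -n "<object-name>" bilog.json

# On the secondary zone: see where incremental sync for this bucket thinks
# it should resume, per shard, and compare those positions against the
# delete entry's marker above.
radosgw-admin bucket sync status --bucket="${bucket_name}" --source-zone=P
radosgw-admin bucket sync markers --bucket="${bucket_name}" --source-zone=P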

#1

Updated by Tom Coldrick about 1 year ago

I realise that the title may be misleading! To clarify, the issue is that object versions deleted on the primary side are still present on the secondary side, not that replication of deletes sent directly to the secondary side isn't working.

#2

Updated by Tom Coldrick about 1 year ago

Tom Coldrick wrote:

At the moment I'm not sure on the mechanism by which the DELETE is missed during replication.

I've now done a bunch more debugging, and I think I have a good grasp of what's going on now! First, some preliminaries: in the setup used we have two zones (let's call them P and S for primary and secondary) which are using active-active sync. All client traffic is going to the P zone. The issue occurs when the initial full sync from P to S for the bucket fails and needs to be rerun while traffic is still hitting P. What happens for one of the extra objects is this:

1. Object instance is placed in P by client traffic, entered into bilog with marker M
2. Full sync begins P->S, markers are set for incremental sync
3. Object instance is replicated P->S
4. Object instance is deleted in P, entered into bilog with marker M+t1
5. Some error happens in replication process on S, halting progress
6. Full sync restarts P->S, markers are set again for incremental sync to some M+t2 > M+t1
7. Full sync completes P->S, incremental sync starts from marker M+t2

This means that the delete is completely missed by bucket sync, and S ends up permanently out of sync with the P zone.
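
The end state is easy to see from the S3 API, assuming client credentials that can reach both zones' endpoints (the endpoint variables and object name below are placeholders):

# List versions of an affected key against each zone's RGW endpoint; the
# orphaned version only shows up in the secondary (S) listing.
aws --endpoint-url "${p_endpoint}" s3api list-object-versions \
    --bucket "${bucket_name}" --prefix "<object-name>"
aws --endpoint-url "${s_endpoint}" s3api list-object-versions \
    --bucket "${bucket_name}" --prefix "<object-name>"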

The title of this issue specifies versioned buckets, but I don't think the problem is restricted to them! I can only reproduce it on versioned buckets, but that's because they sometimes seem to suffer from EIO errors when attempting to link an OLH. I've not yet hunted down why this happens, but it was these EIOs that were causing the full sync to fail and subsequently re-update the initial positions for the incremental markers.
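
For reference, one way to spot these failures is to scan the secondary zone's RGW log with debug turned up; the log path and grep patterns below are assumptions about a typical deployment rather than quotes from my logs:

# With debug_rgw raised (e.g. debug_rgw = 20), look for OLH link attempts
# that returned an error (EIO is errno 5, so -5 shows up in return codes)
# around the time full sync restarted.
grep -inE 'olh' /var/log/ceph/ceph-client.rgw.*.log | grep -iE 'eio|= -5|error'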

I gathered a little more evidence for this by modifying InitBucketFullSyncStatusCR to only update the incremental sync shards if the state is Init, and to only reset the state to Init if the current state is incremental, preventing the markers from being updated on the retry. This means that the markers used are those from the initial full sync run, and the bilog entries don't get missed. The result is that I can no longer reproduce this issue, but I now hit #58911 on every run. I don't think this is the correct fix in any case -- I just wanted some evidence for the hypothesis.

I have noticed that some bilog shards don't seem to be syncing incrementally, but I haven't yet followed this track in depth.

This turned out to be inconsequential -- some bucket shards simply did not require incremental sync after full sync completed, and so were not syncing incrementally.

#3

Updated by Casey Bodley about 1 year ago

  • Assignee set to Shilpa MJ
  • Tags set to multisite