Bug #39657

multisite: metadata sync does not keep retrying failed entries

Added by Casey Bodley almost 4 years ago. Updated 14 days ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 100%
Source:
Tags: multisite backport_processed
Backport: octopus pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

RGWMetaSyncSingleEntryCR will retry sync of an entry NUM_TRANSIENT_ERROR_RETRIES=10 times and give up. After returning a failure, sync continues advancing past the entry and never retries again until radosgw restarts.

The rgw_sync_meta_inject_err_probability config variable injects errors here to test the error handling, but the lack of retries means that we can't pass multisite tests with error injection enabled.
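
The behavior described above can be illustrated with a small standalone simulation. This is only a sketch, not the radosgw code: the function names `sync_single_entry`, `sync_with_retries`, and `run_sync` are hypothetical, the `inject_err_probability` parameter merely stands in for the rgw_sync_meta_inject_err_probability option, and only the retry count of 10 comes from the description.

```python
import random

NUM_TRANSIENT_ERROR_RETRIES = 10  # value quoted in the description


def sync_single_entry(entry, inject_err_probability, rng):
    """Hypothetical stand-in for one sync attempt.

    Fails at random with the given probability, mimicking error injection.
    """
    return rng.random() >= inject_err_probability


def sync_with_retries(entry, inject_err_probability, rng):
    """Attempt an entry up to NUM_TRANSIENT_ERROR_RETRIES times, then give up."""
    for _ in range(NUM_TRANSIENT_ERROR_RETRIES):
        if sync_single_entry(entry, inject_err_probability, rng):
            return True
    return False


def run_sync(entries, inject_err_probability, seed=0):
    """Walk the log once; the marker advances even when an entry fails."""
    rng = random.Random(seed)
    marker = 0
    abandoned = []
    for position, entry in enumerate(entries, start=1):
        if not sync_with_retries(entry, inject_err_probability, rng):
            # Nothing records this entry for a later retry (the reported bug).
            abandoned.append(entry)
        marker = position  # advances past the entry regardless of the result
    return marker, abandoned
```

In this model, any entry that lands in `abandoned` is never revisited, because the marker has already moved past it. Only a restart that replays the log would pick it up again, which matches the failure mode reported here.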


Related issues

Related to rgw - Bug #53668: Why not add a xxx.retry obJ to metadata synchronization at multisite for exception retries (Need More Info)
Copied to rgw - Backport #51784: octopus: multisite: metadata sync does not keep retrying failed entries (Rejected)
Copied to rgw - Backport #51785: pacific: multisite: metadata sync does not keep retrying failed entries (Resolved)

History

#1 Updated by Casey Bodley almost 4 years ago

  • Status changed from New to In Progress
  • Assignee set to Casey Bodley

#2 Updated by fang yuxiang over 3 years ago

How is progress on this now?

#3 Updated by Casey Bodley over 3 years ago

No progress yet; I'm only thinking about the design.

#4 Updated by fang yuxiang over 3 years ago

Looks like an awesome job.

Could you share some of your design thoughts? Thanks.

#5 Updated by Casey Bodley over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Backport changed from luminous mimic nautilus to octopus pacific
  • Pull request ID set to 42317

#6 Updated by Casey Bodley over 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Backport Bot over 1 year ago

  • Copied to Backport #51784: octopus: multisite: metadata sync does not keep retrying failed entries added

#8 Updated by Backport Bot over 1 year ago

  • Copied to Backport #51785: pacific: multisite: metadata sync does not keep retrying failed entries added

#9 Updated by Casey Bodley about 1 year ago

  • Related to Bug #53668: Why not add a xxx.retry obJ to metadata synchronization at multisite for exception retries added

#10 Updated by Christian Rohmann 11 months ago

There is a PR supposedly fixing this issue: https://github.com/ceph/ceph/pull/46148

#11 Updated by Backport Bot 8 months ago

  • Tags changed from multisite to multisite backport_processed

#12 Updated by Konstantin Shalygin 14 days ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
