Bug #39657

closed

multisite: metadata sync does not keep retrying failed entries

Added by Casey Bodley almost 5 years ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 100%
Source:
Tags: multisite, backport_processed
Backport: octopus, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

RGWMetaSyncSingleEntryCR will retry sync of an entry NUM_TRANSIENT_ERROR_RETRIES=10 times and give up. After returning a failure, sync continues advancing past the entry and never retries again until radosgw restarts.

The rgw_sync_meta_inject_err_probability config variable injects errors here to test the error handling, but the lack of retries means that we can't pass multisite tests with error injection enabled.
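
To make the reported behaviour concrete, here is a minimal Python sketch of it. This is not the radosgw code: `sync_entry`, `sync_with_bounded_retries`, `run_sync`, and the `keep_retrying` flag are hypothetical names for illustration, and `inject_err_probability` only stands in for the `rgw_sync_meta_inject_err_probability` setting. For simplicity it treats the retry limit as the number of attempts per pass. Re-queuing failed entries is shown as one possible way to keep retrying, not necessarily the approach taken in the actual fix.

```python
import random

NUM_TRANSIENT_ERROR_RETRIES = 10  # value quoted in the report


def sync_entry(entry, inject_err_probability, rng):
    """Hypothetical stand-in for syncing one entry; fails at random."""
    return rng.random() >= inject_err_probability


def sync_with_bounded_retries(entry, inject_err_probability, rng):
    """Attempt an entry up to NUM_TRANSIENT_ERROR_RETRIES times, then give up."""
    for _ in range(NUM_TRANSIENT_ERROR_RETRIES):
        if sync_entry(entry, inject_err_probability, rng):
            return True
    return False


def run_sync(entries, inject_err_probability, rng, keep_retrying):
    """Process entries in order and return (synced, dropped).

    keep_retrying=False mirrors the reported behaviour: a failed entry is
    skipped and never revisited. keep_retrying=True re-queues failed entries
    until they succeed (an assumed alternative, for comparison only).
    """
    synced, dropped = [], []
    pending = list(entries)
    while pending:
        entry = pending.pop(0)
        if sync_with_bounded_retries(entry, inject_err_probability, rng):
            synced.append(entry)
        elif keep_retrying:
            pending.append(entry)  # try again later instead of moving on
        else:
            dropped.append(entry)  # sync advances past the entry for good
    return synced, dropped


if __name__ == "__main__":
    entries = ["entry-%d" % i for i in range(20)]
    for keep_retrying in (False, True):
        synced, dropped = run_sync(entries, 0.9, random.Random(1), keep_retrying)
        print(keep_retrying, len(synced), len(dropped))
```

With `keep_retrying=False`, any entry that exhausts its attempts ends up in `dropped`, which corresponds to an entry that stays unsynced until restart. With `keep_retrying=True`, every entry is eventually synced as long as the injected error probability is below 1.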


Related issues 3 (1 open, 2 closed)

Related to rgw - Bug #53668: Why not add a xxx.retry obJ to metadata synchronization at multisite for exception retries (Need More Info)
Copied to rgw - Backport #51784: octopus: multisite: metadata sync does not keep retrying failed entries (Rejected)
Copied to rgw - Backport #51785: pacific: multisite: metadata sync does not keep retrying failed entries (Resolved, Cory Snyder)
Actions #1

Updated by Casey Bodley almost 5 years ago

  • Status changed from New to In Progress
  • Assignee set to Casey Bodley
Actions #2

Updated by fang yuxiang over 4 years ago

Casey Bodley wrote:

RGWMetaSyncSingleEntryCR will retry sync of an entry NUM_TRANSIENT_ERROR_RETRIES=10 times and give up. After returning a failure, sync continues advancing past the entry and never retries again until radosgw restarts.

The rgw_sync_meta_inject_err_probability config variable injects errors here to test the error handling, but the lack of retries means that we can't pass multisite tests with error injection enabled.

How is progress on this now?

Actions #3

Updated by Casey Bodley over 4 years ago

fang yuxiang wrote:

How is progress on this now?

No progress yet; I'm only thinking about the design.

Actions #4

Updated by fang yuxiang over 4 years ago

Casey Bodley wrote:

fang yuxiang wrote:

How is progress on this now?

No progress yet; I'm only thinking about the design.

Looks like an awesome job.

Could you share some of your design thoughts? Thanks.

Actions #5

Updated by Casey Bodley almost 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport changed from luminous mimic nautilus to octopus pacific
  • Pull request ID set to 42317
Actions #6

Updated by Casey Bodley over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Backport Bot over 2 years ago

  • Copied to Backport #51784: octopus: multisite: metadata sync does not keep retrying failed entries added
Actions #8

Updated by Backport Bot over 2 years ago

  • Copied to Backport #51785: pacific: multisite: metadata sync does not keep retrying failed entries added
Actions #9

Updated by Casey Bodley about 2 years ago

  • Related to Bug #53668: Why not add a xxx.retry obJ to metadata synchronization at multisite for exception retries added
Actions #10

Updated by Christian Rohmann almost 2 years ago

There is a PR supposedly fixing this issue: https://github.com/ceph/ceph/pull/46148

Actions #11

Updated by Backport Bot over 1 year ago

  • Tags changed from multisite to multisite backport_processed
Actions #12

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100