Bug #39657
closed
multisite: metadata sync does not keep retrying failed entries
100%
Description
RGWMetaSyncSingleEntryCR retries sync of an entry NUM_TRANSIENT_ERROR_RETRIES=10 times and then gives up. After it returns a failure, sync continues advancing past the entry and never retries it again until radosgw restarts.
The rgw_sync_meta_inject_err_probability config variable injects errors here to test the error handling, but the lack of retries means that we can't pass multisite tests with error injection enabled.
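The behavior described above can be illustrated with a small simulation. This is only a sketch and not the radosgw code: `sync_entry` and `sync_with_bounded_retries` are hypothetical names, and the model assumes each attempt fails independently with the injected probability. Only the constant `NUM_TRANSIENT_ERROR_RETRIES=10` is taken from the description.

```python
import random

NUM_TRANSIENT_ERROR_RETRIES = 10  # value quoted in the description


def sync_entry(entry, inject_err_probability, rng):
    """Hypothetical stand-in for one sync attempt.

    Returns True on success; fails with the injected error probability.
    """
    return rng.random() >= inject_err_probability


def sync_with_bounded_retries(entries, inject_err_probability, rng):
    """Retry each entry a bounded number of times, then move on regardless."""
    synced, abandoned = [], []
    for entry in entries:
        for _attempt in range(NUM_TRANSIENT_ERROR_RETRIES):
            if sync_entry(entry, inject_err_probability, rng):
                synced.append(entry)
                break
        else:
            # Every attempt failed. Processing still advances past the entry,
            # and in this model nothing records it for a later retry.
            abandoned.append(entry)
    return synced, abandoned


if __name__ == "__main__":
    rng = random.Random(0)
    entries = list(range(100000))
    synced, abandoned = sync_with_bounded_retries(entries, 0.5, rng)
    print(len(synced), "synced,", len(abandoned), "abandoned")
```

Under these assumptions, an entry is abandoned with probability p^10 for an injected error probability p. At p = 0.5 that is 1/1024 per entry, so a long enough test run will almost certainly leave some entries unsynced.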
Updated by Casey Bodley almost 5 years ago
- Status changed from New to In Progress
- Assignee set to Casey Bodley
Updated by fang yuxiang over 4 years ago
How is progress on this now?
Updated by Casey Bodley over 4 years ago
No progress yet, only thinking about the design.
Updated by fang yuxiang over 4 years ago
Looks like an awesome job.
Could you share some of your design thoughts? Thanks.
Updated by Casey Bodley almost 3 years ago
- Status changed from In Progress to Fix Under Review
- Backport changed from luminous mimic nautilus to octopus pacific
- Pull request ID set to 42317
Updated by Casey Bodley over 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 2 years ago
- Copied to Backport #51784: octopus: multisite: metadata sync does not keep retrying failed entries added
Updated by Backport Bot over 2 years ago
- Copied to Backport #51785: pacific: multisite: metadata sync does not keep retrying failed entries added
Updated by Casey Bodley about 2 years ago
- Related to Bug #53668: Why not add a xxx.retry obJ to metadata synchronization at multisite for exception retries added
Updated by Christian Rohmann almost 2 years ago
There is a PR supposedly fixing this issue: https://github.com/ceph/ceph/pull/46148
Updated by Backport Bot over 1 year ago
- Tags changed from multisite to multisite backport_processed
Updated by Konstantin Shalygin about 1 year ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100