Bug #39657
multisite: metadata sync does not keep retrying failed entries
100%
Description
RGWMetaSyncSingleEntryCR retries the sync of an entry up to NUM_TRANSIENT_ERROR_RETRIES=10 times and then gives up. After it returns a failure, sync keeps advancing past the entry and never retries it again until radosgw restarts.
The rgw_sync_meta_inject_err_probability config variable injects errors here to test the error handling, but the lack of retries means that we can't pass multisite tests with error injection enabled.
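To illustrate the behavior described above, here is a minimal standalone sketch. It is not the actual radosgw code: sync_single_entry, sync_log, SyncResult and the attempt callback are hypothetical names made up for this example, and only the retry limit of 10 comes from the report. It shows how an entry that fails every attempt is abandoned while the sync position still moves past it.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Retry limit taken from the report; everything else here is illustrative.
constexpr int NUM_TRANSIENT_ERROR_RETRIES = 10;

// Hypothetical callback: returns true if one sync attempt for the entry succeeds.
using AttemptFn = std::function<bool(const std::string&)>;

// Try to sync one entry, giving up after the retry limit is reached.
bool sync_single_entry(const std::string& entry, const AttemptFn& attempt) {
  for (int i = 0; i < NUM_TRANSIENT_ERROR_RETRIES; ++i) {
    if (attempt(entry)) {
      return true;
    }
  }
  return false;  // all attempts failed
}

struct SyncResult {
  std::size_t marker = 0;            // how far the sync position advanced
  std::vector<std::string> dropped;  // entries abandoned after the retry limit
};

// Walk the log in order. The position advances even when an entry failed,
// which models the problem reported here: nothing revisits `dropped`.
SyncResult sync_log(const std::vector<std::string>& entries,
                    const AttemptFn& attempt) {
  SyncResult result;
  for (const auto& entry : entries) {
    if (!sync_single_entry(entry, attempt)) {
      result.dropped.push_back(entry);
    }
    ++result.marker;
  }
  return result;
}
```

With an attempt callback that always fails for one entry, that entry is tried exactly 10 times, lands in dropped, and marker still reaches the end of the log. A fix along the lines requested in this report would need some way to revisit the dropped entries instead of forgetting them.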
Related issues
History
#1 Updated by Casey Bodley almost 4 years ago
- Status changed from New to In Progress
- Assignee set to Casey Bodley
#2 Updated by fang yuxiang over 3 years ago
How is progress on this now?
#3 Updated by Casey Bodley over 3 years ago
No progress yet; I'm only thinking about the design.
#4 Updated by fang yuxiang over 3 years ago
Looks like an awesome job.
Could you share some of your design thoughts? Thanks.
#5 Updated by Casey Bodley over 1 year ago
- Status changed from In Progress to Fix Under Review
- Backport changed from luminous mimic nautilus to octopus pacific
- Pull request ID set to 42317
#6 Updated by Casey Bodley over 1 year ago
- Status changed from Fix Under Review to Pending Backport
#7 Updated by Backport Bot over 1 year ago
- Copied to Backport #51784: octopus: multisite: metadata sync does not keep retrying failed entries added
#8 Updated by Backport Bot over 1 year ago
- Copied to Backport #51785: pacific: multisite: metadata sync does not keep retrying failed entries added
#9 Updated by Casey Bodley about 1 year ago
- Related to Bug #53668: Why not add a xxx.retry obJ to metadata synchronization at multisite for exception retries added
#10 Updated by Christian Rohmann 11 months ago
There is a PR supposedly fixing this issue: https://github.com/ceph/ceph/pull/46148
#11 Updated by Backport Bot 8 months ago
- Tags changed from multisite to multisite backport_processed
#12 Updated by Konstantin Shalygin 14 days ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100