Bug #53668
open
Why not add an xxx.retry object to metadata synchronization at multisite for exception retries
Added by Jinghua Zeng over 2 years ago.
Updated about 2 years ago.
Description
I see that data synchronization supports retries: entries that previously failed are recorded in the error repo and retried before synchronization continues. Why doesn't metadata synchronization do the same?
- Project changed from Ceph to rgw
- Status changed from New to Need More Info
- Tags set to multisite
in general, object uploads tend to be way more frequent than metadata changes like bucket/user creation. the datalog sees a LOT more traffic than the mdlog, so is more sensitive to errors/retries
the datalog's error repo allows us to move failing entries out of the datalog, so data sync can continue to advance and process new entries. this adds some extra complexity to data sync, because it has to schedule sync from two different sources
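A minimal sketch of that two-source scheduling, with hypothetical names (this is not the actual RGW code, which is C++ coroutine-based): failing entries are parked in an error repo so the datalog cursor keeps advancing, and a separate pass retries the parked entries.

```python
# Hypothetical sketch of data sync with an error repo (illustrative only).
from collections import deque

def sync_entry(entry, failing):
    """Pretend to sync one entry; returns True on success."""
    return entry not in failing

def data_sync_pass(datalog, error_repo, failing):
    # Source 1: new datalog entries. A failure does not block progress;
    # the entry is moved to the error repo instead.
    while datalog:
        entry = datalog.popleft()
        if not sync_entry(entry, failing):
            error_repo.append(entry)  # park for a later retry

    # Source 2: retry previously failed entries from the error repo.
    for _ in range(len(error_repo)):
        entry = error_repo.popleft()
        if not sync_entry(entry, failing):
            error_repo.append(entry)  # still failing; keep it parked

datalog = deque(["bucket-a/obj1", "bucket-b/obj2", "bucket-a/obj3"])
error_repo = deque()
data_sync_pass(datalog, error_repo, failing={"bucket-b/obj2"})
print(list(error_repo))  # only the persistently failing entry remains parked
```

The extra complexity Casey mentions comes from exactly this second source: sync has to interleave and track positions for both the datalog and the error repo, instead of a single log cursor.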
metadata sync could have an error repo, but we rarely see issues with metadata sync catching up from a backlog, so i don't think it's worth the complexity. multisite is already complicated enough that we have trouble maintaining it. and we did have issues with error handling that were resolved as part of https://tracker.ceph.com/issues/39657
i think it would be ideal for rgw to use the same code paths for data- and metadata sync, but we're a long way from being able to do that
@Jinghua Zeng are you interested in working on stuff like this?
- Related to Bug #39657: multisite: metadata sync does not keep retrying failed entries added
Casey Bodley wrote:
in general, object uploads tend to be way more frequent than metadata changes like bucket/user creation. the datalog sees a LOT more traffic than the mdlog, so is more sensitive to errors/retries
the datalog's error repo allows us to move failing entries out of the datalog, so data sync can continue to advance and process new entries. this adds some extra complexity to data sync, because it has to schedule sync from two different sources
metadata sync could have an error repo, but we rarely see issues with metadata sync catching up from a backlog, so i don't think it's worth the complexity. multisite is already complicated enough that we have trouble maintaining it. and we did have issues with error handling that were resolved as part of https://tracker.ceph.com/issues/39657
i think it would be ideal for rgw to use the same code paths for data- and metadata sync, but we're a long way from being able to do that
@Jinghua Zeng are you interested in working on stuff like this?
I'm very interested in that. I think data synchronization depends on metadata: when a bucket instance fails to sync, the error repo may accumulate many entries to synchronize. It also causes the metadata pool to grow too large, because the BILog is not being consumed.