Bug #55770

open

multisite: metadata full sync does not copy mdlog entries, so can't serve them after failover

Added by Casey Bodley almost 2 years ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags: multisite multisite-backlog
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

in metadata incremental sync, RGWCloneMetaLogCoroutine copies mdlog entries from the primary zone, so that the zone can serve those entries to other zones if, after a failover, it's promoted to the 'metadata master'

these entries are not copied in metadata full sync. so a new zone that syncs all of its metadata from full sync won't have any of the mdlog entries. if that new zone is promoted to the metadata master, it won't be able to serve these entries to other zones that are behind in incremental sync. so those zones will think they're caught up, but may be missing lots of metadata
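The failure mode described above can be illustrated with a toy model. This is only a sketch: the `Zone` class, the integer sequence markers, and the `full_sync`/`incremental_sync` helpers are hypothetical simplifications invented for illustration, not RGW code, and the zone names are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    """Toy zone: current metadata plus an ordered mdlog of (seq, key, value)."""
    name: str
    metadata: dict = field(default_factory=dict)
    mdlog: list = field(default_factory=list)

    def write(self, key, value):
        # A metadata write on the master updates the metadata and logs it.
        seq = len(self.mdlog) + 1
        self.metadata[key] = value
        self.mdlog.append((seq, key, value))

def full_sync(dest, src):
    """Copy the current metadata only (the behavior reported in this bug).
    Returns the marker that incremental sync should resume from."""
    dest.metadata = dict(src.metadata)
    return src.mdlog[-1][0] if src.mdlog else 0

def incremental_sync(dest, src, marker):
    """Apply the source's mdlog entries after `marker`, cloning them locally."""
    for seq, key, value in src.mdlog:
        if seq > marker:
            dest.metadata[key] = value
            dest.mdlog.append((seq, key, value))
            marker = seq
    return marker

master = Zone("us-east")
master.write("user:a0", "first user")

# This zone finishes full sync, then shuts down before the next write.
behind = Zone("us-west")
behind_marker = full_sync(behind, master)

master.write("user:a1", "second user")

# A new zone gets everything from full sync, but none of the mdlog entries.
new_master = Zone("us-north")
full_sync(new_master, master)

# Failover: the new zone is promoted, and the lagging zone resumes from it.
behind_marker = incremental_sync(behind, new_master, behind_marker)

print(sorted(new_master.metadata))  # has both users
print(new_master.mdlog)             # empty: nothing to serve
print(sorted(behind.metadata))      # still missing user:a1
```

In this model the lagging zone sees no entries past its marker, so it treats itself as caught up even though it never received the second user.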

to fix this, metadata full sync (at the beginning of RGWMetaSyncShardCR::full_sync()?) will need to copy the source zone's mdlog - for example, by calling RGWCloneMetaLogCoroutine in a loop
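The "clone in a loop" idea can be sketched in the same toy terms. The names `clone_mdlog_batch` and `clone_full_mdlog`, the `(seq, key)` entry shape, and the `truncated` flag are assumptions made for illustration; the real RGWCloneMetaLogCoroutine interface is not reproduced here.

```python
def clone_mdlog_batch(src_log, dest_log, marker, max_entries):
    """Copy up to max_entries entries with seq > marker into dest_log.
    Returns (new_marker, truncated); truncated means more entries remain."""
    batch = [e for e in src_log if e[0] > marker][:max_entries]
    dest_log.extend(batch)
    if batch:
        marker = batch[-1][0]
    truncated = any(e[0] > marker for e in src_log)
    return marker, truncated

def clone_full_mdlog(src_log, dest_log, max_entries=2):
    """Hypothetical step at the start of full sync: keep cloning batches
    until the source reports that nothing is left."""
    marker = 0
    truncated = True
    while truncated:
        marker, truncated = clone_mdlog_batch(src_log, dest_log, marker, max_entries)
    return marker

# Entries are (seq, key) pairs in this toy model.
src_log = [(1, "user:a0"), (2, "user:a1"), (3, "bucket:bkt")]
dest_log = []
final_marker = clone_full_mdlog(src_log, dest_log)

# After promotion, a zone that stopped at marker 1 can now be served the rest.
served = [e for e in dest_log if e[0] > 1]
print(final_marker, served)
```

Batching with a marker is used here so the loop mirrors how incremental sync pages through a log, rather than copying everything in one request.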

Actions #1

Updated by Casey Bodley 12 months ago

  • Tags changed from multisite to multisite multisite-backlog

Actions #2

Updated by Jiffin Tony Thottan 6 months ago

Steps tried to reproduce the issue:

  • Created master zone us-east and added secondary zone us-west, each with one RGW server.
  • Created the user "a1" and the bucket "bkt" via S3, then uploaded a file into the bucket.
  • Added zone us-north; sync status was successful. Was able to fetch bkt/file with the a1 user credentials via the us-north endpoint.
  • Stopped cluster1 (hosting us-east) and changed the master zone to us-north.
  • Deleted and removed us-east, and modified the endpoint of the zonegroup us.
  • Added zone us-south on cluster 4. Was able to fetch bkt/file with the a1 user credentials via the us-south endpoint.

Based on the above findings, this issue may already be fixed in master.

Actions #3

Updated by Casey Bodley 6 months ago

these entries are not copied in metadata full sync. so a new zone that syncs all of its metadata from full sync won't have any of the mdlog entries. if that new zone is promoted to the metadata master, it won't be able to serve these entries to other zones that are behind in incremental sync

thanks Jiffin. in this case, us-south is getting the user from metadata full sync. the bug isn't that it misses the user, but that it doesn't copy the mdlog entries so can't serve them to other zones for incremental sync

so your reproducer should either look at the mdlog list output (make sure you disable log trimming), or do failover while there's another zone that finished a metadata full sync then shut down before the user was created
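For the first option, a small helper can check whether a zone's saved `radosgw-admin mdlog list` output contains an entry for the user. This is a sketch under assumptions: it assumes the output is a JSON array of objects with `section` and `name` fields, and the sample strings below are hand-written placeholders, not real command output.

```python
import json

def mdlog_has_entry(mdlog_json, section, name):
    """Return True if the mdlog listing has an entry for (section, name).
    Assumes a JSON array of objects carrying "section" and "name" keys."""
    entries = json.loads(mdlog_json)
    return any(
        e.get("section") == section and e.get("name") == name
        for e in entries
    )

# Hand-written placeholder listings, for illustration only.
old_master_listing = '[{"section": "user", "name": "a1"}]'
new_zone_listing = '[]'

print(mdlog_has_entry(old_master_listing, "user", "a1"))
print(mdlog_has_entry(new_zone_listing, "user", "a1"))
```

If the bug is present, the zone that only ran metadata full sync would be expected to lack the entry even though it can serve the user itself.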
