Support #19305

open

Master is on a different period

Added by Daniel Biazus about 7 years ago. Updated almost 7 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Tags: multisite
Reviewed:
Affected Versions:
Pull request ID:

Description

After a disaster recovery process in which a secondary zone was promoted to
master and the old master was demoted to a secondary zone, metadata stopped
syncing between the clusters, and new buckets and users are no longer
replicated to the secondary zone.

Version Running: 10.2.6

Running "radosgw-admin sync status" on the master cluster, we got:

          realm 7792a922-cfc6-4eb0-a01d-e4ba327ee8ad (am)
      zonegroup 8585c1cb-fae0-4d2f-bf16-48b6072c8587 (us)
           zone 3dac5e2c-a17b-4d2e-a3e1-3ad6cb7cf72f (us-west-1)
  metadata sync no sync (zone is master)
      data sync source: 6ae07871-424f-4cf8-8aaa-e1b84e4babdf (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

And on the secondary:

          realm 7792a922-cfc6-4eb0-a01d-e4ba327ee8ad (am)
      zonegroup 8585c1cb-fae0-4d2f-bf16-48b6072c8587 (us)
           zone 6ae07871-424f-4cf8-8aaa-e1b84e4babdf (us-east-1)
  metadata sync syncing
                full sync: 0/64 shards
                master is on a different period: master_period=942e0826-7026-4aad-95d1-6ddd58e7de30 local_period=2c3962e2-acc3-41e5-b625-34fcc6f8176a
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 3dac5e2c-a17b-4d2e-a3e1-3ad6cb7cf72f (us-west-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
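
When checking many gateways, it can help to pull the two period IDs out of the status text automatically. The following is only a minimal sketch: it assumes the mismatch message has the wording shown in the output above, and the function name find_period_mismatch is hypothetical, not part of any Ceph tooling.

```python
import re

# Pattern for the mismatch message as it appears in the pasted
# "radosgw-admin sync status" output. \s* and \s+ tolerate the line
# wrapping that occurs when the output is pasted into a ticket.
_MISMATCH = re.compile(
    r"master is on a different period:\s*"
    r"master_period=([0-9a-f-]+)\s+"
    r"local_period=([0-9a-f-]+)"
)


def find_period_mismatch(status_text):
    """Return (master_period, local_period) if the status text reports a
    period mismatch, otherwise None. (Hypothetical helper.)"""
    match = _MISMATCH.search(status_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Sample taken from the secondary cluster's output in this ticket.
sample = """\
metadata sync syncing
full sync: 0/64 shards
master is on a different period: master_period=942e0826-7026-4aad-95d1-6ddd58e7de30
local_period=2c3962e2-acc3-41e5-b625-34fcc6f8176a
incremental sync: 64/64 shards
"""

print(find_period_mismatch(sample))
```

In practice the text would come from the command's standard output rather than a literal string.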

Here the secondary cluster reports that the master is on a different period.
However, that is not true: "radosgw-admin period get-current" returns the
same period on both clusters:

{
"current_period": "942e0826-7026-4aad-95d1-6ddd58e7de30"
}
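
The comparison described above can be scripted. This sketch assumes the output of "radosgw-admin period get-current" from each cluster has already been captured as a string and is a JSON object with a "current_period" key, as in the ticket. The helper names are hypothetical.

```python
import json


def current_period(get_current_json):
    """Extract the period ID from captured `period get-current` output."""
    return json.loads(get_current_json)["current_period"]


def same_period(master_json, secondary_json):
    """True when both clusters report the same current period."""
    return current_period(master_json) == current_period(secondary_json)


# Both clusters in this ticket returned the same document.
master_out = '{ "current_period": "942e0826-7026-4aad-95d1-6ddd58e7de30" }'
secondary_out = '{ "current_period": "942e0826-7026-4aad-95d1-6ddd58e7de30" }'

print(same_period(master_out, secondary_out))
```

If this returns True while "sync status" still reports a different period, the mismatch message does not reflect the committed period, which is the behaviour reported here.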

Apparently this is a bug that was fixed in 10.2.6, but it is still happening:

http://tracker.ceph.com/issues/18684

Steps to reproduce:

On the secondary cluster:
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

At this point the secondary cluster is working well as the master.

After the failed (old master) cluster is back:

radosgw-admin period pull --url={url-to-master-zone-gateway} --access-key={access-key} --secret={secret}
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

Now we remove the master flag from the secondary cluster:

radosgw-admin zone modify --rgw-zone={zone-name} --master=false
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

Then, running "radosgw-admin sync status" on the secondary cluster, we get:
master is on a different period: master_period=0c31136b-f2bd-402a-a65d-ed03a1956683 local_period=abd599c5-36d4-492e-a2f5-2a10eb2b6a93

Thanks,
Daniel Biazus
