Support #19305: Master is on a different period - rgw - Ceph

Support #19305

open

Master is on a different period

Added by Daniel Biazus about 7 years ago. Updated almost 7 years ago.

Status:

New

Priority:

Normal

Assignee:

Target version:

% Done:

Tags:

multisite

Reviewed:

Affected Versions:

Ceph - v10.2.6

Pull request ID:

Description

After a disaster recovery process in which we promoted a Secondary zone to
Master and demoted the old Master to a Secondary zone, metadata stopped
syncing between the clusters: no new buckets or users are replicated to the
Secondary zone.

Version Running: 10.2.6

Running "radosgw-admin sync status" on the Master cluster, we got:

realm 7792a922-cfc6-4eb0-a01d-e4ba327ee8ad (am)
      zonegroup 8585c1cb-fae0-4d2f-bf16-48b6072c8587 (us)
           zone 3dac5e2c-a17b-4d2e-a3e1-3ad6cb7cf72f (us-west-1)
  metadata sync no sync (zone is master)
      data sync source: 6ae07871-424f-4cf8-8aaa-e1b84e4babdf (us-east-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

And in the Secondary:

realm 7792a922-cfc6-4eb0-a01d-e4ba327ee8ad (am)
      zonegroup 8585c1cb-fae0-4d2f-bf16-48b6072c8587 (us)
           zone 6ae07871-424f-4cf8-8aaa-e1b84e4babdf (us-east-1)
  metadata sync syncing
                full sync: 0/64 shards
                master is on a different period: master_period=942e0826-7026-4aad-95d1-6ddd58e7de30 local_period=2c3962e2-acc3-41e5-b625-34fcc6f8176a
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 3dac5e2c-a17b-4d2e-a3e1-3ad6cb7cf72f (us-west-1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
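
To spot this condition from a script, the warning can be pulled out of the "radosgw-admin sync status" text. The following is a minimal sketch, not part of radosgw-admin: the function name find_period_mismatch is hypothetical, and the pattern assumes the warning wording shown in the output above.

```python
import re
from typing import Optional, Tuple

# Pattern for the warning printed by "radosgw-admin sync status".
# \s is used between the fields so the match still works when the
# warning was wrapped across several lines.
_MISMATCH = re.compile(
    r"master is on a different period:\s*"
    r"master_period=(?P<master>[0-9a-f-]+)\s+"
    r"local_period=(?P<local>[0-9a-f-]+)"
)


def find_period_mismatch(status_text: str) -> Optional[Tuple[str, str]]:
    """Return (master_period, local_period) if the warning is present, else None."""
    match = _MISMATCH.search(status_text)
    if match is None:
        return None
    return match.group("master"), match.group("local")
```

Feeding it the Secondary's output above would yield the two period IDs, while the Master's output (which has no warning) would yield None.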

Here the Secondary cluster reports that the Master is on a different period.
However, that is not true: both clusters have the same current period, as
returned by "radosgw-admin period get-current":

{
"current_period": "942e0826-7026-4aad-95d1-6ddd58e7de30"
}
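
The same comparison can be done programmatically by capturing the "radosgw-admin period get-current" output on each cluster and comparing the IDs. This is only a sketch: the helper names current_period and periods_agree are hypothetical, and it assumes the command prints a JSON object with a "current_period" key, as in the output shown here.

```python
import json


def current_period(get_current_output: str) -> str:
    """Extract the period ID from "radosgw-admin period get-current" output."""
    return json.loads(get_current_output)["current_period"]


def periods_agree(master_output: str, secondary_output: str) -> bool:
    """Return True when both clusters report the same current period."""
    return current_period(master_output) == current_period(secondary_output)
```

If periods_agree returns True while "radosgw-admin sync status" on the Secondary still prints the warning, that matches the situation described in this report.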

Apparently this is a bug that was fixed in 10.2.6, but it is still happening:

http://tracker.ceph.com/issues/18684

Steps to reproduce:

On secondary cluster:
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

At this point the secondary cluster is working well as master.

After the failed (old master) cluster is back:

radosgw-admin period pull --url={url-to-master-zone-gateway} --access-key={access-key} --secret={secret}
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

Now we remove the master flag from the Secondary cluster:

radosgw-admin zone modify --rgw-zone={zone-name} --master=false
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

Then, running "radosgw-admin sync status" on the secondary cluster, we get:
master is on a different period: master_period=0c31136b-f2bd-402a-a65d-ed03a1956683 local_period=abd599c5-36d4-492e-a2f5-2a10eb2b6a93
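
For anyone re-running the reproduction, the three command sequences above can be written down as data so they are applied in the same order each time. This is a rough sketch that only builds argument lists and executes nothing. The function names promote, failback and demote are hypothetical, and the zone name, URL and keys are placeholders to be supplied by the caller.

```python
from typing import List

Command = List[str]

# Shared tail of every step in the report: commit the period, restart the gateways.
COMMIT: Command = ["radosgw-admin", "period", "update", "--commit"]
RESTART: Command = ["systemctl", "restart", "ceph-radosgw@*"]


def promote(zone: str) -> List[Command]:
    """Commands that make `zone` the master and default zone."""
    modify = ["radosgw-admin", "zone", "modify",
              f"--rgw-zone={zone}", "--master", "--default"]
    return [modify, list(COMMIT), list(RESTART)]


def failback(zone: str, url: str, access_key: str, secret: str) -> List[Command]:
    """Commands run on the recovered old master: pull the period, then promote."""
    pull = ["radosgw-admin", "period", "pull", f"--url={url}",
            f"--access-key={access_key}", f"--secret={secret}"]
    return [pull] + promote(zone)


def demote(zone: str) -> List[Command]:
    """Commands that remove the master flag from `zone`."""
    modify = ["radosgw-admin", "zone", "modify",
              f"--rgw-zone={zone}", "--master=false"]
    return [modify, list(COMMIT), list(RESTART)]
```

Each returned list could be printed for review or passed one command at a time to subprocess.run on the appropriate cluster.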

Thanks,
Daniel Biazus

Updated by Nathan Cutler about 7 years ago

Updated by FAN YANG almost 7 years ago

Updated by Daniel Biazus almost 7 years ago

Updated by FAN YANG almost 7 years ago