Support #19305

open

Master is on a different period

Added by Daniel Biazus about 7 years ago. Updated almost 7 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Tags:
multisite
Reviewed:
Affected Versions:
Pull request ID:

Description

After a disaster recovery process, in which a Secondary zone was made the
Master and the old Master was made a Secondary zone, we could see that
metadata stopped syncing between the clusters, and no new bucket or user is
replicated to the Secondary Zone.
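
A minimal way to check that metadata replication has stopped (a sketch, not
from the original report; the uid sync-test is a placeholder) is to create a
user on the master and look for it on the secondary:

# On the master zone: create a throwaway user (placeholder uid).
radosgw-admin user create --uid=sync-test --display-name="sync test"

# On the secondary zone: the new user should appear once metadata sync has run.
radosgw-admin metadata list user
radosgw-admin user info --uid=sync-test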

Version Running: 10.2.6

Running "radosgw-admin sync status" on the Master cluster, we get:

realm 7792a922-cfc6-4eb0-a01d-e4ba327ee8ad (am)
zonegroup 8585c1cb-fae0-4d2f-bf16-48b6072c8587 (us)
zone 3dac5e2c-a17b-4d2e-a3e1-3ad6cb7cf72f (us-west-1)
metadata sync no sync (zone is master)
data sync source: 6ae07871-424f-4cf8-8aaa-e1b84e4babdf (us-east-1)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source

And in the Secondary:

realm 7792a922-cfc6-4eb0-a01d-e4ba327ee8ad (am)
zonegroup 8585c1cb-fae0-4d2f-bf16-48b6072c8587 (us)
zone 6ae07871-424f-4cf8-8aaa-e1b84e4babdf (us-east-1)
metadata sync syncing
full sync: 0/64 shards
master is on a different period: master_period=942e0826-7026-4aad-95d1-6ddd58e7de30 local_period=2c3962e2-acc3-41e5-b625-34fcc6f8176a
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 3dac5e2c-a17b-4d2e-a3e1-3ad6cb7cf72f (us-west-1)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source

Here we can see the Secondary cluster reporting a different period on the
Master; however, this is not true, because both clusters return the same
period from "radosgw-admin period get-current":

{
    "current_period": "942e0826-7026-4aad-95d1-6ddd58e7de30"
}
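
As a sketch (the hostnames master-node and secondary-node are placeholders,
assuming an admin node reachable over SSH in each cluster), the comparison
can be scripted:

#!/bin/bash
# Print the current period id reported by each cluster; the two ids should match.
for host in master-node secondary-node; do
    echo "== $host =="
    ssh "$host" radosgw-admin period get-current
done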

Apparently this was a bug that was fixed in 10.2.6, but it is still happening:

http://tracker.ceph.com/issues/18684

Steps to reproduce:

On the secondary cluster:
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

At this point the secondary cluster is working well as master.
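
A quick sanity check at this step (a sketch, reusing only commands already
shown in this report) is to confirm the promoted zone now considers itself
master:

# On the promoted (former secondary) cluster: the output should now show
# "metadata sync no sync (zone is master)".
radosgw-admin sync status
radosgw-admin period get-current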

After the failed (old master) cluster is back:

radosgw-admin period pull --url={url-to-master-zone-gateway} --access-key={access-key} --secret={secret}
radosgw-admin zone modify --rgw-zone={zone-name} --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*
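
As with the previous step, it can help to confirm (a sketch, reusing the
report's own commands) that the old master has been re-promoted and has
committed a new period:

# On the old master, after the period pull, zone modify, and commit:
radosgw-admin sync status        # should show "metadata sync no sync (zone is master)"
radosgw-admin period get-current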

Now we remove the master flag from the Secondary cluster:

radosgw-admin zone modify --rgw-zone={zone-name} --master=false
radosgw-admin period update --commit
systemctl restart ceph-radosgw@*

Then, running "radosgw-admin sync status" on the secondary cluster, we get:
master is on a different period: master_period=0c31136b-f2bd-402a-a65d-ed03a1956683 local_period=abd599c5-36d4-492e-a2f5-2a10eb2b6a93
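
One way to dig into this state (a sketch, not from the original report) is to
compare the two period ids in the warning against the periods each cluster
actually knows about:

# On each cluster: list the known period ids and the current one, then check
# whether the "master_period" from the warning corresponds to a stale, older period.
radosgw-admin period list
radosgw-admin period get-current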

Thanks,
Daniel Biazus

Actions #1

Updated by Nathan Cutler about 7 years ago

  • Tracker changed from Tasks to Support
  • Project changed from Stable releases to rgw
Actions #2

Updated by FAN YANG almost 7 years ago

I tested it in 10.2.7 and it also happened.

root@slave:~# radosgw-admin sync status
realm f155e2b3-51e8-4767-8f3a-13512d07295b (earth)
zonegroup f0d970b5-bf7a-4216-8d5a-ca8068d2711d (cn)
zone 2d5d6b89-0579-4519-9d87-45a28c541ffb (cn-bj-east)
metadata sync syncing
full sync: 0/64 shards
master is on a different period: master_period=5dc15b04-9a71-4bfc-af55-81c315881726 local_period=8f695a7a-69b1-4375-84fe-071719eee695
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: bc780558-db84-47ad-9314-677ea9dda7c6 (cn-sy-south)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
root@slave:~# ceph -v
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

Actions #3

Updated by Daniel Biazus almost 7 years ago

Yes, I was able to reproduce on 10.2.7 also.

I was wondering how the failover/failback process is supposed to be done in a multisite setup with this situation still happening. Is there any other workaround to avoid this issue?

ceph -v
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)

Actions #4

Updated by FAN YANG almost 7 years ago

Maybe it's caused by a bug with the period ( http://tracker.ceph.com/issues/18639 ).
