Bug #36077

rgw: master is on a different period

Added by arnaud lawson over 5 years ago. Updated over 5 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,

I am trying to configure object store replication between two clusters, "us-east-1" and "us-east-2", and I am getting this error:

"master is on a different period: master_period=35db7b3e-0554-46c3-9c12-bd1233998b66 local_period=318f6d70-6a9e-474d-8073-41a9ee9d645e".

Currently, when I create objects in cluster us-east-1 they are replicated to us-east-2, but not in the other direction, from us-east-2 to us-east-1, which should also happen when replication is working.
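
The period each cluster is currently on can be checked directly with "radosgw-admin period get" (a minimal sketch; the config file name for the us-east-2 cluster is an assumption):

"""
# On the us-east-1 cluster: print the period this zone is currently on
radosgw-admin period get -c /etc/ceph/ceph_ewr_prod.conf

# On the us-east-2 cluster (config file name assumed): the "id" fields of the
# two outputs should match when both clusters are on the same period
radosgw-admin period get -c /etc/ceph/ceph.conf
"""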

Here are some details on replication:

"""""
[root@ceph-rgw004 ~]# radosgw-admin sync status -c /etc/ceph/ceph_ewr_prod.conf
realm 8f7fd3fd-f72d-411d-b06b-7b4b579f5f2f (prod)
zonegroup 60a2cb75-6978-46a3-b830-061c8be9dc75 (prod)
zone ffce148e-3b24-462d-98bf-8c212de31de5 (us-east-1)
metadata sync syncing
full sync: 0/64 shards
master is on a different period: master_period=35db7b3e-0554-46c3-9c12-bd1233998b66 local_period=318f6d70-6a9e-474d-8073-41a9ee9d645e
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 7fe96e52-d6f7-4ad6-b66e-ecbbbffbc18e (us-east-2)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
[root@ceph-rgw004 ~]# radosgw-admin zonegroup get -c /etc/ceph/ceph_ewr_prod.conf {
"id": "60a2cb75-6978-46a3-b830-061c8be9dc75",
"name": "prod",
"api_name": "",
"is_master": "true",
"endpoints": [
"http://10.122.64.50"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "7fe96e52-d6f7-4ad6-b66e-ecbbbffbc18e",
"zones": [ {
"id": "7fe96e52-d6f7-4ad6-b66e-ecbbbffbc18e",
"name": "us-east-2",
"endpoints": [
"http://10.122.64.50"
],
"log_meta": "true",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": []
}, {
"id": "ffce148e-3b24-462d-98bf-8c212de31de5",
"name": "us-east-1",
"endpoints": [
"http://10.194.0.102"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false",
"tier_type": "",
"sync_from_all": "true",
"sync_from": []
}
],
"placement_targets": [ {
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "8f7fd3fd-f72d-411d-b06b-7b4b579f5f2f"
}
"""

Any ideas how I could fix this? Thanks.


Files

rgw-sync-debug-output.txt (277 KB) - arnaud lawson, 09/19/2018 02:45 PM
#1

Updated by arnaud lawson over 5 years ago

Attached is the output of this command:
radosgw-admin sync status -c /etc/ceph/ceph_ewr_prod.conf --debug-rgw=10

#2

Updated by arnaud lawson over 5 years ago

Also, when I restart the rgw hosts in the secondary/slave zone, I get this in their logs:

""""""""
2018-09-20 18:44:13.903171 7fd9865c7f80 0 starting handler: civetweb
2018-09-20 18:44:13.915701 7fd9865c7f80 1 mgrc service_daemon_register rgw.ceph-rgw002 metadata {arch=x86_64,ceph_version=ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable),cpu=Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=civetweb port=10.194.2.126:80+10.194.2.126:443s ssl_certificate=/etc/ceph/private/radowgw_cert.pem num_threads=200,frontend_type#0=civetweb,hostname=ceph-rgw002.drt.ewr.prod.squarespace.net,kernel_description=#1 SMP Sat Jun 16 11:18:11 EDT 2018,kernel_version=4.17.2-1.el7.elrepo.x86_64,mem_swap_kb=6160380,mem_total_kb=8158036,num_handles=1,os=Linux,pid=50765,zone_id=ffce148e-3b24-462d-98bf-8c212de31de5,zone_name=us-east-1,zonegroup_id=60a2cb75-6978-46a3-b830-061c8be9dc75,zonegroup_name=prod}
2018-09-20 18:44:13.970083 7fd95b156700 -1 meta sync: ERROR: sync status period=318f6d70-6a9e-474d-8073-41a9ee9d645e does not match period=35db7b3e-0554-46c3-9c12-bd1233998b66 in history at realm epoch=2
""""""""""""""

#3

Updated by arnaud lawson over 5 years ago

Any idea how to fix this error?
"""
ERROR: sync status period=318f6d70-6a9e-474d-8073-41a9ee9d645e does not match period=35db7b3e-0554-46c3-9c12-bd1233998b66 in history at realm epoch=2
"""

Essentially, is there a way to update the sync status period to match 35db7b3e-0554-46c3-9c12-bd1233998b66? Thanks.
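
The stored metadata sync status, including the period it was taken against, can be printed with the "metadata sync status" subcommand (a minimal sketch; this is one of the subcommands referenced in the source link in the next comment):

"""
# Print the stored metadata sync status, including the period it refers to
radosgw-admin metadata sync status -c /etc/ceph/ceph_ewr_prod.conf
"""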

#4

Updated by arnaud lawson over 5 years ago

I was able to fix this issue by using the "radosgw-admin metadata sync init" command.
The "metadata sync" subcommands are not shown in the "radosgw-admin -h" help;
I ended up finding them by reading the implementation of the radosgw-admin command here:
https://github.com/ceph/ceph/blob/bd6d3f61e1e62cc591a58dfe3ed94d3b43a4e00d/src/rgw/rgw_admin.cc#L152-L154

So in the end what I did was re-initialise the metadata sync and restart the rgw hosts in the secondary zone, as sketched below:

  1. radosgw-admin -c <config-file> metadata sync init
  2. restart the rgw hosts
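
A concrete version of those two steps might look like this (a sketch; the config path is the one used throughout this thread, and the systemd unit name is the conventional one for an RGW instance, which may differ per deployment):

"""
# On the secondary zone (us-east-1): re-initialise the metadata sync state
# so it restarts against the master's current period
radosgw-admin metadata sync init -c /etc/ceph/ceph_ewr_prod.conf

# Then restart the rgw daemon on each rgw host in the secondary zone
# (unit name assumed; adjust to your deployment)
systemctl restart ceph-radosgw@rgw.$(hostname -s)
"""
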
#5

Updated by Patrick Donnelly over 5 years ago

  • Project changed from Ceph to rgw
  • Subject changed from master is on a different period to rgw: master is on a different period