Support #22822
openRGW multi site issue - data sync: ERROR: failed to fetch datalog info
Description
I have noticed a strange issue. My synchronization worked fine for a while, but today on one site I am seeing something like this:
Slave zone - let's call it (pl-2):
# radosgw-admin sync status
          realm c6055c2e-5ac0-4638-851f-f1051b61d0c2 (ofp)
      zonegroup 4134640c-d16b-4166-bbd6-987637da469d (pl)
           zone 6328c6d7-31a5-4d42-8359-1e28689572da (pl-2)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 8adfe5fc-65df-4227-9d85-1d0d1e66ac1f (pl-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 7 shards
                        oldest incremental change not applied: 2018-01-28 13:46:05.0.415988s
So that seems to be OK. However, on the master zone I have:
radosgw-admin sync status
          realm c6055c2e-5ac0-4638-851f-f1051b61d0c2 (ofp)
      zonegroup 4134640c-d16b-4166-bbd6-987637da469d (pl)
           zone 8adfe5fc-65df-4227-9d85-1d0d1e66ac1f (pl-1)
  metadata sync no sync (zone is master)
2018-01-28 20:10:45.957218 7fd93b76bc40 0 data sync: ERROR: failed to fetch datalog info
      data sync source: 6328c6d7-31a5-4d42-8359-1e28689572da (pl-2)
                        failed to retrieve sync info: (5) Input/output error
Sometimes the output from the master looks like this:
          realm c6055c2e-5ac0-4638-851f-f1051b61d0c2 (ofp)
      zonegroup 4134640c-d16b-4166-bbd6-987637da469d (pl)
           zone 8adfe5fc-65df-4227-9d85-1d0d1e66ac1f (pl-1)
  metadata sync no sync (zone is master)
      data sync source: 6328c6d7-31a5-4d42-8359-1e28689572da (pl-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
2018-01-28 21:12:36.506006 7f59ead51c40 0 data sync: ERROR: failed to fetch datalog info
Restarting the RGW does not help. The version is:
ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
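For reference, one way to get more detail on those ERROR lines (a hedged suggestion, not part of the original report; the debug levels below are only an example) is to re-run the status command with verbose RGW and messenger debugging and to check the sync error list:

# Verbose run of the same status command; debug levels here are just an example.
radosgw-admin sync status --debug-rgw=20 --debug-ms=1 2>&1 | tee sync-status-debug.log

# List errors recorded by the sync machinery.
radosgw-admin sync error list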
Below you can also find information about my zonegroup:
radosgw-admin zonegroup get
{
    "id": "4134640c-d16b-4166-bbd6-987637da469d",
    "name": "pl",
    "api_name": "pl",
    "is_master": "true",
    "endpoints": [
        "https://URL TO F5 Load Balancer:443"
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "8adfe5fc-65df-4227-9d85-1d0d1e66ac1f",
    "zones": [
        {
            "id": "6328c6d7-31a5-4d42-8359-1e28689572da",
            "name": "pl-2",
            "endpoints": [
                "https://URL TO F5 Load Balancer"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        },
        {
            "id": "8adfe5fc-65df-4227-9d85-1d0d1e66ac1f",
            "name": "pl-1",
            "endpoints": [
                "https://URL TO LOAD BALANCER:443"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "c6055c2e-5ac0-4638-851f-f1051b61d0c2"
}
<URL TO Load Balancer> is reachable from both zones (there is a dedicated URL for zone 1 and zone 2).
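Since "failed to fetch datalog info" and "(5) Input/output error" typically mean that the master's HTTP requests to the peer zone's RGW endpoint failed, a quick reachability check of the endpoints from each side can help narrow this down. A minimal sketch, with placeholder endpoint URLs standing in for the real load-balancer addresses:

# From a pl-1 (master) RGW host: an anonymous request to the pl-2 endpoint.
# Any S3 XML response (even AccessDenied) shows the RGW behind the load balancer answers over TLS.
curl -kv https://URL-TO-PL-2-LOAD-BALANCER:443/

# And in the opposite direction, from a pl-2 RGW host:
curl -kv https://URL-TO-PL-1-LOAD-BALANCER:443/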
Any idea what could be wrong? Both clusters seem to work; the only problem is that error message, and I don't know how to "reset" this state.
Updated by Mariusz Derela about 6 years ago
OK, that problem was related to a network error. We can skip it :)