
Support #22822

RGW multi site issue - data sync: ERROR: failed to fetch datalog info

Added by Mariusz Derela over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
% Done: 0%

Description

I have noticed a strange issue. Synchronization worked fine for some time, but today on one site I see the following:

Slave (let's call it pl-2):


# radosgw-admin sync status
          realm c6055c2e-5ac0-4638-851f-f1051b61d0c2 (ofp)
      zonegroup 4134640c-d16b-4166-bbd6-987637da469d (pl)
           zone 6328c6d7-31a5-4d42-8359-1e28689572da (pl-2)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 8adfe5fc-65df-4227-9d85-1d0d1e66ac1f (pl-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 7 shards
                        oldest incremental change not applied: 2018-01-28 13:46:05.0.415988s


So this side seems to be OK. However, on the master zone I get:

# radosgw-admin sync status
          realm c6055c2e-5ac0-4638-851f-f1051b61d0c2 (ofp)
      zonegroup 4134640c-d16b-4166-bbd6-987637da469d (pl)
           zone 8adfe5fc-65df-4227-9d85-1d0d1e66ac1f (pl-1)
  metadata sync no sync (zone is master)
2018-01-28 20:10:45.957218 7fd93b76bc40  0 data sync: ERROR: failed to fetch datalog info
      data sync source: 6328c6d7-31a5-4d42-8359-1e28689572da (pl-2)
                        failed to retrieve sync info: (5) Input/output error

Sometimes the output from the master looks like this:

          realm c6055c2e-5ac0-4638-851f-f1051b61d0c2 (ofp)
      zonegroup 4134640c-d16b-4166-bbd6-987637da469d (pl)
           zone 8adfe5fc-65df-4227-9d85-1d0d1e66ac1f (pl-1)
  metadata sync no sync (zone is master)
      data sync source: 6328c6d7-31a5-4d42-8359-1e28689572da (pl-2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
2018-01-28 21:12:36.506006 7f59ead51c40  0 data sync: ERROR: failed to fetch datalog info
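Because the master's output alternates between the two forms above, a monitoring check may want to flag both the lagging-shard count and the error lines. The following is a minimal Python sketch, assuming the plain-text layout shown in this report (that layout is not a stable interface and may differ between releases); the function name `summarize_data_sync` is hypothetical.

```python
import re

# Hypothetical helper: summarize the "data sync" part of
# `radosgw-admin sync status` text output, assuming the line layout
# shown in this report (not a guaranteed format).
BEHIND_RE = re.compile(r"data is behind on (\d+) shards")
ERROR_MARKERS = (
    "failed to retrieve sync info",
    "ERROR: failed to fetch datalog info",
)

def summarize_data_sync(output: str) -> dict:
    """Return the number of lagging shards and any error lines found."""
    behind = BEHIND_RE.search(output)
    # Keep every line that contains one of the known error phrases.
    errors = [line.strip() for line in output.splitlines()
              if any(marker in line for marker in ERROR_MARKERS)]
    return {
        "shards_behind": int(behind.group(1)) if behind else 0,
        "errors": errors,
    }
```

Fed the pl-2 output, this reports 7 lagging shards and no errors; fed the first master output, it reports both error lines, which is the state worth alerting on.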

Restarting RGW does not help. The Ceph version is:

ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)

Below you can also find information about my zonegroup:

radosgw-admin zonegroup get

{
    "id": "4134640c-d16b-4166-bbd6-987637da469d",
    "name": "pl",
    "api_name": "pl",
    "is_master": "true",
    "endpoints": [
        "https:// URL TO F5 Load Balancer:443" 
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "8adfe5fc-65df-4227-9d85-1d0d1e66ac1f",
    "zones": [
        {
            "id": "6328c6d7-31a5-4d42-8359-1e28689572da",
            "name": "pl-2",
            "endpoints": [
                "https://URL TO F5 Load Balancer" 
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        },
        {
            "id": "8adfe5fc-65df-4227-9d85-1d0d1e66ac1f",
            "name": "pl-1",
            "endpoints": [
                "https://URL TO LOAD BALANCER:443" 
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": []
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "c6055c2e-5ac0-4638-851f-f1051b61d0c2" 
}

<URL TO Load Balancer> is reachable from both zones (there is a dedicated URL for zone 1 and for zone 2).
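One way to back the reachability claim with data is to read the endpoints straight out of the `radosgw-admin zonegroup get` JSON and attempt a TCP connection to each from the other zone. This is a minimal Python sketch; the helper names are hypothetical, and the default ports (443 for https, 80 otherwise) are an assumption for endpoints that omit a port. A successful connect only shows that the load balancer accepts connections, not that the RGW behind it answers datalog requests.

```python
import json
import socket
from urllib.parse import urlsplit

def zone_endpoints(zonegroup_json: str) -> dict:
    """Map each zone name to its list of (host, port) endpoint pairs."""
    zonegroup = json.loads(zonegroup_json)
    result = {}
    for zone in zonegroup["zones"]:
        pairs = []
        for url in zone["endpoints"]:
            parts = urlsplit(url)
            # Assumption: fall back to the scheme's usual port if none is given.
            port = parts.port or (443 if parts.scheme == "https" else 80)
            pairs.append((parts.hostname, port))
        result[zone["name"]] = pairs
    return result

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `is_reachable` for every pair returned by `zone_endpoints` on each cluster, periodically, would have surfaced an intermittent network problem like the one identified in the history below.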

Any idea what could be wrong? Both clusters seem to work; the only problem is this error message, and I don't know how to "reset" this state.

History

#1 Updated by Mariusz Derela over 3 years ago

OK, the problem was caused by a network error. We can skip this :)
