Bug #53029 (closed)

radosgw-admin fails on "sync status" if a single RGW process is down

Added by David Piper over 2 years ago. Updated 17 days ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: multisite multisite-backlog
Backport: reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We're using ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable) in a containerized deployment.
We have two RGW zones in the same zonegroup.
Each zone is hosted in a separate ceph cluster, and has four RGW endpoints.
The master zone's endpoints are configured as endpoints for the zonegroup.
(We're also using pubsub zones but I don't think this is related.)

When a single RGW endpoint in the master zone is stopped or crashes, the 'radosgw-admin sync status' command returns an error on the cluster hosting the non-master zone:

[qs-admin@newbrunswick0 ~]$ radosgw-admin sync status
+ sudo docker ps --filter name=ceph-rgw-.*rgw -q
+ sudo docker exec aa87acb445c5 radosgw-admin
          realm 9d76aa86-99d1-41c3-966f-cc97eab2bfb3 (geored_realm)
      zonegroup 384c36ac-374b-4ae2-bf9f-ae951f25920a (geored_zg)
           zone b113b104-9c84-44ff-9058-4658c6e1df52 (siteB)
  metadata sync syncing
                full sync: 0/64 shards
                failed to fetch master sync status: (5) Input/output error
      data sync source: 0bbdd7ae-6e2a-4ad0-996b-5f0ed38443c1 (siteA)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                source: 9be18697-7423-41a7-a338-926aa938f9de (siteBpubsub)
                        not syncing from zone
                source: a2a5b39a-3df5-4be3-9270-68bf90bc2a51 (siteApubsub)
                        not syncing from zone

This is easy to reproduce by stopping any of the RGW containers in the master zone. As far as we can tell, sync still takes place. Once the container is restarted, the sync status command returns normally again.

[qs-admin@newbrunswick0 ~]$ radosgw-admin sync status
+ sudo docker ps --filter name=ceph-rgw-.*rgw -q
+ sudo docker exec aa87acb445c5 radosgw-admin
          realm 9d76aa86-99d1-41c3-966f-cc97eab2bfb3 (geored_realm)
      zonegroup 384c36ac-374b-4ae2-bf9f-ae951f25920a (geored_zg)
           zone b113b104-9c84-44ff-9058-4658c6e1df52 (siteB)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 0bbdd7ae-6e2a-4ad0-996b-5f0ed38443c1 (siteA)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 3 shards
                        behind shards: [90,101,107]
                        oldest incremental change not applied: 2021-10-25T13:59:52.007974+0000 [90]
                        6 shards are recovering
                        recovering shards: [2,3,54,57,107,116]
                source: 9be18697-7423-41a7-a338-926aa938f9de (siteBpubsub)
                        not syncing from zone
                source: a2a5b39a-3df5-4be3-9270-68bf90bc2a51 (siteApubsub)
                        not syncing from zone

Unless we have misconfigured something, this feels like a bug: shouldn't the other RGW endpoints still be suitable for reporting sync status?

RGW config:

(newbrunswick0 = 10.245.0.40)

[qs-admin@newbrunswick0 ~]$ radosgw-admin zonegroup get
+ sudo docker ps --filter name=ceph-rgw-.*rgw -q
+ sudo docker exec aa87acb445c5 radosgw-admin
{
    "id": "384c36ac-374b-4ae2-bf9f-ae951f25920a",
    "name": "geored_zg",
    "api_name": "geored_zg",
    "is_master": "true",
    "endpoints": [
        "https://10.245.0.20:7480",
        "https://10.245.0.21:7480",
        "https://10.245.0.22:7480",
        "https://10.245.0.23:7480"
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "0bbdd7ae-6e2a-4ad0-996b-5f0ed38443c1",
    "zones": [
        {
            "id": "0bbdd7ae-6e2a-4ad0-996b-5f0ed38443c1",
            "name": "siteA",
            "endpoints": [
                "https://10.245.0.20:7480",
                "https://10.245.0.21:7480",
                "https://10.245.0.22:7480",
                "https://10.245.0.23:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        },
        {
            "id": "9be18697-7423-41a7-a338-926aa938f9de",
            "name": "siteBpubsub",
            "endpoints": [
                "https://10.245.0.40:7481",
                "https://10.245.0.41:7481",
                "https://10.245.0.42:7481",
                "https://10.245.0.43:7481"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "pubsub",
            "sync_from_all": "false",
            "sync_from": [
                "siteB"
            ],
            "redirect_zone": ""
        },
        {
            "id": "a2a5b39a-3df5-4be3-9270-68bf90bc2a51",
            "name": "siteApubsub",
            "endpoints": [
                "https://10.245.0.20:7481",
                "https://10.245.0.21:7481",
                "https://10.245.0.22:7481",
                "https://10.245.0.23:7481"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "pubsub",
            "sync_from_all": "false",
            "sync_from": [
                "siteA"
            ],
            "redirect_zone": ""
        },
        {
            "id": "b113b104-9c84-44ff-9058-4658c6e1df52",
            "name": "siteB",
            "endpoints": [
                "https://10.245.0.40:7480",
                "https://10.245.0.41:7480",
                "https://10.245.0.42:7480",
                "https://10.245.0.43:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "9d76aa86-99d1-41c3-966f-cc97eab2bfb3",
    "sync_policy": {
        "groups": []
    }
}


Related issues: 1 (0 open, 1 closed)

Has duplicate: rgw - Bug #62196: multisite sync fairness: "sync status" in I/O error (Duplicate; Shilpa MJ)

Actions #1

Updated by Casey Bodley over 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Casey Bodley
  • Tags set to multisite
Actions #2

Updated by Casey Bodley over 1 year ago

  • Assignee deleted (Casey Bodley)
Actions #3

Updated by Casey Bodley 12 months ago

  • Tags changed from multisite to multisite multisite-backlog
Actions #4

Updated by Jane Zhu 11 months ago

The root cause:
The radosgw-admin sync status command sends a request for the status of each metadata/data log shard, distributing the requests round-robin across the individual endpoints listed in the zonegroup configuration. The entire command fails if any single request fails; there is no retry in place.
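
For illustration, a minimal C++ sketch of that failure mode follows; the types and functions here (Endpoint, ShardInfo, fetch_shard_info, fetch_all_shards) are hypothetical stand-ins for the RGW internals, not the actual classes:

#include <optional>
#include <string>
#include <vector>

struct Endpoint { std::string url; };
struct ShardInfo { int shard_id; std::string marker; };

// Placeholder for the REST call to one gateway; nullopt models a connection error.
std::optional<ShardInfo> fetch_shard_info(const Endpoint& ep, int shard_id);

// Shard requests are spread round-robin across the zonegroup endpoints, and the
// first failed request aborts the whole report, which surfaces as
// "failed to fetch master sync status: (5) Input/output error".
int fetch_all_shards(const std::vector<Endpoint>& endpoints, int num_shards,
                     std::vector<ShardInfo>& out)
{
  for (int shard = 0; shard < num_shards; ++shard) {
    const Endpoint& ep = endpoints[shard % endpoints.size()];  // round-robin
    auto info = fetch_shard_info(ep, shard);
    if (!info) {
      return -5;  // -EIO: no retry against the remaining endpoints
    }
    out.push_back(*info);
  }
  return 0;
}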

Proposed fix:
Introduce retry logic in the radosgw-admin sync status command.
The simplest retry would just move on to the next endpoint when the current one fails, but that can be very inefficient when multiple endpoints are down.
To do this more efficiently, we may want to maintain a connection status for each endpoint in RGWRestConn. Keeping the status in RGWRestConn could also benefit other places that need retry logic. The status can carry a timestamp so that it is invalidated after a short period (assuming the corresponding RGW instance may recover quickly).
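
Below is a rough C++ sketch of what such a retry could look like, again using hypothetical names (EndpointStatusCache, fetch_all_shards) rather than the real RGWRestConn interface; the TTL on the cached "down" verdict corresponds to the timestamped connection status described above:

#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Endpoint { std::string url; };
struct ShardInfo { int shard_id; std::string marker; };

// Placeholder for the REST call to one gateway; nullopt models a connection error.
std::optional<ShardInfo> fetch_shard_info(const Endpoint& ep, int shard_id);

// Remembers which endpoints recently failed, so repeated shard requests
// don't keep timing out against a gateway that is known to be down.
class EndpointStatusCache {
  std::unordered_map<std::string, Clock::time_point> down_until_;
  std::chrono::seconds ttl_{30};  // assume the RGW instance may recover quickly
public:
  bool usable(const Endpoint& ep) const {
    auto it = down_until_.find(ep.url);
    return it == down_until_.end() || Clock::now() >= it->second;
  }
  void mark_down(const Endpoint& ep) { down_until_[ep.url] = Clock::now() + ttl_; }
};

// Try every endpoint not known to be down before giving up on a shard.
int fetch_all_shards(const std::vector<Endpoint>& endpoints, int num_shards,
                     EndpointStatusCache& status, std::vector<ShardInfo>& out)
{
  for (int shard = 0; shard < num_shards; ++shard) {
    bool done = false;
    for (size_t i = 0; i < endpoints.size() && !done; ++i) {
      const Endpoint& ep = endpoints[(shard + i) % endpoints.size()];
      if (!status.usable(ep)) continue;     // skip endpoints recently seen down
      if (auto info = fetch_shard_info(ep, shard)) {
        out.push_back(*info);
        done = true;
      } else {
        status.mark_down(ep);               // cache the failure, try the next one
      }
    }
    if (!done) return -5;  // -EIO only if every endpoint failed
  }
  return 0;
}

With the status cached, once an endpoint has been marked down the remaining shard requests skip it immediately instead of timing out against it again and again.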

@Casey, we had a brief discussion of this solution in last week's refactoring meeting, which you were absent from. I would like to go through it with you as well to see whether you think RGWRestConn is the right place to maintain the connection status for endpoints.

Actions #6

Updated by Casey Bodley 8 months ago

  • Status changed from Triaged to Fix Under Review
  • Assignee set to Jane Zhu
  • Backport set to reef
  • Pull request ID set to 52812
Actions #7

Updated by Casey Bodley 7 months ago

  • Has duplicate Bug #62196: multisite sync fairness: "sync status" in I/O error added
Actions #8

Updated by Jane Zhu 6 months ago

All the changes in https://github.com/ceph/ceph/pull/52812 have been covered in https://github.com/ceph/ceph/pull/53320. So I closed the first one and will go with the latter.

Actions #9

Updated by Jane Zhu 17 days ago

  • Status changed from Fix Under Review to Resolved
  • Pull request ID changed from 52812 to 53320

Fixed in the solution for https://tracker.ceph.com/issues/62710
