Documentation #62680

Docs for setting up multisite RGW don't work

Added by Zac Dover 8 months ago. Updated 8 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Spent time:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

An email from Petr Bena:

Hello,

My goal is to set up multisite RGW with two separate Ceph clusters in separate datacenters, with the RGW data replicated between them. I created a lab for this purpose in both locations (with the latest Reef Ceph installed using cephadm) and tried to follow this guide: https://docs.ceph.com/en/reef/radosgw/multisite/
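
For reference, the master-zone half of that guide boils down to commands along the following lines. This is only a sketch: the realm, zonegroup, zone, and user names are the ones that appear later in this report, the endpoints and keys are placeholders, and the exact flags should be checked against the Reef multisite documentation.

# On the master cluster; endpoints and keys below are placeholders:
radosgw-admin realm create --rgw-realm=ceph --default
radosgw-admin zonegroup create --rgw-zonegroup=cz \
    --endpoints=http://ceph-lab-brn-01:80 --rgw-realm=ceph --master --default
radosgw-admin zone create --rgw-zonegroup=cz --rgw-zone=cz-brn \
    --endpoints=http://ceph-lab-brn-01:80 --master --default
# System user whose keys the secondary zone authenticates with:
radosgw-admin user create --uid=synchronization-user \
    --display-name="Synchronization User" --system
radosgw-admin zone modify --rgw-zone=cz-brn --access-key=<access-key> --secret=<secret>
radosgw-admin period update --commit
# With cephadm, the gateway is deployed through the orchestrator rather than by editing ceph.conf:
ceph orch apply rgw cz-brn --realm=ceph --zone=cz-brn --placement="1 ceph-lab-brn-01"

The last line stands in for the ceph.conf edits that the older documentation describes, which is the cephadm gap mentioned later in this report.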

Unfortunately, even after multiple attempts it always failed when creating the secondary zone. I could successfully pull the realm from the master, but that was pretty much the last truly successful step. I noticed that immediately after pulling the realm to the secondary, radosgw-admin user list returns an empty list (which IMHO should contain the replicated user list from the master). Continuing by setting the default realm and zonegroup and creating the secondary zone in the secondary cluster, I end up with two zones in each cluster, both seemingly in the same zonegroup, but with replication failing. This is what I see in sync status (the secondary-zone commands are sketched after these outputs):

(master) [ceph: root@ceph-lab-brn-01 /]# radosgw-admin sync status
          realm d2c4ebf9-e156-4c4e-9d56-3fff6a652e75 (ceph)
      zonegroup abc3c0ae-a84d-48d4-8e78-da251eb78781 (cz)
           zone 97fb5842-713a-4995-8966-5afe1384f17f (cz-brn)
   current time 2023-08-30T12:58:12Z
zonegroup features enabled: resharding
                   disabled: compress-encrypted
  metadata sync no sync (zone is master)
2023-08-30T12:58:13.991+0000 7f583a52c780 0 ERROR: failed to fetch datalog info
      data sync source: 13a8c663-b241-4d8a-a424-8785fc539ec5 (cz-hol)
                        failed to retrieve sync info: (13) Permission denied

(secondary) [ceph: root@ceph-lab-hol-01 /]# radosgw-admin sync status
          realm d2c4ebf9-e156-4c4e-9d56-3fff6a652e75 (ceph)
      zonegroup abc3c0ae-a84d-48d4-8e78-da251eb78781 (cz)
           zone 13a8c663-b241-4d8a-a424-8785fc539ec5 (cz-hol)
   current time 2023-08-30T12:58:54Z
zonegroup features enabled: resharding
                   disabled: compress-encrypted
  metadata sync failed to read sync status: (2) No such file or directory
2023-08-30T12:58:55.617+0000 7ff37c9db780 0 ERROR: failed to fetch datalog info
      data sync source: 97fb5842-713a-4995-8966-5afe1384f17f (cz-brn)
                        failed to retrieve sync info: (13) Permission denied
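
For context, the corresponding secondary-zone steps from the same guide look roughly like this. The URL and keys are placeholders; in particular, the --access-key/--secret passed to realm pull and zone create must be the system user's keys from the master zone, and a mismatch there is one plausible, though unconfirmed, cause of the (13) Permission denied errors shown above.

# On the secondary cluster; <access-key>/<secret> stand for the master's system user keys:
radosgw-admin realm pull --url=http://ceph-lab-brn-01:80 --access-key=<access-key> --secret=<secret>
radosgw-admin realm default --rgw-realm=ceph
radosgw-admin zone create --rgw-zonegroup=cz --rgw-zone=cz-hol \
    --endpoints=http://ceph-lab-hol-01:80 --access-key=<access-key> --secret=<secret>
radosgw-admin period update --commit
# With cephadm, deploy the secondary gateway through the orchestrator as well:
ceph orch apply rgw cz-hol --realm=ceph --zone=cz-hol --placement="1 ceph-lab-hol-01"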

On the master there is one user created during the process (synchronization-user); on the secondary there are no users, and when I try to re-create this synchronization user it complains that I should not even try and should instead execute the command on the master. I can see the same realm and zonegroup IDs on both sides; the zone lists differ, though:

(master) [ceph: root@ceph-lab-brn-01 /]# radosgw-admin zone list
{
    "default_info": "97fb5842-713a-4995-8966-5afe1384f17f",
    "zones": [
        "cz-brn",
        "default"
    ]
}

(secondary) [ceph: root@ceph-lab-hol-01 /]# radosgw-admin zone list
{
    "default_info": "13a8c663-b241-4d8a-a424-8785fc539ec5",
    "zones": [
        "cz-hol",
        "default"
    ]
}

The permission denied error is puzzling me: could it be because the realm pull didn't sync the users? I tried this multiple times with a clean Ceph install on both sides and always ended up in the same state. I even tried force-creating the same user with the same secrets on the other side, but it didn't help. How can I debug what kind of secret the secondary is trying to use when communicating with the master? Could it be that this multisite RGW setup is not yet truly supported in Reef? I noticed that the documentation itself seems to be written for older Ceph versions, as there is no mention of the orchestrator (for example, in steps where RGW configuration files need to be edited, which is done differently when using cephadm).
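
A few inspection commands may help answer the question of which secret the secondary presents and which zones the committed period actually contains. synchronization-user is the uid mentioned in this report, and the debug flags are ordinary Ceph logging options rather than anything multisite-specific.

# On the secondary: the keys this zone presents to its sync peers live in the zone record itself.
radosgw-admin zone get --rgw-zone=cz-hol          # check the "system_key" section
# On the master: compare them with the system user's actual keys.
radosgw-admin user info --uid=synchronization-user
# On either side: see which zones and endpoints the committed period contains.
radosgw-admin period get
# Re-run the failing command with verbose logging to watch the authenticated HTTP exchanges:
radosgw-admin sync status --debug-rgw=20 --debug-ms=1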

I think the documentation is simply wrong at this time. Either it is missing some crucial steps, or it is outdated or otherwise unclear; by following all the steps exactly as outlined there, you are likely to end up in the same state.

#1

Updated by Zac Dover 8 months ago

  • Status changed from New to In Progress

This procedure is expected to be tested during the first week of September 2023.
