Bug #15547

closed

[rgw multisite] - radosgw-admin sync status failed to retrieve sync info, failing with "(13) Permission denied"

Added by Kumar Hemanth about 8 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps:
------
On an existing Active/Active multisite configuration, I added a new node to set up an Active/Passive configuration.
Command I executed to make the rgw node read-only:
$ radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-3 --access-key=${access_key} --secret=${secret} --endpoints=http://rgw2:80 --default --readonly
Please let me know whether this is the right command to make the node read-only.

Output of radosgw-admin period get --staging: http://pastebin.com/AQKndvyG. The us-3 zone has been set to read-only.

Once the zone was added, I restarted the rgw services on both nodes.
Only the metadata appeared in the pools; the actual containers and buckets were missing.
[root@rgw3 ~]# rados ls -p us-3.rgw.data.root
.bucket.meta.Container1:ee31c3bd-8907-407c-a58c-30aa23c2be62.54105.2
.bucket.meta.Container2:ee31c3bd-8907-407c-a58c-30aa23c2be62.54149.1
.bucket.meta.my-new-bucket:ee31c3bd-8907-407c-a58c-30aa23c2be62.54105.1
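The object names above can be split mechanically to see which zone ID each bucket marker carries. A minimal Python sketch, assuming every entry has the form .bucket.meta.<bucket>:<zone id>.<suffix> as in this listing; parse_bucket_meta and US1_ZONE_ID are hypothetical names, and the sketch only inspects names, it does not read anything from the cluster:

```python
import re

# Hypothetical pattern: entries in <zone>.rgw.data.root look like
# ".bucket.meta.<bucket>:<zone id>.<suffix>" in the listing above.
META_RE = re.compile(
    r"^\.bucket\.meta\.(?P<bucket>[^:]+):"
    r"(?P<zone_id>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"\.(?P<suffix>\S+)$"
)

# Zone ID of us-1, copied from the "radosgw-admin sync status" output on rgw1.
US1_ZONE_ID = "ee31c3bd-8907-407c-a58c-30aa23c2be62"


def parse_bucket_meta(lines):
    """Return (bucket, zone_id, suffix) for every line that matches META_RE."""
    entries = []
    for line in lines:
        m = META_RE.match(line.strip())
        if m:
            entries.append((m.group("bucket"), m.group("zone_id"), m.group("suffix")))
    return entries


listing = """\
.bucket.meta.Container1:ee31c3bd-8907-407c-a58c-30aa23c2be62.54105.2
.bucket.meta.Container2:ee31c3bd-8907-407c-a58c-30aa23c2be62.54149.1
.bucket.meta.my-new-bucket:ee31c3bd-8907-407c-a58c-30aa23c2be62.54105.1
"""

for bucket, zone_id, suffix in parse_bucket_meta(listing.splitlines()):
    # Label the prefix with the zone name when it matches us-1's ID.
    prefix = "us-1" if zone_id == US1_ZONE_ID else zone_id
    print(f"{bucket}: marker prefix {prefix}, suffix {suffix}")
```

All three markers start with us-1's zone ID. A matching prefix only tells you which zone ID the bucket marker carries; whether the bucket contents were replicated to us-3 has to be checked separately.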

When I checked the sync status, it failed with a permission denied error.
Restarting the rgw service again did not help.

[root@rgw1 ceph]# radosgw-admin sync status
realm 19cf8f2b-2368-486c-9592-7225cff4bf13 (earth)
zonegroup 8e358edb-cd6e-4cf9-91bc-59df68e8a86f (us)
zone ee31c3bd-8907-407c-a58c-30aa23c2be62 (us-1)
metadata sync no sync (zone is master)
2016-04-20 09:23:09.313212 7f1ebb861a40 0 ERROR: failed to fetch datalog info
2016-04-20 09:23:09.317088 7f1ebb861a40 0 ERROR: failed to fetch datalog info
data sync source: 22ce2326-a251-49b1-8ddb-c73428156b48 (us-3)
failed to retrieve sync info: (5) Input/output error
source: ea9e8433-4e45-4251-ac7e-4298af61653c (us-2)
failed to retrieve sync info: (13) Permission denied

[root@rgw3 ~]# radosgw-admin sync status
realm 19cf8f2b-2368-486c-9592-7225cff4bf13 (earth)
zonegroup 8e358edb-cd6e-4cf9-91bc-59df68e8a86f (us)
zone 22ce2326-a251-49b1-8ddb-c73428156b48 (us-3)
metadata sync failed to read sync status: (2) No such file or directory
2016-04-20 09:23:50.242114 7f9f28396a40 0 ERROR: failed to fetch datalog info
data sync source: ea9e8433-4e45-4251-ac7e-4298af61653c (us-2)
failed to retrieve sync info: (13) Permission denied
source: ee31c3bd-8907-407c-a58c-30aa23c2be62 (us-1)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
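To compare the two outputs at a glance, the per-source errors can be pulled out of the sync status text. A minimal Python sketch, assuming the line layout shown above ("source: <id> (<name>)" optionally followed by "failed to retrieve sync info: (<errno>) ..."); summarize_data_sync is a hypothetical helper, and the metadata sync line is deliberately ignored:

```python
import errno
import re

# "data sync source: <id> (<name>)" or "source: <id> (<name>)"
SOURCE_RE = re.compile(r"source:\s+(?P<id>[0-9a-f-]{36})\s+\((?P<name>[^)]+)\)")
# "failed to retrieve sync info: (<errno>) <message>"
ERROR_RE = re.compile(r"failed to retrieve sync info: \((?P<code>\d+)\)")


def summarize_data_sync(status_text):
    """Map each data sync source zone name to an errno symbol, or None if no error is shown."""
    summary = {}
    current = None
    for line in status_text.splitlines():
        src = SOURCE_RE.search(line)
        if src:
            current = src.group("name")
            summary[current] = None
            continue
        err = ERROR_RE.search(line)
        if err and current is not None:
            code = int(err.group("code"))
            summary[current] = errno.errorcode.get(code, str(code))
    return summary


rgw1 = """\
data sync source: 22ce2326-a251-49b1-8ddb-c73428156b48 (us-3)
failed to retrieve sync info: (5) Input/output error
source: ea9e8433-4e45-4251-ac7e-4298af61653c (us-2)
failed to retrieve sync info: (13) Permission denied
"""
rgw3 = """\
data sync source: ea9e8433-4e45-4251-ac7e-4298af61653c (us-2)
failed to retrieve sync info: (13) Permission denied
source: ee31c3bd-8907-407c-a58c-30aa23c2be62 (us-1)
syncing
"""

print(summarize_data_sync(rgw1))  # {'us-3': 'EIO', 'us-2': 'EACCES'}
print(summarize_data_sync(rgw3))  # {'us-2': 'EACCES', 'us-1': None}
```

Read side by side, fetching sync info from us-2 fails with EACCES (13) from both rgw1 and rgw3, while us-3 fails with EIO (5) only when queried from the master.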

Ceph version: v10.1.1

Additional info:
Note that the rgw1 and rgw2 nodes were already configured as an active/active site. rgw3 is the new addition and has been set to read-only, so that rgw1 and rgw3 can act as an active/passive DR setup.

Master zone log: rgw1
---------------------
2016-04-20 09:30:25.316434 7ff8777fe700 0 ERROR: failed to fetch datalog info
2016-04-20 09:30:45.311773 7ff8817fa700 0 ERROR: failed to fetch datalog info
2016-04-20 09:30:45.316541 7ff8777fe700 0 ERROR: failed to fetch datalog info
2016-04-20 09:30:47.525290 7ff7a4fc9700 0 ERROR: failed to wait for op, ret=-11: POST http://rgw3:8080/admin/realm/period?period=5d47c9e2-c49f-4f12-b4a5-5986069ebcc0&epoch=3&rgwx-zonegroup=8e358edb-cd6e-4cf9-91bc-59df68e8a86f
2016-04-20 09:30:50.530628 7ff7a4fc9700 0 ERROR: failed to wait for op, ret=-13: POST http://rgw2:8080/admin/realm/period?period=5d47c9e2-c49f-4f12-b4a5-5986069ebcc0&epoch=3&rgwx-zonegroup=8e358edb-cd6e-4cf9-91bc-59df68e8a86f
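The ret values in these lines are negated errno codes, so ret=-13 is EACCES, the same "Permission denied" that sync status reports. A minimal Python sketch that decodes one such line and splits out the target endpoint; decode_op_failure is a hypothetical helper, and the errno lookup assumes it runs on Linux like the gateways, since errno numbering differs between platforms:

```python
import errno
import re
from urllib.parse import parse_qs, urlsplit

# "... failed to wait for op, ret=<negative errno>: <METHOD> <URL>"
OP_RE = re.compile(
    r"failed to wait for op, ret=(?P<ret>-\d+): (?P<method>[A-Z]+) (?P<url>\S+)"
)


def decode_op_failure(line):
    """Return a dict describing a failed REST op log line, or None if the line does not match."""
    m = OP_RE.search(line)
    if m is None:
        return None
    code = -int(m.group("ret"))  # ret=-13 -> errno 13
    url = urlsplit(m.group("url"))
    query = parse_qs(url.query)
    return {
        "errno": errno.errorcode.get(code, str(code)),
        "method": m.group("method"),
        "host": url.hostname,
        "port": url.port,
        "path": url.path,
        "period": query.get("period", [None])[0],
        "epoch": query.get("epoch", [None])[0],
    }


line = (
    "2016-04-20 09:30:50.530628 7ff7a4fc9700 0 ERROR: failed to wait for op, ret=-13: "
    "POST http://rgw2:8080/admin/realm/period?period=5d47c9e2-c49f-4f12-b4a5-5986069ebcc0"
    "&epoch=3&rgwx-zonegroup=8e358edb-cd6e-4cf9-91bc-59df68e8a86f"
)
print(decode_op_failure(line))
```

Applied to the two POST lines above, the master's period push to rgw2:8080 fails with EACCES and the one to rgw3:8080 with ret=-11 (EAGAIN on Linux); both targets are on port 8080.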

http://pastebin.com/B4pCN8Dp

Read-only node log: rgw3
------------------------
Logs: http://pastebin.com/8PdJMh95

#1

Updated by Anonymous almost 8 years ago

Hi Kumar,

I tried reproducing this locally to see if I could make some progress. I used mstart.sh and created three 'datacenters'. d1 and d2 were active/active. d3 was created as a peer of d1, but its zone was set to --read-only when it was added to the default zonegroup. The rgw of d3 was restarted with --rgw-zone set, etc.

Sync status reports success. In fact, the only way I could generate EIO failures from functions like RGWRemoteDataLog::read_log_info() (responsible for the "ERROR: failed to fetch datalog info" messages) was to prevent RGWRESTConn from successfully completing get_json_resource() calls by killing the master rgw instance on d1 ;)

I wonder: is there any chance something is blocking the connection from us-3 to the master rgw instance (e.g. a firewall)? I also see mixed port numbers: "http://rgw2:80" was passed to radosgw-admin, while the logs show the rgw instances listening on 8080.
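Both of these suspicions can be checked from the us-3 host. A minimal Python sketch, assuming plain TCP reachability is a useful first test (it says nothing about TLS or authentication): port_mismatches compares configured endpoint URLs against the URLs seen in the logs, and can_connect attempts a TCP connection. Both helper names, and treating the logged URLs as the "observed" endpoints, are illustrative assumptions:

```python
import socket
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_port(url):
    """Port an endpoint URL resolves to, falling back to the scheme default."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS.get(parts.scheme)


def port_mismatches(configured, observed):
    """List (host, configured_port, observed_port) for hosts whose ports disagree.

    Both arguments are lists of full endpoint URLs; hosts are matched by name.
    """
    seen = {urlsplit(u).hostname: endpoint_port(u) for u in observed}
    mismatches = []
    for url in configured:
        host = urlsplit(url).hostname
        if host in seen and seen[host] != endpoint_port(url):
            mismatches.append((host, endpoint_port(url), seen[host]))
    return mismatches


def can_connect(url, timeout=3.0):
    """True if a TCP connection to the endpoint's host:port succeeds within `timeout` seconds."""
    parts = urlsplit(url)
    try:
        with socket.create_connection((parts.hostname, endpoint_port(url)), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


configured = ["http://rgw2:80"]                      # --endpoints from the zone create command
observed = ["http://rgw2:8080", "http://rgw3:8080"]  # targets in the master zone log
print(port_mismatches(configured, observed))  # [('rgw2', 80, 8080)]
```

Running can_connect("http://rgw1:8080") on rgw3 (host and port assumed) would separate a network block from an authentication problem: a refused or timed-out connection points at the network, while a successful connection with EACCES in the logs points at credentials.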

When you created us-3, were you able to successfully pull the realm information from the master and do the final period update after setting us-3 to --read-only, but before restarting the us-3 rgw instance?
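The order asked about here (pull the realm from the master, create the zone with --read-only, commit the period, and only then restart radosgw) can be written down as a minimal Python sketch that builds the radosgw-admin command lines without running them. passive_zone_commands is a hypothetical helper, --read-only is the spelling used in this comment, and the master URL, endpoint and keys in the example are placeholders:

```python
import shlex


def passive_zone_commands(master_url, zonegroup, zone, endpoint, access_key, secret):
    """Return the radosgw-admin argument lists, in order, for adding a read-only zone.

    Nothing is executed; radosgw on the new zone is restarted only after the last step.
    """
    creds = [f"--access-key={access_key}", f"--secret={secret}"]
    return [
        # 1. Pull the realm from the master zone's gateway.
        ["radosgw-admin", "realm", "pull", f"--url={master_url}", *creds],
        # 2. Create the new zone in the existing zonegroup, flagged read-only.
        ["radosgw-admin", "zone", "create", f"--rgw-zonegroup={zonegroup}",
         f"--rgw-zone={zone}", f"--endpoints={endpoint}", *creds, "--read-only"],
        # 3. Commit the staged period so the other zones learn about the new zone.
        ["radosgw-admin", "period", "update", "--commit"],
    ]


# Placeholder values; the master URL and endpoint are assumptions, not from the report.
steps = passive_zone_commands("http://rgw1:8080", "us", "us-3", "http://rgw3:8080",
                              "ACCESS_KEY", "SECRET")
for cmd in steps:
    print(shlex.join(cmd))
```

Keeping the steps as argument lists makes the order explicit and lets each one be passed to subprocess.run unchanged if you decide to automate it.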

Maybe I'm not understanding the correct procedure for creating this 'passive' node?

Best regards,
Karol

#2

Updated by Yehuda Sadeh almost 8 years ago

Kumar, do you still see this bug with the latest version?

#3

Updated by Orit Wasserman almost 8 years ago

  • Status changed from New to Can't reproduce

#4

Updated by Russell Islam over 7 years ago

Getting this error after enabling SSL mode:
ERROR: failed to wait for op, ret=-13: POST https://ceph-us-east-2:443/admin/realm/period?period=f6cfa099-7274-4277-a5d5-5e1b4e52196b&epoch=3&rgwx-zonegroup=d26a38f5-6722-4322-a196-f015783f1950

python list.py
Traceback (most recent call last):
File "list.py", line 14, in <module>
for bucket in conn.get_all_buckets():
File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 440, in get_all_buckets
response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
<Error><Code>InvalidAccessKeyId</Code><RequestId>tx000000000000000000144-0057bb6b36-1052-default</RequestId><HostId>1052-default-default</HostId></Error>

#5

Updated by Russell Islam over 7 years ago

Sorry, this was posted in the wrong place. Please ignore the above comment.
