Bug #65071

closed

Slow RGW multisite sync due to "304 Not Modified" responses on primary zone

Added by Mohammad Saif about 1 month ago. Updated about 1 month ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: upgrade/quincy-x
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,
We have 2 clusters (v18.2.1), used primarily for RGW, holding over 2 billion RGW objects.
They are in a multisite configuration with 2 zones in total, and we have around 2
Gbps of dedicated (P2P) bandwidth for the multisite traffic. Running
"radosgw-admin sync status" on zone 2 shows all 128 shards recovering, but
very little data is being transferred from the primary zone: link utilization
is barely 100 Mbps out of 2 Gbps. Our objects are also quite small, averaging about 1 MB in
size.
On further inspection, we noticed that the RGW access logs at the primary site
mostly show "304 Not Modified" responses to requests from the RGWs at site 2. Is this expected? Here are some
of the logs (information is redacted):

root@host-04:~# tail -f /var/log/haproxy-msync.log
Feb 12 05:06:51 host-04 haproxy971171: 10.1.85.14:33730 [12/Feb/2024:05:06:51.047]
https~ backend/host-04-msync 0/0/0/2/2 304 143 - - --- 56/55/1/0/0 0/0 "GET
/bucket1/object1.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7
HTTP/1.1"
Feb 12 05:06:51 host-04 haproxy971171: 10.1.85.14:59730 [12/Feb/2024:05:06:51.048]
https~ backend/host-04-msync 0/0/0/2/2 304 143 - - ---- 56/55/3/1/0 0/0 "GET
/bucket1/object91.jpg?rgwx-zonegroup=71dceb3d-3092-4dc6-897f-a9abf60c9972&rgwx-prepend-metadata=true&rgwx-sync-manifest&rgwx-sync-cloudtiered&rgwx-skip-decrypt&rgwx-if-not-replicated-to=a8204ce2-b69e-4d90-bca1-93edd05a1a29%3Abucket1%3A8b96aea5-c763-40a3-8430-efd67cff0c62.20010.7
HTTP/1.1"
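
To quantify how many sync requests come back as 304, the haproxy log can be tallied by status code. The following is a minimal sketch, not part of the original report: it assumes each log entry sits on a single line in haproxy's HTTP log format, where the status code follows the five slash-separated timers. The function name `tally_status` and the regular expression are illustrative, and the sample lines are the two entries above with the query string omitted.

```python
import re
from collections import Counter

# Five slash-separated timers (e.g. 0/0/0/2/2) followed by the 3-digit HTTP status.
# Assumption: entries follow haproxy's HTTP log format, one entry per line.
STATUS_RE = re.compile(r"\s(?:-?\d+/){4}-?\d+\s+(\d{3})\s")


def tally_status(lines):
    """Count HTTP status codes across haproxy HTTP log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two entries from the report, joined onto one line each, query string omitted.
sample = [
    'Feb 12 05:06:51 host-04 haproxy971171: 10.1.85.14:33730 '
    '[12/Feb/2024:05:06:51.047] https~ backend/host-04-msync 0/0/0/2/2 304 143 '
    '- - --- 56/55/1/0/0 0/0 "GET /bucket1/object1.jpg HTTP/1.1"',
    'Feb 12 05:06:51 host-04 haproxy971171: 10.1.85.14:59730 '
    '[12/Feb/2024:05:06:51.048] https~ backend/host-04-msync 0/0/0/2/2 304 143 '
    '- - ---- 56/55/3/1/0 0/0 "GET /bucket1/object91.jpg HTTP/1.1"',
]

print(tally_status(sample))  # Counter({'304': 2})
```

Passing an open file handle for the log instead of `sample` would give the 200/304 split over a longer window.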

We also checked our Grafana instance: out of 1000 requests per second, 200 return
"200 OK" and 800 return "304 Not Modified". Sync threads run on only
2 RGW daemons per zone, which sit behind a load balancer. "radosgw-admin sync error
list" also shows around 20 errors, most of which are automatically recoverable.
Does this mean that the RGW multisite sync logs in the log pool have yet to be
generated, or something of that sort? Please share any insights and let us know how to resolve
this.
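
As a rough cross-check of the figures reported above, the back-of-the-envelope calculation below converts link rates into object rates. It is only a sketch under stated assumptions: 1 MB is taken as 10^6 bytes, the 1 MB average object size is applied uniformly, and protocol overhead is ignored. The constant and function names are illustrative.

```python
# Figures taken from the report; everything else is an assumption for illustration.
LINK_MBPS = 2000        # dedicated multisite link, 2 Gbps
OBSERVED_MBPS = 100     # observed link utilization
AVG_OBJECT_MB = 1       # average object size (assumed 10^6 bytes, no overhead)
OK_PER_SECOND = 200     # "200 OK" responses per second seen in Grafana


def objects_per_second(mbps, object_mb):
    """Objects per second that a given bit rate can carry."""
    return (mbps / 8) / object_mb


def mbps_for(objects_per_sec, object_mb):
    """Bit rate needed to carry a given number of objects per second."""
    return objects_per_sec * object_mb * 8


print(f"{OBSERVED_MBPS / LINK_MBPS:.0%}")                # 5%
print(objects_per_second(OBSERVED_MBPS, AVG_OBJECT_MB))  # 12.5
print(objects_per_second(LINK_MBPS, AVG_OBJECT_MB))      # 250.0
print(mbps_for(OK_PER_SECOND, AVG_OBJECT_MB))            # 1600
```

Under these assumptions, 200 full-size object transfers per second would need about 1600 Mbps, far more than the observed 100 Mbps, which may suggest that many of the "200 OK" responses are not full object fetches.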

Thanks,
Mohammad Saif


Related issues 1 (1 open, 0 closed)

Is duplicate of rgw - Bug #64999: Slow RGW multisite sync due to "304 Not Modified" responses on primary zone (New)

Actions #1

Updated by Casey Bodley about 1 month ago

  • Is duplicate of Bug #64999: Slow RGW multisite sync due to "304 Not Modified" responses on primary zone added
Actions #2

Updated by Casey Bodley about 1 month ago

  • Status changed from New to Duplicate