Bug #15489

open

CORS property of bucket can get out of sync with multiple RGWs, caching issue

Added by Robin Johnson about 8 years ago. Updated 20 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This bug is very hard to reproduce. It has occurred about once a month for the last 6 months (maybe longer), each time affecting a single bucket.

In our case, we've got 10 RGWs (behind haproxies). The tests below were performed directly against the civetweb RGWs, bypassing haproxy.

Testcase:
- On the bad bucket:
- pass1: loop over each RGW, fetch the CORS configuration and add a CORS rule whose ID is unique to that RGW. I use the name 's3-cors-health-$HOSTNAME'.
- sleep
- pass2: loop over all RGWs, fetch the CORS configuration again:
-- On the good RGWs, you'll get a CORS configuration that is missing the rules for the bad RGWs.
-- On the bad RGWs, you'll get some older version of the CORS configuration that is never updated again (this is VERY evident when the bucket initially had no CORS configuration).
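
The two passes above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the script actually used: each entry in `clients` is assumed to be an S3 client bound directly to one RGW and exposing boto3-style `get_bucket_cors` / `put_bucket_cors` methods, and the function names, the placeholder rule body (GET from any origin), and the handling of the `NoSuchCORSConfiguration` error code are all illustrative choices.

```python
import time


def rule_id(hostname):
    """ID of the health-check rule for one RGW, named as in the testcase."""
    return f"s3-cors-health-{hostname}"


def fetch_rules(client, bucket):
    """Return the bucket's CORS rules as seen through one RGW ([] if none)."""
    try:
        return list(client.get_bucket_cors(Bucket=bucket)["CORSRules"])
    except Exception as exc:
        # Assumption: a missing configuration surfaces as a boto3-style error
        # whose .response carries the code "NoSuchCORSConfiguration".
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "NoSuchCORSConfiguration":
            return []
        raise


def pass1(clients, bucket):
    """Through each RGW: fetch the CORS config and add that RGW's own rule."""
    for hostname, client in clients.items():
        own_id = rule_id(hostname)
        # Drop a leftover rule from a previous run so IDs stay unique.
        rules = [r for r in fetch_rules(client, bucket) if r.get("ID") != own_id]
        # Placeholder rule body; only the ID matters for this check.
        rules.append({"ID": own_id, "AllowedMethods": ["GET"], "AllowedOrigins": ["*"]})
        client.put_bucket_cors(Bucket=bucket, CORSConfiguration={"CORSRules": rules})


def pass2(clients, bucket):
    """Through each RGW: report which health-check rule IDs are missing."""
    expected = {rule_id(hostname) for hostname in clients}
    report = {}
    for hostname, client in clients.items():
        seen = {r.get("ID") for r in fetch_rules(client, bucket)}
        report[hostname] = sorted(expected - seen)
    return report


def run_check(clients, bucket, delay=60):
    """Run pass1, sleep, then pass2; returns {hostname: [missing rule IDs]}."""
    pass1(clients, bucket)
    time.sleep(delay)
    return pass2(clients, bucket)
```

With boto3, `clients` could be a dict mapping each hostname to `boto3.client("s3", endpoint_url=...)` pointed at that RGW's civetweb endpoint; the endpoints and credentials are site-specific. On a healthy bucket every list in the report is empty. On an affected bucket, the good RGWs should list only the rule IDs of the bad RGWs, while a bad RGW lists every rule that is absent from its stale copy.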

What I don't know is what the initial trigger condition is. The most recent occurrence is on a user bucket that is 4 days old.
I cannot reproduce it on any other bucket right now, including many of various ages (including one made a few minutes before the user bucket).

The only thing I can say is that we haven't seen the issue happen on any RGW instances with runtimes under 1 week.

Restarting the bad RGW instances resolves the problem.

#1

Updated by Konstantin Shalygin 20 days ago

  • Source changed from other to Community (user)