Bug #20661

With RGWs configured in a load balancer, quota stats cache doesn't work

Added by Pavan Rallabhandi over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
jewel, kraken
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With RGWs configured behind a load balancer, the quota stats cache can run into out-of-bounds values. We have found errors like the ones below in our clusters running Jewel. This happens when the PUT and DELETE operations do not hit the same RGW, and an eventual update_stats() from a DELETE operation tries to decrement the stats cache. This can be easily verified by placing RGWs behind a load balancer (I've used HAProxy in round-robin mode) and running a script to upload and delete objects with the user quota enabled.

20 quota: can't use cached stats, exceeded soft threshold (num objs): 18446744073709551615 >= 190000

10 quota exceeded: stats.num_kb_rounded=18446744073709549572 size_kb=1024 user_quota.max_size_kb=5242880000
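The first value in the log above, 18446744073709551615, is exactly 2^64 - 1, which is what an unsigned 64-bit counter holds after 1 is subtracted from 0. The following is a minimal sketch of that wraparound, not the actual RGW code: `CachedStats`, `apply_delta_naive`, and `delete_on_cold_cache` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for one gateway's cached per-user stats.
// Illustrative only; this is not the actual RGW data structure.
struct CachedStats {
  uint64_t num_objects = 0;
};

// Applies an update the naive way. Unsigned subtraction wraps modulo 2^64,
// so removing more than the cache holds yields a huge positive value.
inline void apply_delta_naive(CachedStats& s, uint64_t added, uint64_t removed) {
  s.num_objects += added;
  s.num_objects -= removed;
}

// Simulates a DELETE landing on a gateway whose cache never saw the PUT.
inline uint64_t delete_on_cold_cache() {
  CachedStats gateway_b;               // cache starts at zero objects
  apply_delta_naive(gateway_b, 0, 1);  // one object removed, none added
  return gateway_b.num_objects;        // wrapped to UINT64_MAX
}
```

Calling `delete_on_cold_cache()` returns `UINT64_MAX`, which any quota comparison then treats as an enormous object count.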

Related issues

Copied to rgw - Backport #20821: jewel: With RGWs configured in a load balancer, quota stats cache doesn't work Resolved
Copied to rgw - Backport #20822: kraken: With RGWs configured in a load balancer, quota stats cache doesn't work Rejected

History

#1 Updated by Pavan Rallabhandi over 5 years ago

  • Status changed from New to Fix Under Review

#2 Updated by Orit Wasserman over 5 years ago

Can you reproduce it on newer Ceph versions?

#3 Updated by Pavan Rallabhandi over 5 years ago

Yes, it is reproducible on master as well. Sorry if the Affected Versions field misled you.

#4 Updated by Orit Wasserman over 5 years ago

Can you provide RGW logs? (debug_rgw = 20)
From the code, it should have returned false on can_use_cache_stats and fetched the stats from the storage. Those are not supposed to be negative.

#5 Updated by Orit Wasserman over 5 years ago

OK, we fetch the storage stats only for the bucket stats, not the user stats.
The failure is for the user stats; making sure they do not overflow will fix this.

#6 Updated by Orit Wasserman over 5 years ago

  • Backport set to jewel, kraken

#7 Updated by Orit Wasserman over 5 years ago

  • Status changed from Fix Under Review to 17

#8 Updated by Yuri Weinstein over 5 years ago

Pavan Rallabhandi wrote:

https://github.com/ceph/ceph/pull/16389

merged

#9 Updated by Orit Wasserman over 5 years ago

  • Status changed from 17 to Pending Backport

#10 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #20821: jewel: With RGWs configured in a load balancer, quota stats cache doesn't work added

#11 Updated by Nathan Cutler over 5 years ago

  • Copied to Backport #20822: kraken: With RGWs configured in a load balancer, quota stats cache doesn't work added

#12 Updated by Aleksei Gutikov over 5 years ago

Fix does not fix anything actually.
size, size_rounded, num_objects are uint64_t so they are always >= 0
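The point about unsigned types can be sketched as follows. This is a hedged illustration, not the code from the pull request: `subtract_check_after` and `subtract_clamped` are hypothetical helpers. A guard that tests the result after the subtraction can never fire for a `uint64_t`, whereas comparing the operands before subtracting does prevent the wraparound.

```cpp
#include <cstdint>

// Ineffective: the subtraction has already wrapped, and an unsigned value
// can never compare below zero, so the guard never fires.
// (Compilers may warn that this comparison is always false.)
inline uint64_t subtract_check_after(uint64_t current, uint64_t removed) {
  uint64_t result = current - removed;
  if (result < 0) {
    result = 0;
  }
  return result;
}

// Effective: compare the operands before subtracting and clamp at zero.
inline uint64_t subtract_clamped(uint64_t current, uint64_t removed) {
  return removed > current ? 0 : current - removed;
}
```

With `current = 0` and `removed = 1`, the first helper returns `UINT64_MAX` while the second returns `0`. Clamping at zero is one possible policy, assumed here for illustration.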

#13 Updated by Pavan Rallabhandi over 5 years ago

Aleksei Gutikov wrote:

Fix does not fix anything actually.
size, size_rounded, num_objects are uint64_t so they are always >= 0

Yes, the original logic seems to have got lost as part of making the code readable during the review. My bad that I didn't verify the logic post review, thanks!

#14 Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved
