With RGWs configured in a load balancer, quota stats cache doesn't work
With RGWs behind a load balancer, the quota stats cache can end up holding wrapped-around (underflowed) values. We have seen errors like the ones below in our clusters running Jewel. This happens when the PUT and DELETE operations do not hit the same RGW, and the eventual update_stats() from a DELETE operation tries to decrement a stats cache that never recorded the corresponding PUT. It is easy to reproduce: put the RGWs behind a load balancer (I used HAProxy in round-robin mode), enable the user quota, and run a script that uploads and deletes objects.
20 quota: can't use cached stats, exceeded soft threshold (num objs): 18446744073709551615 >= 190000
10 quota exceeded: stats.num_kb_rounded=18446744073709549572 size_kb=1024 user_quota.max_size_kb=5242880000
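For reference, the two logged values are what unsigned 64-bit subtraction produces when it goes below zero: 18446744073709551615 is 2^64 - 1 and 18446744073709549572 is 2^64 - 2044. The sketch below only illustrates that arithmetic. CachedStats and naive_delete are hypothetical names for this example and are not the RGW types or functions.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-RGW cached counter (assumption, not the Ceph struct).
struct CachedStats {
    std::uint64_t num_objects = 0;
    std::uint64_t size_kb_rounded = 0;
};

// Naive decrement, as a DELETE might apply it on an RGW whose cache
// never saw the matching PUT. Unsigned subtraction wraps modulo 2^64.
inline void naive_delete(CachedStats& s, std::uint64_t objs, std::uint64_t kb) {
    s.num_objects -= objs;
    s.size_kb_rounded -= kb;
}
```

Starting from an empty cache, removing 1 object of 2044 KB with naive_delete reproduces exactly the two numbers in the log lines above.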
#13 Updated by Pavan Rallabhandi about 1 year ago
Aleksei Gutikov wrote:
Fix does not fix anything actually.
size, size_rounded, num_objects are uint64_t so they are always >= 0
Yes, the original logic seems to have got lost as part of making the code readable during the review. My bad that I didn't verify the logic post review, thanks!
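To illustrate Aleksei's point: a check of the form "result < 0" on a uint64_t can never be true, so the comparison has to happen before the subtraction. A minimal sketch, assuming that clamping at zero is the desired behaviour; saturating_sub is a hypothetical helper for this example and is not the code from the actual fix.

```cpp
#include <cstdint>

// Hypothetical helper (assumption, not the Ceph patch): subtract without
// wrapping. The operands are compared first, because the result of an
// unsigned subtraction is never negative and cannot be checked afterwards.
inline std::uint64_t saturating_sub(std::uint64_t current, std::uint64_t removed) {
    return removed > current ? 0 : current - removed;
}
```

With this, a DELETE landing on an RGW with an empty cache leaves the counter at 0 rather than near 2^64. The cached value is then still stale, but it no longer trips the quota threshold on its own.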