With RGWs configured in a load balancer, quota stats cache doesn't work
With RGWs configured behind a load balancer, the quota stats cache can end up with unbounded values. We have seen errors like the ones below in our clusters running Jewel. This happens when PUT and DELETE operations do not hit the same RGW, and an eventual update_stats() from a DELETE operation tries to decrement a stats cache that never recorded the matching PUT. This is easy to verify by putting RGWs behind a load balancer (I used HAProxy in round-robin mode) and running a script that uploads and deletes objects with the user quota enabled.
20 quota: can't use cached stats, exceeded soft threshold (num objs): 18446744073709551615 >= 190000
10 quota exceeded: stats.num_kb_rounded=18446744073709549572 size_kb=1024 user_quota.max_size_kb=5242880000
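For illustration only (this is not code from the Ceph tree): the values in the two log lines are what an unsigned 64-bit counter looks like after being decremented below zero. The sketch below models each gateway's stats cache as a private counter and sends a PUT to one gateway and the DELETE to another. The class RgwStatsCache and the helpers u64_sub and as_signed are hypothetical names invented for this example; the real cache tracks more than an object count.

```python
U64_MASK = (1 << 64) - 1  # arithmetic wraps modulo 2**64, like uint64_t in C++


def u64_sub(a, b):
    """Subtract the way an unsigned 64-bit counter would (wraps on underflow)."""
    return (a - b) & U64_MASK


def as_signed(v):
    """Reinterpret a wrapped uint64 value as the signed number it encodes."""
    return v - (1 << 64) if v >= (1 << 63) else v


class RgwStatsCache:
    """Hypothetical stand-in for one gateway's private user-stats cache."""

    def __init__(self):
        self.num_objs = 0

    def put(self):
        self.num_objs = (self.num_objs + 1) & U64_MASK

    def delete(self):
        self.num_objs = u64_sub(self.num_objs, 1)


# Round-robin balancing: the PUT lands on rgw_a, the DELETE on rgw_b.
rgw_a, rgw_b = RgwStatsCache(), RgwStatsCache()
rgw_a.put()
rgw_b.delete()

print(rgw_b.num_objs)                    # 18446744073709551615 (2**64 - 1)
print(as_signed(18446744073709549572))   # -2044
```

The first printed value matches the num objs figure in the first log line. The second shows that the num_kb_rounded figure in the second log line is the unsigned form of -2044, which is consistent with a size counter that was pushed below zero.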
- Status changed from New to Need Review
Can you reproduce it on newer Ceph versions?
Yes, it is reproducible on master as well. Sorry if the affected versions field misled you.
Can you provide rgw logs? (debug_rgw = 20)
From the code, it should have returned false from can_use_cache_stats and fetched the stats from storage. The stats are not supposed to be negative.
OK, we fetch the stats from storage only for the bucket stats, not for the user stats.
The failure is in the user stats; making sure they do not overflow will fix this.
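A minimal sketch of what "making sure they do not overflow" could look like, assuming the cached counters are simply clamped at zero when a decrement is larger than the cached value. This is an illustration of the idea, not the actual patch; ClampedStatsCache, apply_delta and saturating_sub are hypothetical names.

```python
def saturating_sub(current, removed):
    """Decrement an unsigned counter, stopping at zero instead of wrapping."""
    return current - removed if current > removed else 0


class ClampedStatsCache:
    """Hypothetical per-gateway user-stats cache with clamped decrements."""

    def __init__(self):
        self.num_objs = 0
        self.size_kb = 0

    def apply_delta(self, added_objs=0, removed_objs=0, added_kb=0, removed_kb=0):
        # Apply additions first, then clamp the removal so a DELETE that
        # arrives without a matching PUT cannot push the counter below zero.
        self.num_objs = saturating_sub(self.num_objs + added_objs, removed_objs)
        self.size_kb = saturating_sub(self.size_kb + added_kb, removed_kb)


# A DELETE reaches a gateway that never saw the matching PUT.
cache = ClampedStatsCache()
cache.apply_delta(removed_objs=1, removed_kb=1024)
print(cache.num_objs, cache.size_kb)  # 0 0
```

Clamping keeps the cached value from wrapping, but the cache on that gateway is still only an approximation until the stats are refreshed from storage.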
- Backport set to jewel, kraken
- Status changed from Need Review to Need Test
- Status changed from Need Test to Pending Backport
- Copied to Backport #20821: jewel: With RGWs configured in a load balancer, quota stats cache doesn't work added
- Copied to Backport #20822: kraken: With RGWs configured in a load balancer, quota stats cache doesn't work added