Bug #61224
rgw: slow bucket creation due to cache invalidation failures
0%
Description
Thanks to @mkogan for testing this and providing the logs.
Bucket creation can take a long time when notify fails on object due to -ETIMEDOUT response where we retry ten times.
2023-05-17T10:36:27.533+0000 7fffe8e77700 1 -- 172.21.5.105:0/3146374879 <== osd.0 v2:172.21.5.105:6802/842320433 277275 ==== watch-notify(notify_complete (2) cookie 174948352 notify 167503752719 ret 110) v3 ==== 42+0+100 (crc 0 0 0) 0x7c8b520 con 0x2ef9c00 172.21.5.105:0/3146374879 --> [v2:172.21.5.105:6802/842320433,v1:172.21.5.105:6803/842320433] -- osd_op(unknown.0.0:220988 8.0 8:93e5b521:::notify.7:head [notify cookie 140392448 in=368b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e39) v8 -- 0x929b880 con 0x2ef9c00
2023-05-17T10:36:27.534+0000 7fff204e6700 1 req 1495756601475880094 210.017807007s s3:create_bucket int RGWSI_Notify::robust_notify(const DoutPrefixProvider *, RGWSI_RADOS::Obj &, const RGWCacheNotifyInfo &, optional_yield):432 invalidation attempt 8 failed: (110) Connection timed out
2023-05-17T10:36:27.534+0000 7fff204e6700 1 req 1495756601475880094 210.017807007s s3:create_bucket int RGWSI_Notify::robust_notify(const DoutPrefixProvider *, RGWSI_RADOS::Obj &, const RGWCacheNotifyInfo &, optional_yield):427 Invalidating obj=us-east.rgw.meta:root:bkt001 tries=9
2023-05-17T10:36:27.534+0000 7fff204e6700 1 -
2023-05-17T10:36:27.534+0000 7fffe8e77700 1 -- 172.21.5.105:0/3146374879 <== osd.0 v2:172.21.5.105:6802/842320433 277276 ==== watch-notify(notify (1) cookie 35148800 notify 167503752720 ret 0) v3 ==== 398+0+0 (crc 0 0 0) 0x7c8b520 con 0x2ef9c00
History
#1 Updated by Casey Bodley 11 months ago
- Project changed from Ceph to rgw
#2 Updated by Casey Bodley 10 months ago
- Subject changed from rgw: slow bucket creation due to notification failures to rgw: slow bucket creation due to cache invalidation failures
#3 Updated by Mark Kogan 10 months ago
- Project changed from rgw to Ceph
- Severity deleted (
3 - minor)
narrowed the cause to the background execution of
radosgw-admin bucket stats
- OR -
with Ceph Web Dashboard:
while creating bucket(s) when the already existing buckets number > ~20,000
❯ time s3cmd --host=127.0.0.1:8000 mb s3://bkt001 Bucket 's3://bkt001/' created s3cmd --host=127.0.0.1:8000 mb s3://bkt001 0.13s user 0.02s system 0% cpu 3:40.17 total ^^^^^^^
#4 Updated by Mark Kogan 9 months ago
found that the problem reproduces by the web dashboard and (under a certain load) `radosgw-admin bucket stats` was because of over-tuning of `rgw_max_listing_results=2000` while the default is 1000, dialing it back to 1000 stopped the issue from reproducing by the dashboard
#5 Updated by Mark Kogan 9 months ago
- Status changed from New to Triaged