Project

General

Profile

Bug #61224

rgw: slow bucket creation due to cache invalidation failures

Added by Shilpa MJ 11 months ago. Updated 9 months ago.

Status:
Triaged
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thanks to @mkogan for testing this and providing the logs.

Bucket creation can take a long time when notify fails on object due to -ETIMEDOUT response where we retry ten times.

2023-05-17T10:36:27.533+0000 7fffe8e77700 1 -- 172.21.5.105:0/3146374879 <== osd.0 v2:172.21.5.105:6802/842320433 277275 ==== watch-notify(notify_complete (2) cookie 174948352 notify 167503752719 ret 110) v3 ==== 42+0+100 (crc 0 0 0) 0x7c8b520 con 0x2ef9c00
2023-05-17T10:36:27.534+0000 7fff204e6700 1 req 1495756601475880094 210.017807007s s3:create_bucket int RGWSI_Notify::robust_notify(const DoutPrefixProvider *, RGWSI_RADOS::Obj &, const RGWCacheNotifyInfo &, optional_yield):432 invalidation attempt 8 failed: (110) Connection timed out
2023-05-17T10:36:27.534+0000 7fff204e6700 1 req 1495756601475880094 210.017807007s s3:create_bucket int RGWSI_Notify::robust_notify(const DoutPrefixProvider *, RGWSI_RADOS::Obj &, const RGWCacheNotifyInfo &, optional_yield):427 Invalidating obj=us-east.rgw.meta:root:bkt001 tries=9
2023-05-17T10:36:27.534+0000 7fff204e6700 1 -
172.21.5.105:0/3146374879 --> [v2:172.21.5.105:6802/842320433,v1:172.21.5.105:6803/842320433] -- osd_op(unknown.0.0:220988 8.0 8:93e5b521:::notify.7:head [notify cookie 140392448 in=368b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e39) v8 -- 0x929b880 con 0x2ef9c00
2023-05-17T10:36:27.534+0000 7fffe8e77700 1 -- 172.21.5.105:0/3146374879 <== osd.0 v2:172.21.5.105:6802/842320433 277276 ==== watch-notify(notify (1) cookie 35148800 notify 167503752720 ret 0) v3 ==== 398+0+0 (crc 0 0 0) 0x7c8b520 con 0x2ef9c00

History

#1 Updated by Casey Bodley 11 months ago

  • Project changed from Ceph to rgw

#2 Updated by Casey Bodley 10 months ago

  • Subject changed from rgw: slow bucket creation due to notification failures to rgw: slow bucket creation due to cache invalidation failures

#3 Updated by Mark Kogan 10 months ago

  • Project changed from rgw to Ceph
  • Severity deleted (3 - minor)

narrowed the cause to the background execution of

radosgw-admin bucket stats

- OR -
with Ceph Web Dashboard:

while creating bucket(s) when the already existing buckets number > ~20,000

❯ time s3cmd --host=127.0.0.1:8000 mb s3://bkt001
Bucket 's3://bkt001/' created
s3cmd --host=127.0.0.1:8000 mb s3://bkt001  0.13s user 0.02s system 0% cpu 3:40.17 total 
                                                                           ^^^^^^^

#4 Updated by Mark Kogan 9 months ago

found that the problem reproduces by the web dashboard and (under a certain load) `radosgw-admin bucket stats` was because of over-tuning of `rgw_max_listing_results=2000` while the default is 1000, dialing it back to 1000 stopped the issue from reproducing by the dashboard

#5 Updated by Mark Kogan 9 months ago

  • Status changed from New to Triaged

Also available in: Atom PDF