Bug #59164 (closed): LC rules cause latency spikes
Status: Can't reproduce
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Tags: lifecycle
Regression: No
Severity: 3 - minor
Description
We have a number of lifecycle rules with object expirations set to 5-7 days. I have been noticing fairly regular latency spikes on these buckets: roughly every fourth operation, a PUT of an 800MiB file goes from 3-4s to 160s+, and a PUT of a 150MiB file can go from 0.15s to 7s. While debugging the logs, I traced this to the lifecycle rule header-processing steps, for example:
debug 2023-03-24T20:28:52.849+0000 7efd52095700 20 req 16391328287536034602 0.507005215s s3:get_obj RGWObjManifest::operator++(): result: ofs=19922944 stripe_ofs=19922944 part_ofs=15728640 rule->part_size=5242880
debug 2023-03-24T20:28:52.849+0000 7efd52095700 20 req 16391328287536034602 0.507005215s s3:get_obj rados->get_obj_iterate_cb oid=9394f5d7-23ae-40c0-b813-9d0d7bc5cdc8.18373438.3__shadow_3f0666b9-37d9-4259-b8c5-d65a6ed016e6.tmp.2~z6j_XmAAaz7Ulp69KZ-qwPMTepXSPRZ.4_1 obj-ofs=19922944 read_ofs=0 len=1048576
debug 2023-03-24T20:28:52.849+0000 7efd52095700 10 req 16391328287536034602 0.507005215s rule: cgf005prr2v3m88aeu0g prefix: expiration: date: days: 5 noncur_expiration: date: days: 0
debug 2023-03-24T20:29:02.087+0000 7efdad14b700 10 req 16391328287536034602 9.744100571s rule: cgf005prr2v3m88aeu0g prefix: expiration: date: days: 5 noncur_expiration: date: days: 0
debug 2023-03-24T20:29:02.123+0000 7efdd699e700 10 req 16391328287536034602 9.780099869s rule: cgf005prr2v3m88aeu0g prefix: expiration: date: days: 5 noncur_expiration: date: days: 0
debug 2023-03-24T20:29:02.253+0000 7efd4d88c700 10 req 16391328287536034602 9.910101891s rule: cgf005prr2v3m88aeu0g prefix: expiration: date: days: 5 noncur_expiration: date: days: 0
debug 2023-03-24T20:29:02.286+0000 7efd3485a700 10 req 16391328287536034602 9.943102837s rule: cgf005prr2v3m88aeu0g prefix: expiration: date: days: 5 noncur_expiration: date: days: 0
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj RGWObjManifest::operator++(): rule->part_size=5242880 rules.size()=2
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj RGWObjManifest::operator++(): stripe_ofs=24117248 part_ofs=15728640 rule->part_size=5242880
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj RGWObjManifest::operator++(): result: ofs=20971520 stripe_ofs=20971520 part_ofs=20971520 rule->part_size=5242880
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj rados->get_obj_iterate_cb oid=9394f5d7-23ae-40c0-b813-9d0d7bc5cdc8.18373438.3__multipart_3f0666b9-37d9-4259-b8c5-d65a6ed016e6.tmp.2~z6j_XmAAaz7Ulp69KZ-qwPMTepXSPRZ.5 obj-ofs=20971520 read_ofs=0 len=4194304
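For reference, the rule printed above corresponds to roughly the following S3 lifecycle configuration. This is a minimal sketch assuming a boto3 client; the endpoint and bucket name are hypothetical, and the "noncur_expiration ... days: 0" in the log appears to mean no noncurrent-version expiration is set, so only the 5-day expiration is included:

import boto3

# Hypothetical RGW endpoint and bucket, for illustration only.
s3 = boto3.client("s3", endpoint_url="http://rgw.example.local:8080")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-5-days",
                "Filter": {"Prefix": ""},  # empty prefix, as in the log line
                "Status": "Enabled",
                "Expiration": {"Days": 5},  # matches "expiration: ... days: 5"
            }
        ]
    },
)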
Note that within this single request, the elapsed time jumps from 0.507s to 9.744s between the repeated rule-header lines. The full log is available as a download. This does not happen on every request, and removing the lifecycle configuration seems to get rid of the spikes.
The cluster is deployed via Rook.
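A rough way to reproduce the comparison is to time repeated PUTs with and without the rule applied. A minimal sketch, again assuming boto3, with hypothetical endpoint, bucket, and key names:

import time

import boto3

# Time repeated PUTs of a large object and look for outliers. Run once with
# the lifecycle rule applied, then remove it (delete_bucket_lifecycle) and
# run again to compare.
s3 = boto3.client("s3", endpoint_url="http://rgw.example.local:8080")

payload = b"\0" * (150 * 1024 * 1024)  # ~150MiB, matching the smaller case above

for i in range(20):
    start = time.monotonic()
    s3.put_object(Bucket="example-bucket", Key=f"lc-latency-test-{i}", Body=payload)
    elapsed = time.monotonic() - start
    print(f"PUT {i}: {elapsed:.2f}s")  # baseline was ~0.15s; spikes reached ~7s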