Project

General

Profile

Actions

Bug #59164

closed

LC rules cause latency spikes

Added by Jamison Lofthouse about 1 year ago. Updated 11 months ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
lifecycle
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a number of lifecycle rules that have object expirations set to 5-7 days. I have been noticing latency spikes pretty regularly on these buckets (i.e. every 4 operations makes a PUT operation for an 800MiB file go from 3-4s to 160s+, or a PUT for a 150MiB can go from 0.15s to 7s). Debugging the logs, I have traced this to the lifecycle rule header steps for example:

debug 2023-03-24T20:28:52.849+0000 7efd52095700 20 req 16391328287536034602 0.507005215s s3:get_obj RGWObjManifest::operator++(): result: ofs=19922944 stripe_ofs=19922944 part_ofs=15728640 rule->part_size=5242880
debug 2023-03-24T20:28:52.849+0000 7efd52095700 20 req 16391328287536034602 0.507005215s s3:get_obj rados->get_obj_iterate_cb oid=9394f5d7-23ae-40c0-b813-9d0d7bc5cdc8.18373438.3__shadow_3f0666b9-37d9-4259-b8c5-d65a6ed016e6.tmp.2~z6j_XmAAaz7Ulp69KZ-qwPMTepXSPRZ.4_1 obj-ofs=19922944 read_ofs=0 len=1048576
debug 2023-03-24T20:28:52.849+0000 7efd52095700 10 req 16391328287536034602 0.507005215s rule: cgf005prr2v3m88aeu0g prefix:  expiration:  date:  days: 5 noncur_expiration:  date:  days: 0
debug 2023-03-24T20:29:02.087+0000 7efdad14b700 10 req 16391328287536034602 9.744100571s rule: cgf005prr2v3m88aeu0g prefix:  expiration:  date:  days: 5 noncur_expiration:  date:  days: 0
debug 2023-03-24T20:29:02.123+0000 7efdd699e700 10 req 16391328287536034602 9.780099869s rule: cgf005prr2v3m88aeu0g prefix:  expiration:  date:  days: 5 noncur_expiration:  date:  days: 0
debug 2023-03-24T20:29:02.253+0000 7efd4d88c700 10 req 16391328287536034602 9.910101891s rule: cgf005prr2v3m88aeu0g prefix:  expiration:  date:  days: 5 noncur_expiration:  date:  days: 0
debug 2023-03-24T20:29:02.286+0000 7efd3485a700 10 req 16391328287536034602 9.943102837s rule: cgf005prr2v3m88aeu0g prefix:  expiration:  date:  days: 5 noncur_expiration:  date:  days: 0
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj RGWObjManifest::operator++(): rule->part_size=5242880 rules.size()=2
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj RGWObjManifest::operator++(): stripe_ofs=24117248 part_ofs=15728640 rule->part_size=5242880
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj RGWObjManifest::operator++(): result: ofs=20971520 stripe_ofs=20971520 part_ofs=20971520 rule->part_size=5242880
debug 2023-03-24T20:29:02.363+0000 7efdef1cf700 20 req 16391328287536034602 10.020103455s s3:get_obj rados->get_obj_iterate_cb oid=9394f5d7-23ae-40c0-b813-9d0d7bc5cdc8.18373438.3__multipart_3f0666b9-37d9-4259-b8c5-d65a6ed016e6.tmp.2~z6j_XmAAaz7Ulp69KZ-qwPMTepXSPRZ.5 obj-ofs=20971520 read_ofs=0 len=4194304

Full log available as a download. This does not appear to happen every time and removing the lifecycle seems get rid of the spikes.

The cluster is deployed via Rook.


Files

case6-filtered-sanitized.log (378 KB) case6-filtered-sanitized.log Full log from RGW Jamison Lofthouse, 03/24/2023 09:00 PM

Related issues 3 (0 open3 closed)

Related to rgw - Bug #58673: When bucket index ops are cancelled it can leave behind zombie index entriesResolvedCasey Bodley

Actions
Related to rgw - Bug #59663: rgw: expired delete markers created by deleting non-existant object multiple times are not being removed from data pool after deletion from bucketResolvedCory Snyder

Actions
Related to rgw - Bug #61359: Consistency bugs with OLH objectsResolvedCory Snyder

Actions
Actions

Also available in: Atom PDF