Bug #62938
closed
RGW s3website API prefetches data for range requests
100%
Description
Similar to Bug #44508, but reproducible only with the s3website API, not with the s3 API.
You can reproduce it by running this wrk command:
wrk -t56 -c500 -d5m http://${rgwipaddress}:8080/${bucket}/videos/ -s wrk-range-small.lua
It sends range requests to 7 mp4 files.
wrk script
-- Initialize the pseudo random number generator
math.randomseed(os.time())
math.random(); math.random(); math.random()
i = 1
function request()
  if i == 8 then
    i = 1
  end
  local nrangefrom = math.random()
  local nrangeto = math.random(100)
  local path = wrk.path
  url = path..i..".mp4"
  wrk.headers["Range"] = nrangefrom.."-"..nrangeto
  i = i+1
  return wrk.format(nil, url)
end
During testing, the s3website RGW was reading at a rate of 3Gb/s, compared to ~22Mb/s on the s3 RGW. In both cases the bandwidth towards the client was ~20Mb/s.
In the RGW log I was able to find this entry:
2023-09-20T12:52:06.670+0000 7f216d702700 1 -- xxx.xxx.58.15:0/758879303 --> [v2:xxx.xxx.58.2:6816/8556,v1:xxx.xxx.58.2:6817/8556] -- osd_op(unknown.0.0:238 18.651 18:8a75a7b2:::39078a70-7768-48c8-96a5-1e13ced83b5b.58017020.1_videos%2f7.mp4:head [getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e60419) v8 -- 0x7f21dc00a420 con 0x7f21dc007820
You can find the OSD parts of the log here - https://pastebin.com/nGQw4ugd
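The size of the backend read can be pulled straight out of an osd_op entry like the one above. This is a minimal sketch, assuming the "read offset~length" notation seen in that entry; the helper names and the 100-byte request size are hypothetical and used for illustration only.

```python
import re

# Matches the "read <offset>~<length>" extent inside an osd_op log entry.
READ_EXTENT = re.compile(r"\bread (\d+)~(\d+)")


def read_extent(log_line):
    """Return (offset, length) of the first read op in the line, or None."""
    m = READ_EXTENT.search(log_line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def amplification(read_length, requested_bytes):
    """Ratio of bytes read from the OSD to bytes the client asked for."""
    return read_length / requested_bytes


# Fragment copied from the osd_op entry in the RGW log.
fragment = "[getxattrs,stat,read 0~4194304] snapc 0=[]"
offset, length = read_extent(fragment)

# Hypothetical request size: 100 bytes, chosen only to illustrate the ratio.
ratio = amplification(length, 100)
```

A 4194304-byte read (4 MiB) for a request of at most a few hundred bytes is consistent with the large gap between OSD read rate and client bandwidth reported above.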
Updated by Casey Bodley 7 months ago
- Status changed from New to Fix Under Review
- Assignee set to Casey Bodley
- Tags set to website
- Backport set to pacific quincy reef
- Pull request ID set to 53602
thanks Ondrej,
wrk.headers["Range"] = nrangefrom.."-"..nrangeto
from my reading of RGWGetObj::parse_range() at https://github.com/ceph/ceph/blob/9fedc1e0/src/rgw/rgw_op.cc#L112-L139, it expects the format of the Range header to look like "bytes=from-to". this parsing logic seems consistent with the syntax described in https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
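to illustrate the "bytes=from-to" syntax, here is a rough sketch of a single-range parser. it is not the RGWGetObj::parse_range() code, only an approximation of the header syntax; the function name and return shape are assumptions, and multi-range headers are deliberately not handled.

```python
import re

# One byte range: "bytes=<first>-<last>", where either side may be empty.
_RANGE = re.compile(r"bytes=(\d*)-(\d*)", re.ASCII)


def parse_range(header):
    """Parse a single-range header value.

    Returns (first, last) with None for an omitted side, or None if the
    value does not follow the "bytes=from-to" form.
    """
    m = _RANGE.fullmatch(header.strip())
    if m is None:
        return None
    first, last = m.groups()
    if not first and not last:
        return None  # "bytes=-" carries no range at all
    lo = int(first) if first else None
    hi = int(last) if last else None
    if lo is not None and hi is not None and lo > hi:
        return None  # first position must not exceed last position
    return lo, hi


print(parse_range("bytes=0-99"))   # explicit range
print(parse_range("bytes=-500"))   # suffix range: final 500 bytes
print(parse_range("0.37-42"))      # no "bytes=" unit, fractional start
```

a value such as "0.37-42", which is the kind of string the quoted script line builds, fails the match because it lacks the "bytes=" unit and has a non-integer start.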
when passing the "Range: bytes=-500" header to an s3website endpoint, i see it correctly returning the final 500 bytes without prefetching from the given object. however, i do see an earlier object lookup incorrectly using s->prefetch_data=1:
2023-09-22T15:26:21.995-0400 7f540f4ec6c0 10 req 13424627313409767571 0.003000102s retarget Starting retarget
2023-09-22T15:26:21.995-0400 7f540f4ec6c0 20 req 13424627313409767571 0.003000102s get_obj_state: rctx=0x559a351b1ce0 obj=testbucket:8m.iso state=0x559a3c2a2de8 s->prefetch_data=1
2023-09-22T15:26:21.995-0400 7f540f4ec6c0 1 -- 192.168.245.130:0/3181500925 --> [v2:192.168.245.130:6800/1515984712,v1:192.168.245.130:6801/1515984712] -- osd_op(unknown.0.0:200 6.0 6:710b9184:::a648e116-a6fb-48ba-a8d0-888d37298654.4149.1_8m.iso:head [getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e19) v8 -- 0x559a3c3c6a80 con 0x559a39fa2480
this was coming from the function bool RGWHandler_REST_S3Website::web_dir(). i've opened https://github.com/ceph/ceph/pull/53602 to avoid prefetch there. with that fix applied, we only transfer the requested 500 bytes from the osd
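to check whether a given build still shows the prefetching lookup, the rgw debug log can be scanned for get_obj_state entries. this is a small sketch, assuming the log lines have the same shape as the level-20 excerpt above; the function name is hypothetical.

```python
import re

# Captures the object name and the prefetch flag from a get_obj_state entry.
PREFETCH = re.compile(r"get_obj_state: .*\bobj=(\S+) .*s->prefetch_data=(\d)")


def prefetching_lookups(lines):
    """Return the obj= values of get_obj_state entries with prefetch enabled."""
    hits = []
    for line in lines:
        m = PREFETCH.search(line)
        if m and m.group(2) == "1":
            hits.append(m.group(1))
    return hits


# First line is copied from the log excerpt; the second is a hypothetical
# entry with prefetch disabled, added only for contrast.
sample = [
    "2023-09-22T15:26:21.995-0400 7f540f4ec6c0 20 req 13424627313409767571 "
    "0.003000102s get_obj_state: rctx=0x559a351b1ce0 obj=testbucket:8m.iso "
    "state=0x559a3c2a2de8 s->prefetch_data=1",
    "get_obj_state: rctx=0x0 obj=testbucket:other.iso state=0x0 s->prefetch_data=0",
]
hits = prefetching_lookups(sample)
```

an empty result for a range request against an s3website endpoint would suggest the lookup no longer prefetches, though the osd_op read sizes are the more direct evidence.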
Updated by Casey Bodley 7 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 7 months ago
- Copied to Backport #63049: pacific: RGW s3website API prefetches data for range requests added
Updated by Backport Bot 7 months ago
- Copied to Backport #63050: quincy: RGW s3website API prefetches data for range requests added
Updated by Backport Bot 7 months ago
- Copied to Backport #63051: reef: RGW s3website API prefetches data for range requests added
Updated by Backport Bot 7 months ago
- Tags changed from website to website backport_processed
Updated by Konstantin Shalygin 6 months ago
- Status changed from Pending Backport to Resolved
- Target version set to v19.0.0
- % Done changed from 0 to 100
- Source set to Community (dev)