Bug #59739
RGW D3n cache don't work for object with slash
Description
Hello,
I am having trouble with the D3N cache. If a requested object has a slash in its name, D3N cannot cache it.
2023-05-14T15:39:10.662+0000 7fad52183700 10 D3nDataCache: flush(): bl.length <= rgw_get_obj_max_req_size (default 4MB) - write to datacache, bl.length=4194304
2023-05-14T15:39:10.662+0000 7fad52183700 10 D3nDataCache::put(): oid=2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet.2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3, len=4194304
2023-05-14T15:39:10.662+0000 7fad52183700 30 D3nDataCache: d3n_libaio_create_write_request(): Write To Cache, oid=2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet.2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3, len=4194304
2023-05-14T15:39:10.662+0000 7fad52183700 20 D3nDataCache: d3n_prepare_libaio_write_op(): Write To Cache, location=/rgw-cache/2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet.2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3
2023-05-14T15:39:10.662+0000 7fad52183700 0 ERROR: D3nCacheAioWriteRequest::create_io: open file failed, errno=2, location='/rgw-cache/2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet.2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3'
2023-05-14T15:39:10.662+0000 7fad52183700 0 ERROR: D3nDataCache: d3n_libaio_create_write_request() prepare libaio write op r=-1
2023-05-14T15:39:10.662+0000 7fad52183700 1 D3nDataCache: create_aio_write_request fail, r=-1
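The `errno=2` in the log above is ENOENT. A minimal sketch of one plausible reading of the failure, assuming the cache location is simply the cache directory joined with the oid (as the `location=` lines suggest) and that no intermediate directories are created: the `/` characters in the oid turn it into a nested path whose parent directories do not exist. The helper name `try_cache_write` is hypothetical and is not taken from the RGW source.

```python
import errno
import os
import tempfile

# oid copied from the log above; it contains two '/' characters.
OID = (
    "2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/"
    "20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet"
    ".2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3"
)


def try_cache_write(cache_dir: str, oid: str) -> int:
    """Try to create a cache file; return 0 on success or the errno from open()."""
    # Assumption: location = cache directory + oid, with no mkdir of parents.
    location = os.path.join(cache_dir, oid)
    try:
        fd = os.open(location, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as cache_dir:
    # The oid with slashes points below directories that do not exist.
    slash_result = try_cache_write(cache_dir, OID)
    # The same oid with '/' replaced is a plain file name inside cache_dir.
    flat_result = try_cache_write(cache_dir, OID.replace("/", "_"))

print("slash oid failed with ENOENT:", slash_result == errno.ENOENT)
print("flat oid result:", flat_result)
```

The sketch only mirrors the symptom on a POSIX file system; it does not show what RGW does internally.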
Files
Updated by Mark Kogan 11 months ago
Hi @Alexander Kazansky
Could you please elaborate how the `parquet` files with `object has a slash in name` are created for reproduction purposes?
like:
oid=2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet.2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3
in a reproduction attempt of an object with slashes like `s3://bkt/xxx/8M.dat` the cached file name (`oid`) does not contain slashes:
❯ s3cmd ls --human --recursive s3://bkt
2023-05-22 15:24 8M s3://bkt/8M.dat
2023-05-22 15:26 8M s3://bkt/xxx/8M.dat
❯ ls -ltrh /mnt/nvme/rgw_datacache
total 8.0M
-rw-r--r-- 1 root root 4.0M May 22 15:25 1d67f18b-f606-4327-822e-79a16373869d.4204.2__shadow_.Il5abVEb3Tu3XKIGSdZ5-eLQj90hrU8_1
-rw-r--r-- 1 root root 4.0M May 22 15:27 1d67f18b-f606-4327-822e-79a16373869d.4204.2__shadow_.PPXgs45aR6ixGWzMIxcvRwy5IMa2vRl_1
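The two oid shapes seen so far can be told apart mechanically. The sketch below checks whether an oid is usable as a flat file name and shows percent-encoding as one possible way to flatten it. This is purely illustrative: `is_flat_name` and `flatten_oid` are hypothetical helpers, and nothing here is taken from the RGW source or from any fix for this bug.

```python
from urllib.parse import quote, unquote

# oid from the original report (contains '/').
REPORTED_OID = (
    "2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/"
    "20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet"
    ".2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3"
)
# oid from the reproduction attempt (no '/').
REPRO_OID = (
    "1d67f18b-f606-4327-822e-79a16373869d.4204.2__shadow_"
    ".Il5abVEb3Tu3XKIGSdZ5-eLQj90hrU8_1"
)


def is_flat_name(oid: str) -> bool:
    """True if the oid can be used directly as a single file name."""
    return "/" not in oid


def flatten_oid(oid: str) -> str:
    """Percent-encode the oid so that '/' becomes '%2F' (reversible)."""
    return quote(oid, safe="")


for oid in (REPORTED_OID, REPRO_OID):
    print(is_flat_name(oid), flatten_oid(oid))
```

Percent-encoding is chosen here only because it is reversible with `unquote`; a hash of the oid would also give a flat name but could not be mapped back.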
Updated by Alexander Kazansky 11 months ago
Mark Kogan wrote:
Hi @Alexander Kazansky
Hi Mark,
Could you please elaborate how the `parquet` files with `object has a slash in name` are created for reproduction purposes?
like:
[...]
Currently we use the following structure to store our Iceberg tables:
a bucket stores a set of tables grouped by logical principles:
2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173
the prefix (folder) process_launch is the name of the table,
and "/data" is the prefix (folder) for storing the data files that make up this table. By default an Iceberg table stores data files under the location/data/ path and metadata files under location/metadata/.
in a reproduction attempt of an object with slashes like `s3://bkt/xxx/8M.dat` the cached file name (`oid`) does not contain slashes:
[...]
I found that my problem occurs with parquet files. If I upload a randomly generated file or a tar.gz archive, RGW hashes the name before writing the file to the cache path, just as in your example.
To reproduce the problem, try any parquet file, for example from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
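The layout described above can be sketched as a small key builder. The function `iceberg_object_key` is hypothetical, and the data file name is inferred from the oid in the original log rather than confirmed anywhere in this thread. The sketch only shows that such keys always contain slashes and that the key appears verbatim inside the logged oid.

```python
# oid copied from the log in the description.
LOGGED_OID = (
    "2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173.6__shadow_process_launch/data/"
    "20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet"
    ".2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3"
)


def iceberg_object_key(table_location: str, kind: str, file_name: str) -> str:
    """Build an object key as <location>/data/<file> or <location>/metadata/<file>."""
    if kind not in ("data", "metadata"):
        raise ValueError(f"unexpected kind: {kind!r}")
    return f"{table_location.strip('/')}/{kind}/{file_name}"


# Assumed file name, inferred from the logged oid.
key = iceberg_object_key(
    "process_launch",
    "data",
    "20230428_061954_01529_bzrj7-037aaaa7-8448-43a5-a4a4-49821d048de4.parquet",
)

print(key)
print("key embedded in logged oid:", key in LOGGED_OID)
```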
Updated by Mark Kogan 11 months ago
Currently we use the following structure to store our Iceberg tables:
a bucket stores a set of tables grouped by logical principles:
2804fdd4-5ce0-4803-9588-5b622abc0ae2.10210173
the prefix (folder) process_launch is the name of the table,
and "/data" is the prefix (folder) for storing the data files that make up this table. By default an Iceberg table stores data files under the location/data/ path and metadata files under location/metadata/.
Thank you for the details.
Could you please share the output of `s3cmd ls --recursive` on such a bucket,
and an RGW log file (debug_rgw = 20, like the one above) covering a whole GET operation, so that the complete HTTP headers of the GET op can be observed?
Thanks
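When collecting such a log, the failing cache writes can be picked out with a small filter. This is a sketch that assumes the error lines keep the exact wording shown in the description (`open file failed, errno=..., location='...'`); the function name `failed_cache_writes` is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the D3nCacheAioWriteRequest::create_io error line from the description.
OPEN_FAILED = re.compile(
    r"open file failed, errno=(?P<errno>\d+), location='(?P<location>[^']*)'"
)


def failed_cache_writes(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (errno, location) for every 'open file failed' line."""
    for line in lines:
        match = OPEN_FAILED.search(line)
        if match:
            yield int(match["errno"]), match["location"]


# Two lines copied from the log in the description.
SAMPLE_LOG = [
    "2023-05-14T15:39:10.662+0000 7fad52183700 0 ERROR: D3nCacheAioWriteRequest::create_io: "
    "open file failed, errno=2, location='/rgw-cache/2804fdd4-5ce0-4803-9588-5b622abc0ae2"
    ".10210173.6__shadow_process_launch/data/20230428_061954_01529_bzrj7-037aaaa7-8448-43a5"
    "-a4a4-49821d048de4.parquet.2~fvn7iKYGBa20LXxLl1AjL8RIILTl-LT.2_3'",
    "2023-05-14T15:39:10.662+0000 7fad52183700 1 D3nDataCache: create_aio_write_request fail, r=-1",
]

failures = list(failed_cache_writes(SAMPLE_LOG))
for err, location in failures:
    print(err, location)
```

In practice the same function can be fed an open log file object instead of `SAMPLE_LOG`.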
Updated by Alexander Kazansky 11 months ago
- File cache-logs.txt added
Hm, sorry, it seems I was wrong: I can't reproduce the problem with s3cmd. However, I collected logs of requests from our services, where the problem is reproducible. Maybe they will help to understand what is happening.
Updated by Mark Kogan 11 months ago
Thank you very much for the logs. I am also able to reproduce something that looks similar with Trino.
debugging ...
Updated by Casey Bodley 11 months ago
- Status changed from New to Fix Under Review
- Backport set to pacific quincy reef
Updated by Mark Kogan 10 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 10 months ago
- Copied to Backport #61766: quincy: RGW D3n cache don't work for object with slash added
Updated by Backport Bot 10 months ago
- Copied to Backport #61767: reef: RGW D3n cache don't work for object with slash added
Updated by Backport Bot 10 months ago
- Copied to Backport #61768: pacific: RGW D3n cache don't work for object with slash added
Updated by Backport Bot 10 months ago
- Tags changed from d3n to d3n backport_processed