Support #57908

rgw common prefix performance on large bucket

Added by Jiayu Sun 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

Hi, I'm facing the same issue mentioned here:
https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/36P62BOOCJBVVJCVUX5F5J7KYCGAAICV/

Since that thread is old, I'm wondering if there has been any update on rgw. I'm using 16.2.5 and facing the same performance issue.

I've read many docs and spent a while digging into the source code. I found that when a bucket has around 50 folders, and each of those folders holds many more objects (e.g. 1M), listing the top level of the bucket takes too long (~1 min).

From some tests and the source code, I found that rgw does the following work to list the bucket:

1. Send a request to each shard for up to 1k objects.
2. Aggregate the results.
3. If the number of results is less than 1k, update the start key and return to step 1.

Each iteration above finds only 1 common prefix. If the root level has 50 folders, we need 50 iterations, which causes a lot of data transfer and a huge performance degradation.
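
A rough Python model of the loop described above may help make the cost concrete. It is only a sketch of the behaviour as reported in this ticket, not the actual rgw code: shards are plain sorted lists, the function name `list_top_level` and its return values are made up for illustration, and the delimiter is assumed to be a single character.

```python
import bisect
import heapq


def list_top_level(shards, delimiter="/", batch=1000):
    """Simplified model of the listing loop.

    Returns (entries, rounds, keys_fetched).
    """
    entries = []      # common prefixes and top-level keys, in sorted order
    start = ""        # inclusive lower bound for the next round
    rounds = 0
    keys_fetched = 0
    while len(entries) < batch:
        # Step 1: ask every shard for up to `batch` keys at or after `start`.
        chunks = []
        for shard in shards:
            lo = bisect.bisect_left(shard, start)
            chunks.append(shard[lo:lo + batch])
        rounds += 1
        keys_fetched += sum(len(chunk) for chunk in chunks)

        # Step 2: merge the sorted per-shard chunks, keep the first `batch` keys.
        merged = list(heapq.merge(*chunks))[:batch]
        if not merged:
            break
        for key in merged:
            cut = key.find(delimiter)
            entry = key if cut < 0 else key[:cut + 1]
            if not entries or entries[-1] != entry:
                entries.append(entry)

        # Step 3: move the start key past what was just covered and go again.
        last = entries[-1]
        if last.endswith(delimiter):
            # Skip the rest of the folder: smallest string after every "folder/..." key.
            start = last[:-1] + chr(ord(delimiter) + 1)
        else:
            # Smallest string strictly greater than the last plain key.
            start = last + "\0"
    return entries, rounds, keys_fetched
```

When every folder holds more than `batch` objects, each round in this model collapses a full batch of keys into a single common prefix, so the number of rounds grows with the number of folders while each round still pulls up to `batch` keys from every shard.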

Can we do the common prefix finding at step 1? Since the object list is stored in omap, which is backed by RocksDB, maybe we can fetch the first 2 keys from the starting key and skip the other keys using an O(log n) algorithm, since keys are sorted in RocksDB.
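
The idea can be sketched in Python as follows. This is only an illustration under stated assumptions: each shard is a sorted list standing in for the sorted omap keys, `bisect.bisect_left` stands in for a seek on a sorted key-value iterator, the delimiter is a single character, and the names `shard_top_level` and `list_top_level_with_seek` are hypothetical. The sketch reads one key per seek rather than two.

```python
import bisect


def shard_top_level(shard, delimiter="/"):
    """Top-level entries of one sorted shard, using one seek per folder.

    Returns (entries, seeks).
    """
    entries = []
    seeks = 0
    pos = 0
    while pos < len(shard):
        key = shard[pos]
        cut = key.find(delimiter)
        if cut < 0:
            # Plain top-level object: report it and step to the next key.
            entries.append(key)
            pos += 1
        else:
            entries.append(key[:cut + 1])
            # Smallest string that sorts after every key under this folder.
            upper = key[:cut] + chr(ord(delimiter) + 1)
            # Binary search jumps over the whole folder in O(log n).
            pos = bisect.bisect_left(shard, upper, pos)
            seeks += 1
    return entries, seeks


def list_top_level_with_seek(shards, delimiter="/"):
    """Merge the per-shard top-level entries into one sorted listing."""
    merged = set()
    for shard in shards:
        entries, _ = shard_top_level(shard, delimiter)
        merged.update(entries)
    return sorted(merged)
```

In this model each shard returns only its common prefixes and top-level keys, so the amount of data sent back depends on the number of folders rather than on the number of objects inside them.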
