Bug #39551
closed
multisite: RGWListBucketIndexesCR for data full sync needs pagination
0%
Description
When a new peer zone is added, rgw starts data sync in 'full sync' mode. Full sync begins by listing all of that zone's bucket instances, in order to build a list of all the bucket index shards we need to sync.
From RGWListBucketIndexesCR::operate():
  /* FIXME: need a better scaling solution here, requires streaming output */
  call(new RGWReadRESTResourceCR<list<string> >(store->ctx(), sync_env->conn, sync_env->http_manager,
                                                entrypoint, NULL, &result));
This attempts to read the entire list of bucket instance IDs from the remote zone's "/admin/metadata/bucket.instance" admin API in a single request. RGWOp_Metadata_List is the op behind this admin API, and it returns a maximum of 1000 entries, so if the remote zone has more bucket instances than that, data sync won't see the rest. RGWOp_Metadata_List accepts a "marker" parameter that resumes a listing from where the previous request left off. RGWListBucketIndexesCR should pass this marker on each follow-up request and keep listing keys until the end of the listing (where truncated = false).
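For illustration only, here is a minimal standalone sketch of the marker-based loop described above. It does not use the rgw coroutine framework or any real Ceph API: ListPage, FetchPage, list_all_keys and make_fake_remote are hypothetical names, and the fake remote's marker format (a decimal index) is an assumption made just so the sketch can run on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape of one listing response: the keys on this page,
// whether more keys remain, and the marker to send with the next request.
struct ListPage {
  std::vector<std::string> keys;
  bool truncated = false;
  std::string marker;
};

// Hypothetical fetch callback: takes a marker (empty on the first request)
// and returns one page of results.
using FetchPage = std::function<ListPage(const std::string& marker)>;

// Request pages until the listing is no longer truncated, feeding each
// page's marker into the next request.
std::vector<std::string> list_all_keys(const FetchPage& fetch) {
  std::vector<std::string> all;
  std::string marker;
  for (;;) {
    ListPage page = fetch(marker);
    all.insert(all.end(), page.keys.begin(), page.keys.end());
    if (!page.truncated) {
      break;  // end of listing
    }
    if (page.marker == marker) {
      break;  // no progress; avoid looping forever on a misbehaving peer
    }
    marker = page.marker;
  }
  return all;
}

// In-memory stand-in for the remote zone. The marker here is simply the
// decimal index of the next key, which is an assumption of this sketch.
FetchPage make_fake_remote(std::vector<std::string> keys, std::size_t page_size) {
  assert(page_size > 0);
  return [keys = std::move(keys), page_size](const std::string& marker) {
    const std::size_t start = marker.empty() ? 0 : std::stoul(marker);
    const std::size_t end = std::min(keys.size(), start + page_size);
    ListPage page;
    page.keys.assign(keys.begin() + static_cast<std::ptrdiff_t>(start),
                     keys.begin() + static_cast<std::ptrdiff_t>(end));
    page.truncated = end < keys.size();
    page.marker = std::to_string(end);
    return page;
  };
}
```

With a fake remote holding 2500 keys and a page size of 1000, list_all_keys issues three requests and returns all 2500 keys, whereas a single unpaginated request would stop at the first 1000.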