Feature #8473

closed

rgw: Shard bucket index objects to improve single bucket PUT throughput

Added by Guang Yang almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
% Done: 20%
Source: Community (dev)

Description

There is a blueprint discussing the bucket index scalability issue: https://wiki.ceph.com/Planning/Sideboard/rgw:_bucket_index_scalability

To improve scalability, it mentions a couple of options:
1. Use blind buckets, i.e. disable bucket indexing. This removes the bottleneck for single-bucket PUTs, but bucket listing is lost.
2. Shard bucket index objects. By splitting a bucket's index across X objects, single-bucket PUT throughput is expected to improve by up to X times.
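To make option 2 concrete, here is a minimal sketch of how a PUT could pick which index object to update. It assumes a stable hash of the object key modulo the shard count; `zlib.crc32` is only a stand-in hash, and `shard_for_key`, `index_object_name`, and the `.dir.<bucket_id>.<shard>` naming are hypothetical illustrations, not the radosgw implementation. Because each PUT touches exactly one of the X index objects, concurrent writes to different keys contend on X objects instead of one.

```python
import zlib
from collections import Counter


def shard_for_key(object_key: str, num_shards: int) -> int:
    """Return the index shard that records object_key.

    crc32 is a stand-in; any stable hash that spreads keys evenly would do.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return zlib.crc32(object_key.encode("utf-8")) % num_shards


def index_object_name(bucket_id: str, shard: int) -> str:
    # Hypothetical naming scheme for the per-shard index objects.
    return f".dir.{bucket_id}.{shard}"


if __name__ == "__main__":
    num_shards = 8
    keys = [f"photos/img_{i:05d}.jpg" for i in range(10000)]
    # Count how many keys land on each shard to check the spread.
    load = Counter(shard_for_key(k, num_shards) for k in keys)
    for shard in range(num_shards):
        print(index_object_name("bucket-1234", shard), load[shard])
```

With a single shard every key maps to shard 0, which corresponds to today's unsharded index.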

There was also a recent discussion on the mailing list: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/19609

I am prototyping the change. To begin with, I would like to list two basic approaches:
1. Use a static radosgw configuration option, so that every bucket is sharded the same way.
   Pros:
   1. Transparent to users: how the index is sharded is an implementation detail and should not be exposed to them (in contrast to option 2).
   Cons:
   1. Load differs from bucket to bucket, so a single value for all use cases might not be appropriate.
2. Use a per-bucket configuration (e.g. whether to disable the bucket index, the number of shards, etc.). This requires modifying the create-bucket API to accept a new parameter.
   Pros:
   1. Fine-grained control over bucket sharding.
   Cons:
   1. Users must be aware of the implementation detail and make the choice when the bucket is created (unless we have a good scale-out strategy).
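One way to see how the two options relate is a sketch in which a per-bucket value supplied at creation time (option 2) overrides a single cluster-wide default (option 1). Everything here is hypothetical: `DEFAULT_BUCKET_INDEX_SHARDS`, `CreateBucketRequest`, the `index_shards` parameter, and the convention that 0 means "index disabled" are illustrative assumptions, not existing radosgw settings or API fields. The result is computed once at creation and is assumed to stay fixed, since no resharding (scale-out) strategy is assumed.

```python
from dataclasses import dataclass
from typing import Optional

# Option 1: one cluster-wide value applied to every bucket (hypothetical name).
DEFAULT_BUCKET_INDEX_SHARDS = 16


@dataclass
class CreateBucketRequest:
    name: str
    # Option 2: optional per-bucket parameter on the create-bucket call
    # (hypothetical). None means "use the cluster-wide default";
    # 0 is assumed here to mean "disable the bucket index" (blind bucket).
    index_shards: Optional[int] = None


def effective_shards(req: CreateBucketRequest,
                     default: int = DEFAULT_BUCKET_INDEX_SHARDS) -> int:
    """Resolve the shard count recorded for a new bucket."""
    if req.index_shards is None:
        return default  # user did not choose: fall back to option 1
    if req.index_shards < 0:
        raise ValueError("index_shards must be >= 0")
    return req.index_shards  # user chose explicitly: option 2
```

For example, `effective_shards(CreateBucketRequest("logs"))` falls back to the default of 16, while `effective_shards(CreateBucketRequest("hot", index_shards=64))` returns 64. Keeping the per-bucket field optional would preserve option 1's transparency for users who never set it.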

Please review and comment.
Thanks