Bug #23651
Dynamic bucket indexing, resharding and tenants still seem to be broken
Description
I've had issues with this before, described in https://tracker.ceph.com/issues/22046, but the issues remain (and have grown) after upgrading to 12.2.4.
With dynamic resharding enabled, the cluster will start resharding a bucket, which results in an entry in the resharding queue. This queue seems to be processed daily. I am monitoring the number of objects in the index pool, and it increases by about 1.5k objects each time resharding is started (but never finished).
`radosgw-admin reshard status --uid='DB0220$elasticsearch' --tenant=DB0220 --bucket=backups` gives me an array (larger than my scroll buffer) filled with entries like the following:

```
{ "reshard_status": 0, "new_bucket_instance_id": "", "num_shards": -1 },
```
I would like to cancel the resharding, but that doesn't work:
```
root@osdnode03:~# radosgw-admin reshard cancel --uid='DB0220$elasticsearch' --tenant=DB0220 --bucket=backups
Error in getting bucket backups: (2) No such file or directory
2018-04-11 17:00:43.099372 7fce19021cc0 -1 ERROR: failed to get entry from reshard log, oid=reshard.0000000010 tenant= bucket=backups
```
I THINK that has to do with https://github.com/oritwas/ceph/blob/0a2142e83b58fa8e238bcb748d1cb97bdba674c5/src/rgw/rgw_admin.cc#L5755
So I have a lot of objects in my index pool that do not make sense, a lot of resharding entries I cannot cancel, and almost no access to the bucket that needed the resharding. I could use some help :)
Updated by Orit Wasserman almost 6 years ago
- Status changed from New to In Progress
Updated by Orit Wasserman almost 6 years ago
Reshard status 0 means RESHARD_NONE which means there is no resharding going on.
The cancel command would have failed even if you didn't have a tenant configured.
The entries you are seeing are leftover from previous reshardings (one per bucket index shard).
I will add a command to clean up those.
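Until such a command exists, the leftover queue entries can at least be inspected directly. This is a hedged sketch, not an official procedure: the reshard queue is stored as omap keys on objects named `reshard.<nnnnnnnnnn>` (the error above shows `oid=reshard.0000000010`) in the zone's log pool, under the `reshard` namespace. The pool name `default.rgw.log` is an assumption; substitute your zone's log pool.

```shell
# List the reshard queue objects (pool name is an assumption for a
# default zone; the "reshard" namespace holds the queue objects).
rados -p default.rgw.log -N reshard ls

# Show the queued buckets on one queue object, one omap key per entry.
rados -p default.rgw.log -N reshard listomapkeys reshard.0000000010
```

This only reads the queue; removing entries by hand is not something the thread confirms is safe.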
Updated by Mark Schouten almost 6 years ago
Thanks. But those entries are not my main issue. The main issue is that my bucket index pool has 1035305 objects, a number which increased every time resharding was started. I've run an `orphans find` on the default.rgw.buckets.index pool, but I'm not sure if I can clean it up.
Updated by Mark Schouten almost 6 years ago
See the attached graph for what happened to the object-count. Also, see http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/025856.html .
Updated by Orit Wasserman almost 6 years ago
Mark Schouten wrote:
See the attached graph for what happened to the object-count. Also, see http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/025856.html .
Can you get me a list of those objects?

`rados -p .au-east.rgw.buckets.index listomapvals <bucket index>`
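A hedged way to gather that list for every index object at once: count the omap keys of each object in the index pool and list the largest. The pool name here is taken from the thread; adjust it to your zone.

```shell
# For each object in the bucket index pool, print its omap key count
# and name, largest first. Pool name is an assumption from this thread.
pool=default.rgw.buckets.index
for obj in $(rados -p "$pool" ls); do
  printf '%s %s\n' "$(rados -p "$pool" listomapkeys "$obj" | wc -l)" "$obj"
done | sort -rn | head -20
</imports>
```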
Updated by Orit Wasserman almost 6 years ago
Can you share the `radosgw-admin bi list --bucket <bucket>` output?
If I remember correctly you had failed resharding attempts; these entries could be left over from those.
Can you get the list of all the bucket instances?
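One way to enumerate every bucket instance the cluster knows about is via the metadata commands (hedged; both subcommands exist in luminous). Stale instances left behind by failed reshards show up in this list but are not referenced by `bucket stats`.

```shell
# List all bucket instance metadata entries.
radosgw-admin metadata list bucket.instance

# Inspect one entry; the key format for tenanted buckets shown here
# (<tenant>/<bucket>:<instance-id>) is an assumption -- verify against
# the keys returned by the list command above.
radosgw-admin metadata get 'bucket.instance:<tenant>/<bucket>:<instance-id>'
```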
Updated by Mark Schouten almost 6 years ago
I already deleted the bucket. That didn't shrink the number of index objects much, though.
How can I provide you with useful data now?
Updated by Beom-Seok Park almost 6 years ago
The object-count increase occurs when the bucket index of a versioning-enabled bucket is resharded.
Test env.
ceph v12.2.5 + https://github.com/ceph/ceph/pull/21669
- Normal bucket
Object-count does not increase after bucket index resharding.

```
$ sudo radosgw-admin bucket limit check
...
{ "bucket": "testbucket01", "tenant": "", "num_objects": 1, "num_shards": 0, "objects_per_shard": 1, "fill_status": "OK" }
...
$ sudo radosgw-admin bucket stats --bucket=testbucket01
{ "bucket": "testbucket01", "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c", "placement_rule": "default-placement", "explicit_placement": { "data_pool": "", "data_extra_pool": "", "index_pool": "" }, "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14", "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14", "index_type": "Normal", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "ver": "0#3", "master_ver": "0#0", "mtime": "2018-05-23 16:52:41.160882", "max_marker": "0#", "usage": { "rgw.main": { "size": 6, "size_actual": 4096, "size_utilized": 6, "size_kb": 1, "size_kb_actual": 4, "size_kb_utilized": 1, "num_objects": 1 } }, "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 } }
$ sudo radosgw-admin bi list --bucket=testbucket01
[ { "type": "plain", "idx": "hello.txt", "entry": { "name": "hello.txt", "instance": "", "ver": { "pool": 9, "epoch": 192035 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 6, "mtime": "2018-05-23 07:52:57.538684Z", "etag": "b1946ac92492d2347c6235b4d2611184", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "owner_display_name": "beomseok", "content_type": "text/plain", "accounted_size": 6, "user_data": "" }, "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.98", "flags": 0, "pending_map": [], "versioned_epoch": 0 } } ]
$ sudo radosgw-admin reshard add --bucket=testbucket01 --num-shards=2
$ sudo radosgw-admin reshard process
$ sudo radosgw-admin bucket limit check
...
{ "bucket": "testbucket01", "tenant": "", "num_objects": 1, "num_shards": 2, "objects_per_shard": 0, "fill_status": "OK" }
...
$ sudo radosgw-admin bucket stats --bucket=testbucket01
{ "bucket": "testbucket01", "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c", "placement_rule": "default-placement", "explicit_placement": { "data_pool": "", "data_extra_pool": "", "index_pool": "" }, "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.4174803.1", "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14", "index_type": "Normal", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "ver": "0#1,1#2", "master_ver": "0#0,1#0", "mtime": "2018-05-23 16:54:22.217668", "max_marker": "0#,1#", "usage": { "rgw.main": { "size": 6, "size_actual": 4096, "size_utilized": 0, "size_kb": 1, "size_kb_actual": 4, "size_kb_utilized": 0, "num_objects": 1 } }, "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 } }
$ sudo radosgw-admin bi list --bucket=testbucket01
[ { "type": "plain", "idx": "hello.txt", "entry": { "name": "hello.txt", "instance": "", "ver": { "pool": 9, "epoch": 192035 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 6, "mtime": "2018-05-23 07:52:57.538684Z", "etag": "b1946ac92492d2347c6235b4d2611184", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "owner_display_name": "beomseok", "content_type": "text/plain", "accounted_size": 6, "user_data": "" }, "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.98", "flags": 0, "pending_map": [], "versioned_epoch": 0 } } ]
```
- Version-enabled bucket
Object-count increased after bucket index resharding.

```
$ sudo radosgw-admin bucket limit check
...
{ "bucket": "testbucket02", "tenant": "", "num_objects": 1, "num_shards": 0, "objects_per_shard": 1, "fill_status": "OK" }
...
$ sudo radosgw-admin bucket stats --bucket=testbucket02
{ "bucket": "testbucket02", "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c", "placement_rule": "default-placement", "explicit_placement": { "data_pool": "", "data_extra_pool": "", "index_pool": "" }, "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.15", "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.15", "index_type": "Normal", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "ver": "0#4", "master_ver": "0#0", "mtime": "2018-05-23 16:58:03.575472", "max_marker": "0#", "usage": { "rgw.main": { "size": 6, "size_actual": 4096, "size_utilized": 6, "size_kb": 1, "size_kb_actual": 4, "size_kb_utilized": 1, "num_objects": 1 } }, "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 } }
$ sudo radosgw-admin bi list --bucket=testbucket02
[ { "type": "plain", "idx": "hello.txt", "entry": { "name": "hello.txt", "instance": "", "ver": { "pool": -1, "epoch": 0 }, "locator": "", "exists": "false", "meta": { "category": 0, "size": 0, "mtime": "0.000000", "etag": "", "owner": "", "owner_display_name": "", "content_type": "", "accounted_size": 0, "user_data": "" }, "tag": "", "flags": 8, "pending_map": [], "versioned_epoch": 0 } }, { "type": "plain", "idx": "hello.txt\u0000v913\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "entry": { "name": "hello.txt", "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "ver": { "pool": 9, "epoch": 192795 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 6, "mtime": "2018-05-23 07:59:11.629294Z", "etag": "b1946ac92492d2347c6235b4d2611184", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "owner_display_name": "beomseok", "content_type": "text/plain", "accounted_size": 6, "user_data": "" }, "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154", "flags": 3, "pending_map": [], "versioned_epoch": 2 } }, { "type": "instance", "idx": "�1000_hello.txt\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "entry": { "name": "hello.txt", "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "ver": { "pool": 9, "epoch": 192795 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 6, "mtime": "2018-05-23 07:59:11.629294Z", "etag": "b1946ac92492d2347c6235b4d2611184", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "owner_display_name": "beomseok", "content_type": "text/plain", "accounted_size": 6, "user_data": "" }, "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154", "flags": 3, "pending_map": [], "versioned_epoch": 2 } }, { "type": "olh", "idx": "�1001_hello.txt", "entry": { "key": { "name": "hello.txt", "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S" }, "delete_marker": "false", "epoch": 2, "pending_log": [], "tag": "3km67yp4c9azrxp8wevy0ckq003v1hx2", "exists": "true", "pending_removal": "false" } } ]
$ sudo radosgw-admin reshard add --bucket=testbucket02 --num-shards=2
$ sudo radosgw-admin reshard process
$ sudo radosgw-admin bucket limit check
...
{ "bucket": "testbucket02", "tenant": "", "num_objects": 3, "num_shards": 2, "objects_per_shard": 1, "fill_status": "OK" }
...
$ sudo radosgw-admin bucket stats --bucket=testbucket02
{ "bucket": "testbucket02", "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c", "placement_rule": "default-placement", "explicit_placement": { "data_pool": "", "data_extra_pool": "", "index_pool": "" }, "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.4175058.1", "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.15", "index_type": "Normal", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "ver": "0#1,1#2", "master_ver": "0#0,1#0", "mtime": "2018-05-23 17:00:08.571483", "max_marker": "0#,1#", "usage": { "rgw.none": { "size": 0, "size_actual": 0, "size_utilized": 0, "size_kb": 0, "size_kb_actual": 0, "size_kb_utilized": 0, "num_objects": 1 }, "rgw.main": { "size": 12, "size_actual": 8192, "size_utilized": 0, "size_kb": 1, "size_kb_actual": 8, "size_kb_utilized": 0, "num_objects": 2 } }, "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 } }
$ sudo radosgw-admin bi list --bucket=testbucket02
[ { "type": "plain", "idx": "hello.txt", "entry": { "name": "hello.txt", "instance": "", "ver": { "pool": -1, "epoch": 0 }, "locator": "", "exists": "false", "meta": { "category": 0, "size": 0, "mtime": "0.000000", "etag": "", "owner": "", "owner_display_name": "", "content_type": "", "accounted_size": 0, "user_data": "" }, "tag": "", "flags": 8, "pending_map": [], "versioned_epoch": 0 } }, { "type": "plain", "idx": "hello.txt\u0000v913\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "entry": { "name": "hello.txt", "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "ver": { "pool": 9, "epoch": 192795 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 6, "mtime": "2018-05-23 07:59:11.629294Z", "etag": "b1946ac92492d2347c6235b4d2611184", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "owner_display_name": "beomseok", "content_type": "text/plain", "accounted_size": 6, "user_data": "" }, "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154", "flags": 3, "pending_map": [], "versioned_epoch": 2 } }, { "type": "instance", "idx": "�1000_hello.txt\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "entry": { "name": "hello.txt", "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S", "ver": { "pool": 9, "epoch": 192795 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 6, "mtime": "2018-05-23 07:59:11.629294Z", "etag": "b1946ac92492d2347c6235b4d2611184", "owner": "9f67fb8460ab483c9aa7f130d76ef81b", "owner_display_name": "beomseok", "content_type": "text/plain", "accounted_size": 6, "user_data": "" }, "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154", "flags": 3, "pending_map": [], "versioned_epoch": 2 } }, { "type": "olh", "idx": "�1001_hello.txt", "entry": { "key": { "name": "hello.txt", "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S" }, "delete_marker": "false", "epoch": 2, "pending_log": [], "tag": "3km67yp4c9azrxp8wevy0ckq003v1hx2", "exists": "true", "pending_removal": "false" } } ]
```
Updated by Mark Schouten almost 6 years ago
Can someone tell me how to clean up the index? I have way too many objects now..
Updated by Rafal Wadolowski over 5 years ago
Mark Schouten wrote:
Can someone tell me how to clean up the index? I have way too many objects now..
We created a script which deletes all bucket index objects and their metadata. It selects the bucket index objects whose marker and id (those are the important fields) do not appear in any `bucket stats` output. Afterwards we had about 1k objects in the index pool, where before there were about 240k. Dynamic resharding caused this, so we turned it off. Now we are facing problems with entries in the log (namespace=reshard). Maybe there is a safe way to purge them?
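The approach described above can be sketched as follows; the comparison step itself is plain shell. The `radosgw-admin` commands named in the comments are the assumed data sources, and the sample data here stands in for real cluster output.

```shell
# Hedged sketch: find bucket instances that exist in metadata but are no
# longer referenced by any bucket. In a real cluster the two lists would
# come from:
#   radosgw-admin metadata list bucket.instance   -> all_instances.txt
#   radosgw-admin bucket stats ("id" per bucket)  -> live_instances.txt
# Sample data stands in for real cluster output here.
printf '%s\n' 'z.100.1' 'z.100.2' 'z.200.1' > all_instances.txt
printf '%s\n' 'z.100.2' > live_instances.txt

sort all_instances.txt -o all.sorted
sort live_instances.txt -o live.sorted
# comm -23: lines present only in the first file = stale instances
comm -23 all.sorted live.sorted > stale.txt
cat stale.txt
# Each stale entry could then be removed with something like:
#   radosgw-admin metadata rm bucket.instance:<entry>
```

With the sample data, `stale.txt` ends up containing `z.100.1` and `z.200.1`.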
Updated by Mark Schouten about 5 years ago
I believe this issue is fixed in https://ceph.com/releases/v12-2-11-luminous-released/ ?
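The v12.2.11 release notes do describe new cleanup commands for bucket instances left behind by resharding (hedged; check `radosgw-admin help` on your version for the exact syntax):

```shell
# List bucket instance entries believed to be stale leftovers of reshard.
radosgw-admin reshard stale-instances list

# Remove them (run only after reviewing the list output).
radosgw-admin reshard stale-instances rm
```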