Bug #23651

open

Dynamic bucket indexing, resharding and tenants still seems to be broken

Added by Mark Schouten about 6 years ago. Updated about 5 years ago.

Status:
In Progress
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
Yes
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I've had issues with this before, which is described in https://tracker.ceph.com/issues/22046. But the issues remain (and grow) after upgrading to 12.2.4.

With dynamic resharding enabled, the cluster will start resharding a bucket. This results in an entry in the resharding queue. It seems this queue is executed daily. I am monitoring the number of objects in the pool, and that seems to increase by about 1.5k objects each time resharding is started (but never finished).

`radosgw-admin reshard status --uid='DB0220$elasticsearch' --tenant=DB0220 --bucket=backups` gives me an array (larger than my scrollback buffer) of entries like the following:

    {
        "reshard_status": 0,
        "new_bucket_instance_id": "",
        "num_shards": -1
    },

I would like to cancel the resharding, but that doesn't work:

    root@osdnode03:~# radosgw-admin reshard cancel --uid='DB0220$elasticsearch' --tenant=DB0220 --bucket=backups
    Error in getting bucket backups: (2) No such file or directory
    2018-04-11 17:00:43.099372 7fce19021cc0 -1 ERROR: failed to get entry from reshard log, oid=reshard.0000000010 tenant= bucket=backups

I THINK that has to do with https://github.com/oritwas/ceph/blob/0a2142e83b58fa8e238bcb748d1cb97bdba674c5/src/rgw/rgw_admin.cc#L5755

So I have a lot of objects in my index pool that do not make sense, a lot of resharding entries I cannot cancel, and almost no access to the bucket that needed the resharding. I could use some help :)


Files

graph.png (27.2 KB): Growing object-count in default.rgw.buckets.index. Mark Schouten, 05/07/2018 12:39 PM
Actions #1

Updated by Orit Wasserman almost 6 years ago

  • Assignee set to Orit Wasserman
Actions #2

Updated by Orit Wasserman almost 6 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Orit Wasserman almost 6 years ago

Reshard status 0 means RESHARD_NONE which means there is no resharding going on.
The cancel command would have failed even if you didn't have a tenant configured.
The entries you are seeing are leftover from previous reshardings (one per bucket index shard).
I will add a command to clean up those.
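
Since the `reshard status` output is one entry per bucket index shard and can be too long to read by eye, a small script can summarize it. This is only a sketch: it assumes the JSON array has been saved to a string, and the names for status values 1 and 2 are my assumption based on the Luminous-era cls_rgw enum; only `0 = RESHARD_NONE` is confirmed in this thread.

```python
import json
from collections import Counter

# Only 0 -> RESHARD_NONE is confirmed in this thread.
# The names for 1 and 2 are assumptions based on Luminous-era sources.
RESHARD_STATUS_NAMES = {
    0: "RESHARD_NONE",
    1: "RESHARD_IN_PROGRESS",
    2: "RESHARD_DONE",
}


def summarize_reshard_status(status_json):
    """Count per-shard entries of `radosgw-admin reshard status` by status name."""
    entries = json.loads(status_json)
    names = []
    for entry in entries:
        code = entry["reshard_status"]
        names.append(RESHARD_STATUS_NAMES.get(code, "UNKNOWN(%d)" % code))
    return Counter(names)


# Two entries shaped like the ones in the description of this issue.
sample = """
[
    {"reshard_status": 0, "new_bucket_instance_id": "", "num_shards": -1},
    {"reshard_status": 0, "new_bucket_instance_id": "", "num_shards": -1}
]
"""

summary = summarize_reshard_status(sample)
for name, count in sorted(summary.items()):
    print(name, count)
```

If every shard reports `RESHARD_NONE`, there is no resharding in progress to cancel, which matches the explanation above.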

Actions #4

Updated by Mark Schouten almost 6 years ago

Thanks. But those entries are not my main issue. The main issue is that my bucket index pool has 1035305 objects, a number that increased every time resharding was started. I've run an `orphans find` on the default.rgw.buckets.index pool, but I'm not sure if I can clean it up.

Actions #5

Updated by Mark Schouten almost 6 years ago

See the attached graph for what happened to the object-count. Also, see http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/025856.html .

Actions #6

Updated by Orit Wasserman almost 6 years ago

Mark Schouten wrote:

See the attached graph for what happened to the object-count. Also, see http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-April/025856.html .

Can you get me a list of those objects?

    rados -p .au-east.rgw.buckets.index listomapvals <bucket index>

Actions #7

Updated by Orit Wasserman almost 6 years ago

Please also send the output of `radosgw-admin bi list --bucket <bucket>`.
If I remember correctly, you had failed resharding attempts; these could be the leftovers from those.
Can you get the list of all the bucket instances?

Actions #8

Updated by Mark Schouten almost 6 years ago

I already deleted the bucket. That didn't shrink the index-objects much though.

How can I provide you with useful data now?

Actions #9

Updated by Beom-Seok Park almost 6 years ago

The object count increases when the bucket index of a version-enabled bucket is resharded.

Test env.
ceph v12.2.5 + https://github.com/ceph/ceph/pull/21669

  1. Normal bucket
    Object-count does not increase after bucket index resharding.
    $ sudo radosgw-admin bucket limit check
    ...
    {
        "bucket": "testbucket01",
        "tenant": "",
        "num_objects": 1,
        "num_shards": 0,
        "objects_per_shard": 1,
        "fill_status": "OK" 
    }
    ...
    
    $ sudo radosgw-admin bucket stats --bucket=testbucket01
    {
        "bucket": "testbucket01",
        "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c",
        "placement_rule": "default-placement",
        "explicit_placement": {
            "data_pool": "",
            "data_extra_pool": "",
            "index_pool": "" 
        },
        "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14",
        "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14",
        "index_type": "Normal",
        "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
        "ver": "0#3",
        "master_ver": "0#0",
        "mtime": "2018-05-23 16:52:41.160882",
        "max_marker": "0#",
        "usage": {
            "rgw.main": {
                "size": 6,
                "size_actual": 4096,
                "size_utilized": 6,
                "size_kb": 1,
                "size_kb_actual": 4,
                "size_kb_utilized": 1,
                "num_objects": 1
            }
        },
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    }
    
    $ sudo radosgw-admin bi list --bucket=testbucket01
    [
        {
            "type": "plain",
            "idx": "hello.txt",
            "entry": {
                "name": "hello.txt",
                "instance": "",
                "ver": {
                    "pool": 9,
                    "epoch": 192035
                },
                "locator": "",
                "exists": "true",
                "meta": {
                    "category": 1,
                    "size": 6,
                    "mtime": "2018-05-23 07:52:57.538684Z",
                    "etag": "b1946ac92492d2347c6235b4d2611184",
                    "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
                    "owner_display_name": "beomseok",
                    "content_type": "text/plain",
                    "accounted_size": 6,
                    "user_data": "" 
                },
                "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.98",
                "flags": 0,
                "pending_map": [],
                "versioned_epoch": 0
            }
        }
    ]
    
    $ sudo radosgw-admin reshard add --bucket=testbucket01 --num-shards=2
    $ sudo radosgw-admin reshard process
    $ sudo radosgw-admin bucket limit check
    ...
    {
        "bucket": "testbucket01",
        "tenant": "",
        "num_objects": 1,
        "num_shards": 2,
        "objects_per_shard": 0,
        "fill_status": "OK" 
    }
    ...
    
    $ sudo radosgw-admin bucket stats --bucket=testbucket01
    {
        "bucket": "testbucket01",
        "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c",
        "placement_rule": "default-placement",
        "explicit_placement": {
            "data_pool": "",
            "data_extra_pool": "",
            "index_pool": "" 
        },
        "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.4174803.1",
        "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14",
        "index_type": "Normal",
        "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
        "ver": "0#1,1#2",
        "master_ver": "0#0,1#0",
        "mtime": "2018-05-23 16:54:22.217668",
        "max_marker": "0#,1#",
        "usage": {
            "rgw.main": {
                "size": 6,
                "size_actual": 4096,
                "size_utilized": 0,
                "size_kb": 1,
                "size_kb_actual": 4,
                "size_kb_utilized": 0,
                "num_objects": 1
            }
        },
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    }
    
    $ sudo radosgw-admin bi list --bucket=testbucket01
    [
        {
            "type": "plain",
            "idx": "hello.txt",
            "entry": {
                "name": "hello.txt",
                "instance": "",
                "ver": {
                    "pool": 9,
                    "epoch": 192035
                },
                "locator": "",
                "exists": "true",
                "meta": {
                    "category": 1,
                    "size": 6,
                    "mtime": "2018-05-23 07:52:57.538684Z",
                    "etag": "b1946ac92492d2347c6235b4d2611184",
                    "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
                    "owner_display_name": "beomseok",
                    "content_type": "text/plain",
                    "accounted_size": 6,
                    "user_data": "" 
                },
                "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.98",
                "flags": 0,
                "pending_map": [],
                "versioned_epoch": 0
            }
        }
    ]
    
  2. Version-enabled bucket
    Object-count increased after bucket index resharding.
    $ sudo radosgw-admin bucket limit check
    ...
    {
        "bucket": "testbucket02",
        "tenant": "",
        "num_objects": 1,
        "num_shards": 0,
        "objects_per_shard": 1,
        "fill_status": "OK" 
    }
    ...
    
    $ sudo radosgw-admin bucket stats --bucket=testbucket02
    {
        "bucket": "testbucket02",
        "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c",
        "placement_rule": "default-placement",
        "explicit_placement": {
            "data_pool": "",
            "data_extra_pool": "",
            "index_pool": "" 
        },
        "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.15",
        "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.15",
        "index_type": "Normal",
        "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
        "ver": "0#4",
        "master_ver": "0#0",
        "mtime": "2018-05-23 16:58:03.575472",
        "max_marker": "0#",
        "usage": {
            "rgw.main": {
                "size": 6,
                "size_actual": 4096,
                "size_utilized": 6,
                "size_kb": 1,
                "size_kb_actual": 4,
                "size_kb_utilized": 1,
                "num_objects": 1
            }
        },
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    }
    
    $ sudo radosgw-admin bi list --bucket=testbucket02
    [
        {
            "type": "plain",
            "idx": "hello.txt",
            "entry": {
                "name": "hello.txt",
                "instance": "",
                "ver": {
                    "pool": -1,
                    "epoch": 0
                },
                "locator": "",
                "exists": "false",
                "meta": {
                    "category": 0,
                    "size": 0,
                    "mtime": "0.000000",
                    "etag": "",
                    "owner": "",
                    "owner_display_name": "",
                    "content_type": "",
                    "accounted_size": 0,
                    "user_data": "" 
                },
                "tag": "",
                "flags": 8,
                "pending_map": [],
                "versioned_epoch": 0
            }
        },
        {
            "type": "plain",
            "idx": "hello.txt\u0000v913\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
            "entry": {
                "name": "hello.txt",
                "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
                "ver": {
                    "pool": 9,
                    "epoch": 192795
                },
                "locator": "",
                "exists": "true",
                "meta": {
                    "category": 1,
                    "size": 6,
                    "mtime": "2018-05-23 07:59:11.629294Z",
                    "etag": "b1946ac92492d2347c6235b4d2611184",
                    "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
                    "owner_display_name": "beomseok",
                    "content_type": "text/plain",
                    "accounted_size": 6,
                    "user_data": "" 
                },
                "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154",
                "flags": 3,
                "pending_map": [],
                "versioned_epoch": 2
            }
        },
        {
            "type": "instance",
            "idx": "\u00801000_hello.txt\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
            "entry": {
                "name": "hello.txt",
                "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
                "ver": {
                    "pool": 9,
                    "epoch": 192795
                },
                "locator": "",
                "exists": "true",
                "meta": {
                    "category": 1,
                    "size": 6,
                    "mtime": "2018-05-23 07:59:11.629294Z",
                    "etag": "b1946ac92492d2347c6235b4d2611184",
                    "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
                    "owner_display_name": "beomseok",
                    "content_type": "text/plain",
                    "accounted_size": 6,
                    "user_data": "" 
                },
                "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154",
                "flags": 3,
                "pending_map": [],
                "versioned_epoch": 2
            }
        },
        {
            "type": "olh",
            "idx": "\u00801001_hello.txt",
            "entry": {
                "key": {
                    "name": "hello.txt",
                    "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S" 
                },
                "delete_marker": "false",
                "epoch": 2,
                "pending_log": [],
                "tag": "3km67yp4c9azrxp8wevy0ckq003v1hx2",
                "exists": "true",
                "pending_removal": "false" 
            }
        }
    ]
    
    $ sudo radosgw-admin reshard add --bucket=testbucket02 --num-shards=2
    $ sudo radosgw-admin reshard process
    $ sudo radosgw-admin bucket limit check
    ...
    {
        "bucket": "testbucket02",
        "tenant": "",
        "num_objects": 3,
        "num_shards": 2,
        "objects_per_shard": 1,
        "fill_status": "OK" 
    }
    ...
    
    $ sudo radosgw-admin bucket stats --bucket=testbucket02
    {
        "bucket": "testbucket02",
        "zonegroup": "08c71f71-941f-422c-b24d-1e3553c17b9c",
        "placement_rule": "default-placement",
        "explicit_placement": {
            "data_pool": "",
            "data_extra_pool": "",
            "index_pool": "" 
        },
        "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.4175058.1",
        "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.15",
        "index_type": "Normal",
        "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
        "ver": "0#1,1#2",
        "master_ver": "0#0,1#0",
        "mtime": "2018-05-23 17:00:08.571483",
        "max_marker": "0#,1#",
        "usage": {
            "rgw.none": {
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 1
            },
            "rgw.main": {
                "size": 12,
                "size_actual": 8192,
                "size_utilized": 0,
                "size_kb": 1,
                "size_kb_actual": 8,
                "size_kb_utilized": 0,
                "num_objects": 2
            }
        },
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        }
    }
    
    $ sudo radosgw-admin bi list --bucket=testbucket02
    [
        {
            "type": "plain",
            "idx": "hello.txt",
            "entry": {
                "name": "hello.txt",
                "instance": "",
                "ver": {
                    "pool": -1,
                    "epoch": 0
                },
                "locator": "",
                "exists": "false",
                "meta": {
                    "category": 0,
                    "size": 0,
                    "mtime": "0.000000",
                    "etag": "",
                    "owner": "",
                    "owner_display_name": "",
                    "content_type": "",
                    "accounted_size": 0,
                    "user_data": "" 
                },
                "tag": "",
                "flags": 8,
                "pending_map": [],
                "versioned_epoch": 0
            }
        },
        {
            "type": "plain",
            "idx": "hello.txt\u0000v913\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
            "entry": {
                "name": "hello.txt",
                "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
                "ver": {
                    "pool": 9,
                    "epoch": 192795
                },
                "locator": "",
                "exists": "true",
                "meta": {
                    "category": 1,
                    "size": 6,
                    "mtime": "2018-05-23 07:59:11.629294Z",
                    "etag": "b1946ac92492d2347c6235b4d2611184",
                    "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
                    "owner_display_name": "beomseok",
                    "content_type": "text/plain",
                    "accounted_size": 6,
                    "user_data": "" 
                },
                "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154",
                "flags": 3,
                "pending_map": [],
                "versioned_epoch": 2
            }
        },
        {
            "type": "instance",
            "idx": "\u00801000_hello.txt\u0000i-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
            "entry": {
                "name": "hello.txt",
                "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
                "ver": {
                    "pool": 9,
                    "epoch": 192795
                },
                "locator": "",
                "exists": "true",
                "meta": {
                    "category": 1,
                    "size": 6,
                    "mtime": "2018-05-23 07:59:11.629294Z",
                    "etag": "b1946ac92492d2347c6235b4d2611184",
                    "owner": "9f67fb8460ab483c9aa7f130d76ef81b",
                    "owner_display_name": "beomseok",
                    "content_type": "text/plain",
                    "accounted_size": 6,
                    "user_data": "" 
                },
                "tag": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.154",
                "flags": 3,
                "pending_map": [],
                "versioned_epoch": 2
            }
        },
        {
            "type": "olh",
            "idx": "\u00801001_hello.txt",
            "entry": {
                "key": {
                    "name": "hello.txt",
                    "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S" 
                },
                "delete_marker": "false",
                "epoch": 2,
                "pending_log": [],
                "tag": "3km67yp4c9azrxp8wevy0ckq003v1hx2",
                "exists": "true",
                "pending_removal": "false" 
            }
        }
    ]
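
One way to read the numbers above: the post-reshard `bucket stats` for testbucket02 are what you would get by counting every `bi list` entry that carries a `meta` block, grouped by `meta.category`. The sketch below does that tally on a trimmed copy of the listing. It is an observation that happens to fit this output, not a confirmed root cause, and the reading of category 0 as `rgw.none` and category 1 as `rgw.main` is an assumption drawn from the stats output.

```python
from collections import defaultdict


def tally_by_category(bi_entries):
    """Sum entry count and size per meta.category over `bi list` entries."""
    totals = defaultdict(lambda: {"num_objects": 0, "size": 0})
    for item in bi_entries:
        meta = item.get("entry", {}).get("meta")
        if meta is None:
            # "olh" entries have no "meta" block, so they are not counted.
            continue
        slot = totals[meta["category"]]
        slot["num_objects"] += 1
        slot["size"] += meta["size"]
    return dict(totals)


# Trimmed copy of the four testbucket02 index entries shown above.
entries = [
    {"type": "plain", "entry": {"name": "hello.txt", "instance": "",
                                "meta": {"category": 0, "size": 0}}},
    {"type": "plain", "entry": {"name": "hello.txt",
                                "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
                                "meta": {"category": 1, "size": 6}}},
    {"type": "instance", "entry": {"name": "hello.txt",
                                   "instance": "-FenPaih0WWd-bwxnegsqlC6j4cyM3S",
                                   "meta": {"category": 1, "size": 6}}},
    {"type": "olh", "entry": {"key": {"name": "hello.txt"}}},
]

totals = tally_by_category(entries)
print(totals)
# {0: {'num_objects': 1, 'size': 0}, 1: {'num_objects': 2, 'size': 12}}
```

The tally gives 1 entry of size 0 in category 0 and 2 entries totalling 12 bytes in category 1, the same as the `rgw.none` and `rgw.main` usage reported after resharding, although the bucket still holds a single 6-byte object.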
    
Actions #10

Updated by Mark Schouten almost 6 years ago

Can someone tell me how to clean up the index? I have way too many objects now..

Actions #11

Updated by Rafal Wadolowski over 5 years ago

Mark Schouten wrote:

Can someone tell me how to clean up the index? I have way too many objects now..

We created a script which deletes bi's and their metadata. It selects the bucket indexes which don't exist in bucket stats (marker and id <- they are important). Afterwards we have about 1k objects in the index pool; before, there were about 240k. Dynamic resharding caused that, so we turned it off. Now we are facing problems with entries in the log (namespace=reshard). Maybe there is a safe way to purge them?
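
The selection step described in this comment could look roughly like the sketch below. It is not the script mentioned here, only an illustration under assumptions: the instance keys are taken to have the `bucket:instance_id` form printed by `radosgw-admin metadata list bucket.instance`, the stats are the parsed output of `radosgw-admin bucket stats`, and the function name and the third sample key are hypothetical. It only reports candidates and deletes nothing.

```python
def find_stale_instances(instance_keys, bucket_stats):
    """Return instance keys whose id is neither the "id" nor the "marker"
    of any bucket in the given bucket stats."""
    live_ids = set()
    for stats in bucket_stats:
        live_ids.add(stats["id"])
        live_ids.add(stats["marker"])

    stale = []
    for key in instance_keys:
        # Assumed key form: "<bucket>:<instance_id>" (the tenant, if any,
        # is part of the text before the last colon).
        instance_id = key.rpartition(":")[2]
        if instance_id not in live_ids:
            stale.append(key)
    return stale


# id and marker of testbucket01 are taken from the stats earlier in this thread.
bucket_stats = [
    {
        "bucket": "testbucket01",
        "id": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.4174803.1",
        "marker": "f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14",
    },
]

instance_keys = [
    "testbucket01:f9e6f03b-86a7-48a5-8932-5cc1ea17270a.4174803.1",
    "testbucket01:f9e6f03b-86a7-48a5-8932-5cc1ea17270a.3994421.14",
    # Hypothetical leftover from a failed reshard attempt, made up for this example.
    "testbucket01:f9e6f03b-86a7-48a5-8932-5cc1ea17270a.4000000.7",
]

stale = find_stale_instances(instance_keys, bucket_stats)
print(stale)
```

Keeping both `id` and `marker` in the live set follows the remark above that both are important. Anything the function returns should still be checked by hand before removing it.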

Actions #12

Updated by Mark Schouten about 5 years ago
