Bug #39485 (open)

Luminous: a huge bucket stuck in dynamic resharding for one week

Added by Rui Xu almost 5 years ago. Updated almost 5 years ago.

Status: Need More Info
Priority: Normal
Target version: -
% Done: 0%
Source: Community (user)
Tags: rgw
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version: Luminous 12.2.5
The cluster has 3 monitors, 3 rgw gateways, and 436 bluestore OSDs: 24 on NVMe disks, 72 on SSDs, and 340 on SATA disks.
We use Ceph via rgw with S3. There are some huge buckets in this cluster; the largest bucket has 470 million objects in it.

The problem is that a bucket with 100 million objects is stuck in dynamic resharding, going from 1024 to 2048 shards. It has been hanging for 7 days. Dynamic resharding in this cluster normally does not take this long, but this time it seems endless.
The resharding bucket can only be read, not written to. When putting any files, the rgw gateway logs "NOTICE: reshard still in progress, retrying".

Checking the sharding with radosgw-admin: the output of radosgw-admin bucket limit check shows the bucket already has 2048 shards and the fill status is OK, but radosgw-admin reshard list shows that resharding is still in progress.

The outputs are:

[root@ceph29 ceph_rgw_debug]# radosgw-admin bucket stats --bucket mpilot-data-s3 | head -n 11
{
    "bucket": "mpilot-data-s3",
    "zonegroup": "b8176099-351f-4a8a-a8aa-24a2623ead53",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": "" 
    },
    "id": "0089274c-7a8b-4e66-83dd-d45e638415d7.52478916.4",
    "marker": "0089274c-7a8b-4e66-83dd-d45e638415d7.21212512.1",
[root@ceph29 ceph_rgw_debug]# radosgw-admin reshard list
[
    {
        "time": "2019-04-18 07:20:08.878365Z",
        "tenant": "",
        "bucket_name": "mpilot-data-s3",
        "bucket_id": "0089274c-7a8b-4e66-83dd-d45e638415d7.46222245.1",
        "new_instance_id": "mpilot-data-s3:0089274c-7a8b-4e66-83dd-d45e638415d7.80468580.2",
        "old_num_shards": 1024,
        "new_num_shards": 2048
    }
]
[root@ceph29 ceph_rgw_debug]# radosgw-admin bucket limit check --uid mpilot-admin
[
    {
        "user_id": "mpilot-admin",
        "buckets": [
            {
                "bucket": "mpilot-data-s3",
                "tenant": "",
                "num_objects": 102940439,
                "num_shards": 2048,
                "objects_per_shard": 50263,
                "fill_status": "OK" 
            }
        ]
    }
]

The rgw gateway log at debug_rgw = 30 is attached.
A summary of the log:

2019-04-19 03:36:22.894125 7f7a78607700  1 ====== starting new request req=0x7f7a78601190 =====
2019-04-19 03:36:22.895619 7f7ab7685700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2019-04-19 03:36:22.895753 7f7ab7685700  0 WARNING: set_req_state_err err_no=2300 resorting to 500
2019-04-19 03:36:22.895847 7f7ab7685700  0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error
2019-04-19 03:36:22.895910 7f7ab7685700  1 ====== req done req=0x7f7ab767f190 op status=-2300 http_status=500 ======
2019-04-19 03:36:22.895974 7f7ab7685700  1 civetweb: 0x5572b7a60000: 172.16.10.221 - - [19/Apr/2019:03:28:52 +0800] "PUT /mpilot-data-s3/itg_test/190417T115433_mkz1-a_mkzX2-c/raw_images/1555485845_fisheye_cam1/1555485847.753603.s1.jpg HTTP/1.1" 500 0 - aws-sdk-go/1.15.56 (go1.10.3; linux; amd64) S3Manager


Files

ceph-client.rgw.radosgw-ceph6.log (455 KB) - rgw resharding - Rui Xu, 04/25/2019 12:33 PM
#1

Updated by Casey Bodley almost 5 years ago

  • Assignee set to J. Eric Ivancich
#2

Updated by Rui Xu almost 5 years ago

We have upgraded this cluster; the version went from 12.2.2 to 12.2.5.

#3

Updated by J. Eric Ivancich almost 5 years ago

There have been similar cases in the past where a manual resharding has resolved the issue. Have you tried that?
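
A manual reshard on Luminous is typically driven with radosgw-admin; a minimal sketch, assuming the stuck dynamic-reshard entry is cancelled first and using this ticket's bucket name with a placeholder target shard count:

radosgw-admin reshard cancel --bucket mpilot-data-s3                      # clear the stuck dynamic-reshard entry
radosgw-admin bucket reshard --bucket mpilot-data-s3 --num-shards 2048    # pick your own target shard count

Note that the bucket cannot be written to while the manual reshard runs, so a quiet period is best.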

#4

Updated by Rui Xu almost 5 years ago

Eric Ivancich wrote:

There have been similar cases in the past where a manual resharding has resolved the issue. Have you tried that?

Thanks a lot, Eric.

I have 3 questions:
1. In this situation, with the bucket stuck in dynamic resharding right now, what could I do to this bucket other than manually cancelling the resharding with the command "radosgw-admin reshard cancel --bucket <bucket_name>"?
2. How can I check whether a bucket is actively resharding or only scheduled for resharding?
3. Will there be any bad impact if I manually cancel the bucket resharding?

#5

Updated by Rui Xu almost 5 years ago

I have done some operations on this bucket, as follows:
1. I cancelled the bucket's dynamic resharding; clients could then write objects to the bucket again.
2. After about 4 hours, the bucket started dynamic resharding again, with the number of index shards going from 2048 to 2048.
3. After about 1 hour, the dynamic resharding completed and the bucket returned to normal.
4. After about 6 hours, the bucket started dynamic resharding a second time, with the number of index shards going from 2048 to 2076.
5. After about 1 hour, the dynamic resharding completed and the bucket returned to normal.
6. Then I turned off rgw_dynamic_resharding on all radosgw gateway instances, and the bucket has not resharded since (a config sketch follows below).
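
A minimal sketch of how step 6 could look, assuming the option is set in ceph.conf on each radosgw host (the instance name below is hypothetical):

# in ceph.conf on each radosgw host, under its gateway section, e.g.:
#
#   [client.rgw.radosgw-ceph6]
#   rgw_dynamic_resharding = false
#
# then restart that gateway so it picks up the change:
systemctl restart ceph-radosgw@rgw.radosgw-ceph6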

I still don't fully understand bucket sharding; should I reshard this bucket dynamically or manually?

#6

Updated by J. Eric Ivancich almost 5 years ago

The configuration option rgw_max_objs_per_shard determines when dynamic resharding starts. The default value is 100,000. So when the bucket index shards have more than 100,000 objects in them, dynamic resharding is triggered. Since dynamic resharding was triggered when you had 2,048 shards, I assume the bucket had more than 204,800,000 objects.

It also sounds like the number of objects grew by perhaps 100,000,000 in a few weeks. Resharding will be triggered pretty regularly at that pace.

You could do all your resharding manually, and you could do so with future growth in mind. For example, if you expect to add 50,000,000 objects in the next few weeks, bringing the total to, say, 260,000,000 objects, and you want room to grow (say, 50,000 objects per shard), you could then reshard the bucket to, say, 5,200 shards.
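
A quick sketch of that sizing arithmetic, using the numbers from the illustration above (they are examples, not a recommendation):

# required shards = ceil(expected objects / target objects per shard)
echo $(( (260000000 + 50000 - 1) / 50000 ))   # -> 5200

The result would then be the value passed as --num-shards to a manual reshard, or compared against what dynamic resharding chooses on its own.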

Or you could let dynamic resharding handle it.

But please keep in mind that the more shards you have, the longer bucket listing takes (because, by default, bucket listings produce object names in lexical order). Unordered bucket listings (a feature in recent versions, including 12.2.5) perform much better.

It's unclear why resharding hung initially. More recent versions of ceph have improved the performance of bucket resharding. But unless you can reliably reproduce the issue, I'm inclined to close this issue as "Can't reproduce".

Any other questions or thoughts?

#7

Updated by Casey Bodley almost 5 years ago

  • Status changed from New to Need More Info
#8

Updated by Thomas Kriechbaumer almost 5 years ago

I am stuck with a very similar problem.
We recently upgraded to nautilus (two days ago), and today we noticed that all read/list/write operations were failing on one bucket - because it is stuck in resharding from 256 to 512 shards.

Log entry:

RGWReshardLock::lock failed to acquire lock on reshard.0000000003 ret=-16

The cluster was originally mimic on Ubuntu 16.04, then Ubuntu 18.04, and is now nautilus on Ubuntu 18.04. Everything was created with ceph-deploy. It started with 3 nodes, then grew to 5 and now 8, with 5 mons - all active. We now have about 200 OSDs - all healthy.
The cluster is technically in HEALTH_WARN, but only because nautilus now complains about missed scrubbing tasks. MDS and CephFS are also fine. Buckets other than the one reported below still work.

radosgw-admin reshard list
[
    {
        "time": "2019-05-29 18:27:39.012365Z",
        "tenant": "",
        "bucket_name": "prelabeling",
        "bucket_id": "4529fe07-c707-4c2d-9ccb-1ecf8154aa23.32005164.1",
        "new_instance_id": "",
        "old_num_shards": 256,
        "new_num_shards": 512
    }
]
radosgw-admin reshard status --bucket prelabeling
[
    {
        "reshard_status": "CLS_RGW_RESHARD_IN_PROGRESS",
        "new_bucket_instance_id": "4529fe07-c707-4c2d-9ccb-1ecf8154aa23.34570350.1",
        "num_shards": 512
    },
... this json object repeats exactly 256 times with the exact same content ...
]
radosgw-admin bucket stats --bucket prelabeling
{
    "bucket": "prelabeling",
    "tenant": "",
    "zonegroup": "28036fb5-da9c-4727-a1ec-0f59f846649d",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": "" 
    },
    "id": "4529fe07-c707-4c2d-9ccb-1ecf8154aa23.32005164.1",
    "marker": "4529fe07-c707-4c2d-9ccb-1ecf8154aa23.774162.1",
    "index_type": "Normal",
    "owner": "foobar",
    "ver": "0#16279,1#16543,2#16448,3#16325,4#16508,5#16383,6#16438,7#16357,8#16394,9#16373,10#16544,11#16467,12#16481,13#16500,14#16326,15#16425,16#16299,17#16464,18#16624,19#16
714,20#16392,21#16772,22#16326,23#16360,24#16376,25#16531,26#16062,27#16210,28#16586,29#16323,30#16388,31#16376,32#16550,33#16378,34#16272,35#16424,36#16417,37#16535,38#16699,39#
16339,40#16484,41#16342,42#16454,43#16197,44#16548,45#16545,46#16199,47#16251,48#16310,49#16408,50#16487,51#16486,52#16564,53#16367,54#16219,55#16353,56#16412,57#16624,58#16303,5
9#16395,60#16375,61#16269,62#16353,63#16254,64#16467,65#16292,66#16622,67#16292,68#16481,69#16291,70#16644,71#16579,72#16557,73#16153,74#16454,75#16582,76#16618,77#16183,78#16558
,79#16163,80#16248,81#16339,82#16381,83#16505,84#16690,85#16438,86#16675,87#16340,88#16401,89#16393,90#16489,91#16623,92#16588,93#16413,94#16233,95#16557,96#16438,97#16428,98#164
44,99#16344,100#16595,101#16279,102#16634,103#16336,104#16572,105#16419,106#16389,107#16498,108#16556,109#16127,110#16496,111#16116,112#16222,113#16327,114#16311,115#16623,116#16
485,117#16280,118#16536,119#16403,120#16247,121#16362,122#16357,123#16427,124#16264,125#16452,126#16370,127#16420,128#16524,129#16307,130#16531,131#16209,132#16346,133#16645,134#
16360,135#16498,136#16566,137#16401,138#16252,139#16224,140#16649,141#16246,142#16372,143#16246,144#16552,145#16373,146#16342,147#16512,148#16107,149#16379,150#16435,151#16329,15
2#16356,153#16423,154#16603,155#16501,156#16577,157#16377,158#16415,159#16378,160#16303,161#16606,162#16295,163#16385,164#16345,165#16544,166#16262,167#16397,168#16601,169#16291,
170#16407,171#16408,172#16553,173#16387,174#16369,175#16419,176#16095,177#16553,178#16256,179#16284,180#16203,181#16354,182#16464,183#16355,184#16459,185#16324,186#16428,187#1621
6,188#16666,189#16546,190#16299,191#16355,192#16463,193#16611,194#16253,195#16416,196#16315,197#15741,198#15874,199#15926,200#15659,201#15820,202#16088,203#15830,204#16030,205#15
950,206#15742,207#15424,208#16057,209#15942,210#15866,211#15921,212#15898,213#15858,214#15816,215#16245,216#16019,217#15488,218#16172,219#15859,220#15840,221#15832,222#15812,223#
15770,224#15956,225#15818,226#15916,227#16088,228#15799,229#15878,230#15613,231#15863,232#15870,233#15698,234#15848,235#15862,236#15936,237#15756,238#16002,239#16023,240#15907,24
1#15625,242#15966,243#16136,244#16073,245#15843,246#16007,247#16160,248#15807,249#15973,250#15976,251#15884,252#15848,253#16040,254#15861,255#15972",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0,32#0,33#0,34#0,35#0,36#0,37#0,38#0,39#0,40#0,41#0,42#0,43#0,44#0,45#0,46#0,47#0,48#0,49#0,50#0,51#0,52#0,53#0,54#0,55#0,56#0,57#0,58#0,59#0,60#0,61#0,62#0,63#0,64#0,65#0,66#0,67#0,68#0,69#0,70#0,71#0,72#0,73#0,74#0,75#0,76#0,77#0,78#0,79#0,80#0,81#0,82#0,83#0,84#0,85#0,86#0,87#0,88#0,89#0,90#0,91#0,92#0,93#0,94#0,95#0,96#0,97#0,98#0,99#0,100#0,101#0,102#0,103#0,104#0,105#0,106#0,107#0,108#0,109#0,110#0,111#0,112#0,113#0,114#0,115#0,116#0,117#0,118#0,119#0,120#0,121#0,122#0,123#0,124#0,125#0,126#0,127#0,128#0,129#0,130#0,131#0,132#0,133#0,134#0,135#0,136#0,137#0,138#0,139#0,140#0,141#0,142#0,143#0,144#0,145#0,146#0,147#0,148#0,149#0,150#0,151#0,152#0,153#0,154#0,155#0,156#0,157#0,158#0,159#0,160#0,161#0,162#0,163
#0,164#0,165#0,166#0,167#0,168#0,169#0,170#0,171#0,172#0,173#0,174#0,175#0,176#0,177#0,178#0,179#0,180#0,181#0,182#0,183#0,184#0,185#0,186#0,187#0,188#0,189#0,190#0,191#0,192#0,193#0,194#0,195#0,196#0,197#0,198#0,199#0,200#0,201#0,202#0,203#0,204#0,205#0,206#0,207#0,208#0,209#0,210#0,211#0,212#0,213#0,214#0,215#0,216#0,217#0,218#0,219#0,220#0,221#0,222#0
,223#0,224#0,225#0,226#0,227#0,228#0,229#0,230#0,231#0,232#0,233#0,234#0,235#0,236#0,237#0,238#0,239#0,240#0,241#0,242#0,243#0,244#0,245#0,246#0,247#0,248#0,249#0,250#0,251#0,252
#0,253#0,254#0,255#0",
    "mtime": "2019-05-29 18:59:36.264916",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#,32#,33#,34#,35#,36#,37#,38#,39#,40#,41#,4
2#,43#,44#,45#,46#,47#,48#,49#,50#,51#,52#,53#,54#,55#,56#,57#,58#,59#,60#,61#,62#,63#,64#,65#,66#,67#,68#,69#,70#,71#,72#,73#,74#,75#,76#,77#,78#,79#,80#,81#,82#,83#,84#,85#,86#
,87#,88#,89#,90#,91#,92#,93#,94#,95#,96#,97#,98#,99#,100#,101#,102#,103#,104#,105#,106#,107#,108#,109#,110#,111#,112#,113#,114#,115#,116#,117#,118#,119#,120#,121#,122#,123#,124#,
125#,126#,127#,128#,129#,130#,131#,132#,133#,134#,135#,136#,137#,138#,139#,140#,141#,142#,143#,144#,145#,146#,147#,148#,149#,150#,151#,152#,153#,154#,155#,156#,157#,158#,159#,160
#,161#,162#,163#,164#,165#,166#,167#,168#,169#,170#,171#,172#,173#,174#,175#,176#,177#,178#,179#,180#,181#,182#,183#,184#,185#,186#,187#,188#,189#,190#,191#,192#,193#,194#,195#,1
96#,197#,198#,199#,200#,201#,202#,203#,204#,205#,206#,207#,208#,209#,210#,211#,212#,213#,214#,215#,216#,217#,218#,219#,220#,221#,222#,223#,224#,225#,226#,227#,228#,229#,230#,231#
,232#,233#,234#,235#,236#,237#,238#,239#,240#,241#,242#,243#,244#,245#,246#,247#,248#,249#,250#,251#,252#,253#,254#,255#",
    "usage": {
        "rgw.none": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 2970541
        },
        "rgw.main": {
            "size": 29584795218033,
            "size_actual": 29639674630144,
            "size_utilized": 29584795218033,
            "size_kb": 28891401581,
            "size_kb_actual": 28944994756,
            "size_kb_utilized": 28891401581,
            "num_objects": 22629845
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

Canceling the reshard fails:

radosgw-admin reshard cancel --bucket prelabeling
There is ongoing resharding, please retry after 2019-05-29 23:10:48.247 7ffbdddd2640  0 RGWReshardLock::lock failed to acquire lock on prelabeling:4529fe07-c707-4c2d-9ccb-1ecf8154aa23.32005164.1 ret=-16
360 seconds

(yes, there is mangled output - I suppose the "360 seconds" should come before the timestamp...)

If I try to add a reshard operation manually:

radosgw-admin reshard add --bucket prelabeling --num-shards 257
radosgw-admin reshard process
ERROR: failed to process reshard logs, error=2019-05-29 23:13:34.387 7f6006e45640  0 RGWReshardLock::lock failed to acquire lock on reshard.0000000003 ret=-16(16) Device or resource busy

The resharding has been stuck for over 14 hours now.
I tried restarting all RGWs (we have 5); I have now stopped all except one, in the hope that it might be a weird race between them - nothing.

Whatever I do, the main problem seems to be that it is stuck on the RGWReshardLock - and nobody is releasing it...
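
One way to see who is holding the lock is to inspect the RADOS advisory locks on the reshard log object directly. A hedged sketch, assuming the default zone keeps its reshard objects in the log pool under the "reshard" namespace (check the reshard_pool value in radosgw-admin zone get before relying on this):

# list advisory locks on the stuck reshard log object from the error above
rados -p default.rgw.log -N reshard lock list reshard.0000000003
# show who holds a given lock, using a lock name reported by the previous command
# rados -p default.rgw.log -N reshard lock info reshard.0000000003 <lock-name>
# a stale lock could then be broken with the reported names, e.g.
# rados -p default.rgw.log -N reshard lock break reshard.0000000003 <lock-name> <locker-name>

Whether breaking the lock is safe in a given situation is an assumption to verify first.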
