Bug #55979

[rgw-multisite][hybrid workload]: 1 object failed to delete on the secondary for bucket 'con2'.

Added by Vidushi Mishra 8 months ago. Updated 8 months ago.

Status: Triaged
Priority: Normal
Assignee:
Target version:
% Done: 0%
Source:
Tags: multisite-reshard
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of the issue:

1 object failed to delete on the secondary for bucket 'con2'.

Data sync is behind on 1 shard according to 'radosgw-admin sync status':

[root@argo011 ~]# radosgw-admin sync status
          realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
      zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
           zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
zonegroup features enabled: resharding
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [100]
[root@argo011 ~]# radosgw-admin bucket sync status --bucket con2
          realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
      zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
           zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
         bucket :con2[e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1])

    source zone e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
  source bucket con2:e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1
                incremental sync on 137 shards
                bucket is behind on 1 shards
                behind shards: [85]
[root@argo011 ~]#
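To keep an eye on the lagging shard while waiting for sync, the `behind shards` lists in the output above can be pulled out of the captured text. This is a minimal Python sketch. It assumes the plain-text layout shown above, which is meant for humans and may differ between Ceph releases. The helper name `behind_shards` is hypothetical.

```python
import re


def behind_shards(status_text: str) -> list[int]:
    """Collect shard ids from every 'behind shards: [...]' line in the text.

    Assumes the plain-text layout printed above; this output is meant for
    humans, so the pattern may need adjusting for other Ceph releases.
    """
    shards: list[int] = []
    for match in re.finditer(r"behind shards:\s*\[([^\]]*)\]", status_text):
        # The bracketed list may hold several comma-separated ids, e.g. "[3,85]".
        shards.extend(int(item) for item in match.group(1).split(",") if item.strip())
    return shards


sample = """\
      data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
                        data is behind on 1 shards
                        behind shards: [100]
"""
print(behind_shards(sample))  # [100]
```

The same function works on `bucket sync status` output, where it would return `[85]` for the capture above.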

Workflow:

1. Create 5 buckets, con{1..5}, on a multisite setup.
2. Write 5M objects per bucket and wait for data sync to catch up on both sites.
3. Run a hybrid bidirectional workload for 10 hours with a mix of write, delete, read, and list operations. The workload XML is given in [a].
4. Once the workload completes, wait for sync to complete on both sites.
5. After 10 to 12 hours, we observed that 1 object had failed to delete on the secondary and that data sync on the secondary was behind on 1 shard.
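Steps 2 and 4 wait for sync to catch up. One way to automate that wait is to poll `radosgw-admin sync status` until no "is behind on" line remains. The Python sketch below makes several assumptions:
- The helper names are hypothetical.
- The 60 s interval and 12 h timeout are arbitrary choices.
- It only checks the zone where it runs.
- The caller needs admin access to the cluster.

```python
import subprocess
import time
from typing import Callable


def run_sync_status() -> str:
    """Capture `radosgw-admin sync status` text from the local zone."""
    return subprocess.run(["radosgw-admin", "sync", "status"],
                          capture_output=True, text=True, check=True).stdout


def wait_for_sync(get_status: Callable[[], str] = run_sync_status,
                  interval: float = 60.0, timeout: float = 12 * 3600,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until no 'is behind on' line remains; return False on timeout."""
    waited = 0.0
    while True:
        if "is behind on" not in get_status():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

In the case reported here, this loop would hit its timeout rather than return `True`, because shard 100 was still reported as behind days later.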

Additional Info:

[a]:

--------------------
workload at primary
--------------------

<workload name="fillCluster" description="RGW testing">
  <!-- Initialization -->
  <storage type="s3" config="timeout=900000;accesskey=123;secretkey=123;endpoint=http://localhost:5000;path_style_access=true" retry="3"/>
  <workflow>

    <!-- Use operation mix & object sizes as defined in vars.shinc -->

    <workstage name="MAIN">
      <work name="hybrid" workers="400" runtime="36000">
        <operation name="writeOP" type="write" ratio="36" config="cprefix=con;containers=u(1,2);objects=u(1,2500000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
        <operation name="deleteOP" type="delete" ratio="5" config="cprefix=con;containers=u(1,2);objects=u(1,2500000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
        <operation name="readOP" type="read" ratio="44" config="cprefix=con;containers=u(3,5);objects=u(1,2500000);hashCheck=true" />
        <operation name="listOP" type="list" ratio="15" config="cprefix=con;containers=u(3,5);objects=u(1,2500000);hashCheck=true" />
      </work>
    </workstage>
  </workflow>

</workload>

----------------------
workload at secondary
----------------------

<workload name="fillCluster" description="RGW testing">
  <!-- Initialization -->
  <storage type="s3" config="timeout=900000;accesskey=123;secretkey=123;endpoint=http://localhost:5000;path_style_access=true" retry="3"/>
  <workflow>

    <!-- Use operation mix & object sizes as defined in vars.shinc -->

    <workstage name="MAIN">
      <work name="hybrid" workers="400" runtime="36000">
        <operation name="writeOP" type="write" ratio="36" config="cprefix=con;containers=u(1,2);objects=u(2500001,5000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
        <operation name="deleteOP" type="delete" ratio="5" config="cprefix=con;containers=u(1,2);objects=u(2500001,5000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
        <operation name="readOP" type="read" ratio="44" config="cprefix=con;containers=u(3,5);objects=u(2500001,5000000);hashCheck=true" />
        <operation name="listOP" type="list" ratio="15" config="cprefix=con;containers=u(3,5);objects=u(2500001,5000000);hashCheck=true" />
      </work>
    </workstage>
  </workflow>

</workload>
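The two workload files differ only in their `objects` ranges:
- Both sites write to and delete from con1 and con2 (`containers=u(1,2)`).
- The primary uses object indices 1 to 2,500,000.
- The secondary uses 2,500,001 to 5,000,000.

The four ratios (36 + 5 + 44 + 15) sum to 100, so they read as percentages. The Python sketch below checks which site's delete stage could have picked a given object. It assumes that COSBench's `u(lo,hi)` selector is inclusive on both ends and that object names are `myobjects` followed by the index, as in the object reported later in this issue. The helper names are hypothetical.

```python
import re

# deleteOP config strings copied from the two workload files above.
PRIMARY_DELETE = ("cprefix=con;containers=u(1,2);objects=u(1,2500000);"
                  "sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB")
SECONDARY_DELETE = ("cprefix=con;containers=u(1,2);objects=u(2500001,5000000);"
                    "sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB")


def uniform_range(config: str, key: str) -> tuple[int, int]:
    """Return the inclusive bounds of a 'key=u(lo,hi)' selector in a config string."""
    match = re.search(rf"(?:^|;){key}=u\((\d+),(\d+)\)", config)
    if match is None:
        raise ValueError(f"no u(lo,hi) selector for {key!r}")
    return int(match.group(1)), int(match.group(2))


def deleting_sites(object_index: int, container_index: int) -> list[str]:
    """List the sites whose delete stage can pick this object in this container."""
    sites = []
    for site, config in (("primary", PRIMARY_DELETE), ("secondary", SECONDARY_DELETE)):
        obj_lo, obj_hi = uniform_range(config, "objects")
        con_lo, con_hi = uniform_range(config, "containers")
        if obj_lo <= object_index <= obj_hi and con_lo <= container_index <= con_hi:
            sites.append(site)
    return sites


# 'myobjects2230853' in 'con2' -> object index 2230853, container index 2.
print(deleting_sites(2230853, 2))  # ['primary']
```

Under these assumptions, a delete of that object could only have been issued on the primary and would have reached the secondary through multisite sync.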

primary-hybrid.xml - primary-hybrid-workload.xml (1.05 KB) Vidushi Mishra, 06/09/2022 05:51 PM

secondary-hybrid.xml - secondary-hybrid-workload.xml (1.07 KB) Vidushi Mishra, 06/09/2022 05:51 PM

History

#1 Updated by Vidushi Mishra 8 months ago

Configs on all RGWs:

debug_ms = 0
debug_rgw = 5
debug_rgw_sync = 20
rgw_data_notify_interval_msec = 0

#2 Updated by Vidushi Mishra 8 months ago

ceph version 17.0.0-12762-g63f84c50 (63f84c50e0851d456fc38b3330945c54162dd544) quincy (dev)

#3 Updated by Matt Benjamin 8 months ago

Hi Vidushi,

We need to focus on getting results repeatable by Tejas and Mark. Is this repeatable?

Matt

#4 Updated by Mark Kogan 8 months ago

In my testing I have seen S3 operations return HTTP 500 during resharding.
Possibly COSBench did not retry the delete, or the retry also returned 500 (resharding a large bucket under load can take a long time).
Now that the load no longer seems to be running, could you please perform `s3cmd rm ...` on this object manually and let us know whether it is deleted?
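The hypothesis above is that a DELETE which hits a 500 during resharding is lost unless the client keeps retrying long enough. The Python sketch below shows client-side retry with exponential backoff. `delete_with_retry` and the `delete` callable are hypothetical stand-ins, not COSBench's or s3cmd's actual retry logic, and the attempt count and delays are arbitrary.

```python
import time
from typing import Callable


def delete_with_retry(delete: Callable[[], int], attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `delete` until it returns a status below 500 or attempts run out.

    `delete` is a hypothetical callable that sends one S3 DELETE and returns
    the HTTP status code; the last status seen is returned.
    """
    status = delete()
    for attempt in range(1, attempts):
        if status < 500:
            break
        # Exponential backoff (1 s, 2 s, 4 s, ... with the default base_delay)
        # gives a long-running reshard time to finish before the next try.
        sleep(base_delay * 2 ** (attempt - 1))
        status = delete()
    return status


# Simulated server: two 500s while resharding, then 204 No Content.
responses = iter([500, 500, 204])
print(delete_with_retry(lambda: next(responses), sleep=lambda seconds: None))  # 204
```

With the defaults, the total wait is 1 + 2 + 4 + 8 = 15 s. A reshard of a multi-million-object bucket under load may outlast that, which would match the "retry also returned 500" case.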

#5 Updated by Vidushi Mishra 8 months ago

Hi Mark,

Sure, I will try 's3cmd rm' on the object and update.

Also, I wanted to add some more info for this issue.

1. The network delay between the sites is set to 5 ms with netem:
  # tc qdisc show dev ens2f0
    qdisc netem 8001: root refcnt 33 limit 1000 delay 5ms

2. The object that was not deleted is 'myobjects2230853' in bucket 'con2'.

On the secondary:

  # s3cmd ls s3://con2/myobjects2230853
    2022-06-08 13:00 2000 s3://con2/myobjects2230853

On the primary:

  # s3cmd get s3://con2/myobjects2230853 con2-myobjects2230853
    download: 's3://con2/myobjects2230853' -> 'con2-myobjects2230853' [1 of 1]
    ERROR: S3 error: 404 (NoSuchKey)

3. The logs on the secondary also show:

  # grep "myobjects2230853" ceph-client.rgw.usa.8081.extensa033.aulxfi.log | grep con2

2022-06-09T07:22:26.413+0000 7f308e7c2700 1 beast: 0x7f30024a5650: 10.8.128.100 - user1 [09/Jun/2022:07:22:26.348 +0000] "GET /con2/?delimiter=%2F&prefix=myobjects2230853 HTTP/1.1" 200 573 - - - latency=0.064997561s

4. The workload XML files are attached.
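The check in item 2 (listed on the secondary, 404 on the primary) can be applied to a whole bucket by diffing listings captured from each zone. The Python sketch below assumes the `s3cmd ls` line format shown above and that each zone's listing has been saved as text. The function names are hypothetical. On a bucket under active load, the two listings are not taken at the same instant, so a hit is a candidate to re-check rather than proof of a sync failure.

```python
def listed_keys(ls_output: str) -> set[str]:
    """Collect 'bucket/key' paths from lines shaped like `s3cmd ls` output."""
    keys = set()
    for line in ls_output.splitlines():
        if "s3://" in line:
            # Keep everything after the scheme, so interior spaces in a key survive.
            keys.add(line.split("s3://", 1)[1].strip())
    return keys


def missing_on_primary(secondary_ls: str, primary_ls: str) -> set[str]:
    """Keys that the secondary still lists but the primary does not."""
    return listed_keys(secondary_ls) - listed_keys(primary_ls)


secondary_ls = "2022-06-08 13:00 2000 s3://con2/myobjects2230853\n"
primary_ls = ""  # the primary returned 404 (NoSuchKey) for this key
print(sorted(missing_on_primary(secondary_ls, primary_ls)))  # ['con2/myobjects2230853']
```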

#6 Updated by Vidushi Mishra 8 months ago


#7 Updated by Mark Kogan 8 months ago

There seems to be conflicting information in the data collected on the secondary zone: `radosgw-admin bucket sync status --bucket con2` and `radosgw-admin sync status` do not agree on the number of shards or on which shard is behind.
I will consult the team and update.

[root@argo011 ~]# radosgw-admin bucket sync status --bucket con2
Mon Jun 13 09:49:15 UTC 2022
          realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
      zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
           zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
         bucket :con2[e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1])

    source zone e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
  source bucket con2:e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1
                incremental sync on 137 shards
                                    ^^^
                bucket is behind on 1 shards
                behind shards: [85]
                                ^^

[root@argo011 ~]# radosgw-admin sync status
Mon Jun 13 09:48:29 UTC 2022
          realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
      zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
           zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
zonegroup features enabled: resharding
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                                              ^^^
                        data is behind on 1 shards
                        behind shards: [100]
                                        ^^^

[root@argo011 ~]# radosgw-admin bucket stats --bucket con2
{
    "bucket": "con2",
    "num_shards": 103,
                  ^^^^
    "tenant": "",
    "zonegroup": "f11960e9-6187-4d13-9618-e1aa882ed75f",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": "" 
    },
    "id": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "marker": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "index_type": "Normal",
    "owner": "user1",
    ...
}

#8 Updated by Mark Kogan 8 months ago

Adding to the above: I found the source of the `137` shard count. Although the secondary has only `103` shards, the primary has `137` shards for bucket `con2`:

[root@magna051 ~]# radosgw-admin bucket stats --bucket con2
{
    "bucket": "con2",
    "num_shards": 137,
    "tenant": "",
    "zonegroup": "f11960e9-6187-4d13-9618-e1aa882ed75f",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": "" 
    },
    "id": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "marker": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "index_type": "Normal",
    "owner": "user1",
    ...
}

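The discrepancy came down to the two zones reporting different `num_shards` for the same bucket, so the comparison can be scripted from each zone's `bucket stats` JSON. The Python sketch below assumes that output has been saved from both zones. The sample strings are trimmed to the relevant fields, and the helper names are hypothetical.

```python
import json


def shard_counts(primary_stats: str, secondary_stats: str) -> tuple[int, int]:
    """Return (primary, secondary) `num_shards` from two `bucket stats` JSON dumps."""
    return (json.loads(primary_stats)["num_shards"],
            json.loads(secondary_stats)["num_shards"])


def describe_layouts(primary_stats: str, secondary_stats: str) -> str:
    """Summarize whether the two zones agree on the bucket's index shard count."""
    primary, secondary = shard_counts(primary_stats, secondary_stats)
    if primary == secondary:
        return f"both zones report {primary} index shards"
    return f"shard counts differ: primary {primary}, secondary {secondary}"


# Trimmed to the fields that matter; values are those reported for 'con2' above.
print(describe_layouts('{"bucket": "con2", "num_shards": 137}',
                       '{"bucket": "con2", "num_shards": 103}'))
# shard counts differ: primary 137, secondary 103
```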
#9 Updated by Casey Bodley 8 months ago

  • Status changed from New to Triaged
  • Assignee set to Mark Kogan
