Bug #55979
open [rgw-multisite][hybrid workload]: 1 object failed to delete on secondary for bucket 'con2'.
Description
Description of the issue:
One object failed to delete on the secondary site for bucket 'con2'.
Data sync is behind on 1 shard, as reported by 'radosgw-admin sync status':
[root@argo011 ~]# radosgw-admin sync status
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
zonegroup features enabled: resharding
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 1 shards
behind shards: [100]
[root@argo011 ~]# radosgw-admin bucket sync status --bucket con2
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
bucket :con2[e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1])
source zone e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
source bucket con2:e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1
incremental sync on 137 shards
bucket is behind on 1 shards
behind shards: [85]
[root@argo011 ~]#
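To track this state over time, the 'behind shards' line can be pulled out of the command output. The following is a minimal sketch that only parses text shaped like the output pasted above; the function name behind_shards is hypothetical, and it assumes that multiple shard IDs, if present, are comma-separated inside the brackets.

```python
import re

def behind_shards(status_text):
    """Return the shard IDs from a 'behind shards: [...]' line, or [] if none."""
    match = re.search(r"behind shards:\s*\[([^\]]*)\]", status_text)
    if not match:
        return []
    # Assumption: several IDs would be comma-separated, e.g. [85,100].
    return [int(tok) for tok in match.group(1).split(",") if tok.strip()]

# Excerpt of the 'radosgw-admin sync status' output from this report.
sample = """\
incremental sync: 128/128 shards
data is behind on 1 shards
behind shards: [100]
"""
print(behind_shards(sample))  # [100]
```

The same function applies to the 'radosgw-admin bucket sync status --bucket con2' output, where it would report shard 85.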
Workflow:
=========
1. Create 5 buckets, con{1..5}, on a multisite setup.
2. Write 5M objects per bucket and wait for data sync to catch up on both sites.
3. Run a hybrid bi-directional workload for 10 hours with a mix of write, delete, read, and list operations. The workload XML is given at [a].
4. Once the workload completes, wait for sync to complete on both sites.
5. After 10-12 hours, 1 object had still failed to delete on the secondary, and data sync was behind on 1 shard on the secondary site.
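One way to locate the object in step 5 is to compare the key listings of bucket con2 from both sites once the workload has stopped. This is only a sketch: it assumes the two listings have already been collected (for example via an S3 list on each zone), and the function name and sample keys are hypothetical. A key that exists only on the secondary could also be a write that has not yet synced to the primary, so the result is a list of candidates, not a confirmed failure.

```python
def only_on_secondary(primary_keys, secondary_keys):
    """Return keys present on the secondary but absent on the primary."""
    return sorted(set(secondary_keys) - set(primary_keys))

# Hypothetical listings of the same bucket taken from each site.
primary = ["obj1", "obj3"]
secondary = ["obj1", "obj2", "obj3"]
print(only_on_secondary(primary, secondary))  # ['obj2']
```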
Additional Info:
[a]:
--------------------
workload at primary
--------------------
<workload name="fillCluster" description="RGW testing">
<storage type="s3" config="timeout=900000;accesskey=123;secretkey=123;endpoint=http://localhost:5000;path_style_access=true" retry="3"/>
<workflow>
<workstage name="MAIN">
<work name="hybrid" workers="400" runtime="36000" >
<operation name="writeOP" type="write" ratio="36" config="cprefix=con;containers=u(1,2);objects=u(1,2500000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="deleteOP" type="delete" ratio="5" config="cprefix=con;containers=u(1,2);objects=u(1,2500000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="readOP" type="read" ratio="44" config="cprefix=con;containers=u(3,5);objects=u(1,2500000);hashCheck=true" />
<operation name="listOP" type="list" ratio="15" config="cprefix=con;containers=u(3,5);objects=u(1,2500000);hashCheck=true" />
</work>
</workstage>
</workflow>
</workload>
----------------------
workload at secondary
----------------------
<workload name="fillCluster" description="RGW testing">
<storage type="s3" config="timeout=900000;accesskey=123;secretkey=123;endpoint=http://localhost:5000;path_style_access=true" retry="3"/>
<workflow>
<workstage name="MAIN">
<work name="hybrid" workers="400" runtime="36000" >
<operation name="writeOP" type="write" ratio="36" config="cprefix=con;containers=u(1,2);objects=u(2500001,5000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="deleteOP" type="delete" ratio="5" config="cprefix=con;containers=u(1,2);objects=u(2500001,5000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="readOP" type="read" ratio="44" config="cprefix=con;containers=u(3,5);objects=u(2500001,5000000);hashCheck=true" />
<operation name="listOP" type="list" ratio="15" config="cprefix=con;containers=u(3,5);objects=u(2500001,5000000);hashCheck=true" />
</work>
</workstage>
</workflow>
</workload>
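The two workload definitions differ only in their object ranges: the primary operates on objects 1 to 2500000 and the secondary on 2500001 to 5000000, so the two sites never write or delete the same object index. The sketch below checks this and the operation mix with Python's standard XML parser. The embedded XML is a trimmed copy of the workloads above (the storage element and the sizes/hashCheck settings are omitted for brevity), and the function name summarize is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def summarize(workload_xml):
    """Map operation name -> (ratio, (first_object, last_object))."""
    root = ET.fromstring(workload_xml)
    summary = {}
    for op in root.iter("operation"):
        m = re.search(r"objects=u\((\d+),(\d+)\)", op.get("config", ""))
        rng = (int(m.group(1)), int(m.group(2))) if m else None
        summary[op.get("name")] = (int(op.get("ratio")), rng)
    return summary

# Trimmed copy of the primary workload from this report.
primary_xml = """\
<workload name="fillCluster">
  <workflow><workstage name="MAIN"><work name="hybrid">
    <operation name="writeOP" type="write" ratio="36" config="cprefix=con;containers=u(1,2);objects=u(1,2500000)" />
    <operation name="deleteOP" type="delete" ratio="5" config="cprefix=con;containers=u(1,2);objects=u(1,2500000)" />
    <operation name="readOP" type="read" ratio="44" config="cprefix=con;containers=u(3,5);objects=u(1,2500000)" />
    <operation name="listOP" type="list" ratio="15" config="cprefix=con;containers=u(3,5);objects=u(1,2500000)" />
  </work></workstage></workflow>
</workload>
"""
# The secondary workload is identical except for the object range.
secondary_xml = primary_xml.replace("u(1,2500000)", "u(2500001,5000000)")

p, s = summarize(primary_xml), summarize(secondary_xml)
total_ratio = sum(ratio for ratio, _ in p.values())
# Ranges are disjoint if the primary range ends before the secondary starts.
disjoint = p["deleteOP"][1][1] < s["deleteOP"][1][0]
print(total_ratio, disjoint)  # 100 True
```

Because writes and deletes from both sites target only con1 and con2, bucket con2 receives concurrent write and delete traffic from both directions, each side on its own half of the object range.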