Bug #55979
[rgw-multisite][hybrid workload]: 1 object failed to delete on secondary for a bucket 'con2'.
Status: Open
Description
1 object failed to delete on secondary for a bucket 'con2'.
Data is behind 1 shard in 'radosgw-admin sync status'
[root@argo011 ~]# radosgw-admin sync status
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
zonegroup features enabled: resharding
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 1 shards
behind shards: [100]
[root@argo011 ~]# radosgw-admin bucket sync status --bucket con2
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
bucket :con2[e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1])
source zone e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
source bucket con2:e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1
incremental sync on 137 shards
bucket is behind on 1 shards
behind shards: [85]
[root@argo011 ~]#
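For scripting around outputs like the one above, the "behind shards" line can be extracted with standard tools. A minimal sketch follows; the heredoc stands in for live `radosgw-admin sync status` output, so nothing here assumes a running cluster:

```shell
# Minimal sketch: pull the "behind shards" list out of sync status
# output. The heredoc is a stand-in for `radosgw-admin sync status`.
sync_status=$(cat <<'EOF'
data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
data is behind on 1 shards
behind shards: [100]
EOF
)
# Capture whatever sits between the brackets on the "behind shards" line.
behind=$(printf '%s\n' "$sync_status" | sed -n 's/.*behind shards: \[\(.*\)\].*/\1/p')
echo "behind=$behind"
```

On a real node you would pipe the actual `radosgw-admin sync status` output in place of the heredoc.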
Workflow:
===========
1. Create 5 buckets con{1..5} on a multisite.
2. Write 5M objects per bucket and wait for the data sync to catch up on both sites.
3. Perform a hybrid bi-directional workload for 10 hours with a mix of
[write, delete, read, and list] operations.
Workload XML defined at [a]
4. Once the workload completes, wait for the sync to complete on both sides.
5. After 10-12 hours, we observed that 1 object failed to delete on the secondary, and data is behind on 1 shard on the secondary site.
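Steps 4-5 above amount to polling sync status until nothing is behind. A minimal sketch of such a wait loop; `get_sync_status` is a stub standing in for `radosgw-admin sync status` so the loop structure is runnable anywhere (on a real cluster, replace it with the actual command and uncomment the sleep):

```shell
# Poll until data sync no longer reports "behind".
attempt=0
get_sync_status() {
  # Stub: pretend the zone catches up after 3 polls.
  if [ "$attempt" -lt 3 ]; then
    echo "data is behind on 1 shards"
  else
    echo "data is caught up with source"
  fi
}
while get_sync_status | grep -q "behind"; do
  attempt=$((attempt + 1))
  # sleep 60   # on a real cluster, back off between polls
done
echo "sync caught up after $attempt polls"
```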
Additional Info:
[a]:
--------------------
workload at primary
--------------------
<workload name="fillCluster" description="RGW testing">
<storage type="s3" config="timeout=900000;accesskey=123;secretkey=123;endpoint=http://localhost:5000;path_style_access=true" retry="3"/>
<workflow>
<workstage name="MAIN">
<work name="hybrid" workers="400" runtime="36000" >
<operation name="writeOP" type="write" ratio="36" config="cprefix=con;containers=u(1,2);objects=u(1,2500000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="deleteOP" type="delete" ratio="5" config="cprefix=con;containers=u(1,2);objects=u(1,2500000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="readOP" type="read" ratio="44" config="cprefix=con;containers=u(3,5);objects=u(1,2500000);hashCheck=true" />
<operation name="listOP" type="list" ratio="15" config="cprefix=con;containers=u(3,5);objects=u(1,2500000);hashCheck=true" />
</work>
</workstage>
</workflow>
</workload>
----------------------
workload at secondary
----------------------
<workload name="fillCluster" description="RGW testing">
<storage type="s3" config="timeout=900000;accesskey=123;secretkey=123;endpoint=http://localhost:5000;path_style_access=true" retry="3"/>
<workflow>
<workstage name="MAIN">
<work name="hybrid" workers="400" runtime="36000" >
<operation name="writeOP" type="write" ratio="36" config="cprefix=con;containers=u(1,2);objects=u(2500001,5000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="deleteOP" type="delete" ratio="5" config="cprefix=con;containers=u(1,2);objects=u(2500001,5000000);sizes=h(1|2|25,2|4|40,4|8|25,8|256|10)KB" />
<operation name="readOP" type="read" ratio="44" config="cprefix=con;containers=u(3,5);objects=u(2500001,5000000);hashCheck=true" />
<operation name="listOP" type="list" ratio="15" config="cprefix=con;containers=u(3,5);objects=u(2500001,5000000);hashCheck=true" />
</work>
</workstage>
</workflow>
</workload>
Updated by Vidushi Mishra almost 2 years ago
Configs on all rgws:
======================
debug_ms = 0
debug_rgw = 5
debug_rgw_sync = 20
rgw_data_notify_interval_msec = 0
Updated by Vidushi Mishra almost 2 years ago
ceph version 17.0.0-12762-g63f84c50 (63f84c50e0851d456fc38b3330945c54162dd544) quincy (dev)
Updated by Matt Benjamin almost 2 years ago
Hi Vidushi,
We need to focus on getting results repeatable by Tejas and Mark. Is this repeatable?
Matt
Updated by Mark Kogan almost 2 years ago
In my testing I have seen S3 operations return 500 during resharding;
possibly cosbench did not retry the delete, or the retry also returned 500 (resharding a large bucket under load can take a long time).
Now that the load is no longer running -
is it possible to please perform `s3cmd rm ...` on this object manually and let us know if it's deleted?
Updated by Vidushi Mishra almost 2 years ago
- File primary-hybrid.xml primary-hybrid.xml added
- File secondary-hybrid.xml secondary-hybrid.xml added
Hi Mark,
Sure, I will try 's3cmd rm' on the object and update.
Also, I wanted to add some more info for this issue.
1. The overall delay between the sites is configured as 5 ms.
- tc qdisc show dev ens2f0
qdisc netem 8001: root refcnt 33 limit 1000 delay 5ms
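For reference, a delay like the one shown can be applied with tc/netem along these lines (config sketch only; it needs root, and the interface name `ens2f0` is specific to this setup):

```shell
# Add a flat 5 ms delay on the inter-site link, then verify it took effect.
tc qdisc add dev ens2f0 root netem delay 5ms
tc qdisc show dev ens2f0
```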
2. The object that did not delete is 'myobjects2230853' for bucket 'con2'.
on secondary:
- s3cmd ls s3://con2/myobjects2230853
2022-06-08 13:00 2000 s3://con2/myobjects2230853
on primary:
- s3cmd get s3://con2/myobjects2230853 con2-myobjects2230853
download: 's3://con2/myobjects2230853' -> 'con2-myobjects2230853' [1 of 1]
ERROR: S3 error: 404 (NoSuchKey)
3. Also, the logs on the secondary show:
- grep "myobjects2230853" ceph-client.rgw.usa.8081.extensa033.aulxfi.log | grep con2
2022-06-09T07:22:26.413+0000 7f308e7c2700 1 beast: 0x7f30024a5650: 10.8.128.100 - user1 [09/Jun/2022:07:22:26.348 +0000] "GET /con2/?delimiter=%2F&prefix=myobjects2230853 HTTP/1.1" 200 573 - - - latency=0.064997561s
4. The workload XML is attached in the files.
Updated by Mark Kogan almost 2 years ago
There seems to be conflicting information when collecting data on the secondary zone: the
radosgw-admin bucket sync status --bucket con2
and
radosgw-admin sync status
commands do not agree on the number of shards or on which shard is behind.
Will consult the team and update.
[root@argo011 ~]# radosgw-admin bucket sync status --bucket con2
Mon Jun 13 09:49:15 UTC 2022
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
bucket :con2[e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1])
source zone e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
source bucket con2:e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1
incremental sync on 137 shards   ^^^
bucket is behind on 1 shards
behind shards: [85]   ^^
[root@argo011 ~]# radosgw-admin sync status
Mon Jun 13 09:48:29 UTC 2022
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
zonegroup features enabled: resharding
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards   ^^^
data is behind on 1 shards
behind shards: [100]   ^^^
[root@argo011 ~]# radosgw-admin bucket stats --bucket con2
{
    "bucket": "con2",
    "num_shards": 103,   ^^^^
    "tenant": "",
    "zonegroup": "f11960e9-6187-4d13-9618-e1aa882ed75f",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "marker": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "index_type": "Normal",
    "owner": "user1",
    ...
}
Updated by Mark Kogan almost 2 years ago
adding...
Found the source of the shard count `137`: although the secondary has only `103` shards, the primary has `137` shards on bucket `con2`.
[root@magna051 ~]# radosgw-admin bucket stats --bucket con2
{
    "bucket": "con2",
    "num_shards": 137,
    "tenant": "",
    "zonegroup": "f11960e9-6187-4d13-9618-e1aa882ed75f",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "marker": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
    "index_type": "Normal",
    "owner": "user1",
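The mismatch is easy to check mechanically. Below is a small sketch comparing the two `num_shards` values, with the JSON trimmed down from the outputs above; on live clusters one would substitute the real `radosgw-admin bucket stats --bucket con2` output from each site:

```shell
# Compare num_shards for the same bucket as reported on each zone.
# The JSON strings are trimmed stand-ins for live bucket stats output.
primary_json='{ "bucket": "con2", "num_shards": 137 }'
secondary_json='{ "bucket": "con2", "num_shards": 103 }'

# Extract the integer after "num_shards":
get_shards() { printf '%s\n' "$1" | sed -n 's/.*"num_shards": \([0-9]*\).*/\1/p'; }

p=$(get_shards "$primary_json")
s=$(get_shards "$secondary_json")
if [ "$p" != "$s" ]; then
  echo "shard count mismatch: primary=$p secondary=$s"
fi
```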
Updated by Casey Bodley almost 2 years ago
- Status changed from New to Triaged
- Assignee set to Mark Kogan