https://tracker.ceph.com/2022-06-09T14:19:38ZCeph rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2176762022-06-09T14:19:38ZVidushi Mishra
<ul></ul><p>Configs on all rgws:</p>
<p>debug_ms = 0 <br />debug_rgw = 5<br />debug_rgw_sync = 20<br />rgw_data_notify_interval_msec = 0</p> rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2176772022-06-09T14:20:13ZVidushi Mishra
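For reference, debug levels like these are typically applied at runtime with `ceph config set`; a minimal sketch, assuming the settings target the `client.rgw` section of this deployment (the section name is an assumption, not from the thread):

```shell
# Sketch: apply the debug settings above to all radosgw daemons
ceph config set client.rgw debug_ms 0
ceph config set client.rgw debug_rgw 5
ceph config set client.rgw debug_rgw_sync 20
ceph config set client.rgw rgw_data_notify_interval_msec 0
```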
<ul></ul><p>ceph version 17.0.0-12762-g63f84c50 (63f84c50e0851d456fc38b3330945c54162dd544) quincy (dev)</p> rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2176842022-06-09T14:47:57ZMatt Benjaminmbenjamin@redhat.com
<ul></ul><p>Hi Vidushi,</p>
<p>We need to focus on getting results repeatable by Tejas and Mark. Is this repeatable?</p>
<p>Matt</p> rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2176972022-06-09T17:39:35ZMark Koganmkogan@redhat.com
<ul></ul><p>In my testing I have seen S3 operations return 500 during resharding; <br />possibly cosbench did not retry the delete, or the retry also returned 500 (resharding a large bucket under load can take a long time).<br />Now that the load is no longer running, <br />could you please run `s3cmd rm ...` on this object manually and let us know whether it is deleted?</p> rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2176982022-06-09T17:54:22ZVidushi Mishra
<ul><li><strong>File</strong> <a href="/attachments/download/6049/primary-hybrid.xml">primary-hybrid.xml</a> added</li><li><strong>File</strong> <a href="/attachments/download/6050/secondary-hybrid.xml">secondary-hybrid.xml</a> added</li></ul><p>Hi Mark,</p>
<p>Sure, I will try 's3cmd rm' on the object and update.</p>
<p>Also, I want to add some more info on this issue.</p>
<p>1. The overall delay between the sites is configured as 5 ms:</p>
<pre>
# tc qdisc show dev ens2f0
qdisc netem 8001: root refcnt 33 limit 1000 delay 5ms
</pre>
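A delay like this is usually injected with netem; a sketch of how it might have been configured on each site (assumed setup commands, not shown in the thread):

```shell
# Add a 5 ms egress delay on the inter-site interface
# (undo with: tc qdisc del dev ens2f0 root)
tc qdisc add dev ens2f0 root netem delay 5ms

# Verify, as above:
tc qdisc show dev ens2f0
```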
<p>2. The object that failed to delete is 'myobjects2230853' in bucket 'con2'.</p>
<p>On the secondary:</p>
<pre>
# s3cmd ls s3://con2/myobjects2230853
2022-06-08 13:00  2000  s3://con2/myobjects2230853
</pre>
<p>On the primary:</p>
<pre>
# s3cmd get s3://con2/myobjects2230853 con2-myobjects2230853
download: 's3://con2/myobjects2230853' -> 'con2-myobjects2230853' [1 of 1]
ERROR: S3 error: 404 (NoSuchKey)
</pre>
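Since deletes can return 500 while the bucket reshards, repeating the manual `s3cmd rm` a few times may be needed; a hypothetical retry wrapper (the `retry` helper is a sketch, not part of this thread):

```shell
# retry MAX CMD...: run CMD until it succeeds, at most MAX times,
# sleeping briefly between attempts
retry() {
  max=$1; shift
  n=1
  until "$@"; do
    [ "$n" -ge "$max" ] && { echo "giving up after $n attempts" >&2; return 1; }
    n=$((n + 1))
    sleep 1
  done
}

# e.g. against the stuck object:
#   retry 5 s3cmd rm s3://con2/myobjects2230853
```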
<p>3. Also, the logs on the secondary show:</p>
<pre>
# grep "myobjects2230853" ceph-client.rgw.usa.8081.extensa033.aulxfi.log | grep con2
2022-06-09T07:22:26.413+0000 7f308e7c2700 1 beast: 0x7f30024a5650: 10.8.128.100 - user1 [09/Jun/2022:07:22:26.348 +0000] "GET /con2/?delimiter=%2F&prefix=myobjects2230853 HTTP/1.1" 200 573 - - - latency=0.064997561s
</pre>
<p>4. The workload XML is attached in the files.</p> rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2178532022-06-13T10:37:00ZMark Koganmkogan@redhat.com
<ul></ul><p>There seems to be conflicting information when collecting data on the secondary zone: the<br />radosgw-admin bucket sync status --bucket con2<br />and<br />radosgw-admin sync status<br />outputs do not agree on the number of shards or on which shard is behind.<br />Will consult the team and update.</p>
<pre>
[root@argo011 ~]# radosgw-admin bucket sync status --bucket con2
Mon Jun 13 09:49:15 UTC 2022
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
bucket :con2[e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1])
source zone e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
source bucket con2:e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1
incremental sync on 137 shards
^^^
bucket is behind on 1 shards
behind shards: [85]
^^
[root@argo011 ~]# radosgw-admin sync status
Mon Jun 13 09:48:29 UTC 2022
realm 7ca39d2b-9f6f-46f3-ada9-0a0b88b2c5ba (data)
zonegroup f11960e9-6187-4d13-9618-e1aa882ed75f (us)
zone 0a44885f-4fee-41d7-ae0b-d1c5368d5170 (west)
zonegroup features enabled: resharding
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: e8cd3943-5b6e-44a7-8cbb-be909b52ed9e (east)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
^^^
data is behind on 1 shards
behind shards: [100]
^^^
[root@argo011 ~]# radosgw-admin bucket stats --bucket con2
{
"bucket": "con2",
"num_shards": 103,
^^^^
"tenant": "",
"zonegroup": "f11960e9-6187-4d13-9618-e1aa882ed75f",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
"marker": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
"index_type": "Normal",
"owner": "user1",
...
}
</pre> rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2179332022-06-13T16:53:56ZMark Koganmkogan@redhat.com
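One way to make the shard-count disagreement above explicit is to save `radosgw-admin bucket stats --bucket con2` from each zone and compare the `num_shards` fields; a minimal sketch (file names are hypothetical, and plain `sed` is used so `jq` is not required):

```shell
# Extract "num_shards" from a saved `radosgw-admin bucket stats` JSON dump
shards() { sed -n 's/.*"num_shards": *\([0-9][0-9]*\).*/\1/p' "$1"; }

# On each zone:
#   radosgw-admin bucket stats --bucket con2 > /tmp/primary.json    # primary
#   radosgw-admin bucket stats --bucket con2 > /tmp/secondary.json  # secondary
# Compare:
#   [ "$(shards /tmp/primary.json)" = "$(shards /tmp/secondary.json)" ] \
#       || echo "shard count mismatch"
```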
<ul></ul><p>Adding...<br />Found the source of the `137` shard count: although the secondary has only `103` shards, the primary has `137` shards for bucket `con2`.</p>
<pre>
[root@magna051 ~]# radosgw-admin bucket stats --bucket con2
{
"bucket": "con2",
"num_shards": 137,
"tenant": "",
"zonegroup": "f11960e9-6187-4d13-9618-e1aa882ed75f",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
"marker": "e8cd3943-5b6e-44a7-8cbb-be909b52ed9e.49011.1",
"index_type": "Normal",
"owner": "user1",
</pre> rgw - Bug #55979: [rgw-multisite][hybrid workload]:1 object failed to delete on secondary for a bucket 'con2'. https://tracker.ceph.com/issues/55979?journal_id=2182432022-06-16T14:12:17ZCasey Bodleycbodley@redhat.com
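With one shard stuck behind, the multisite sync error log on the secondary may record why the delete was dropped; the commands below are standard `radosgw-admin` subcommands, though whether they surface this particular failure is an assumption:

```shell
# On the secondary zone: list recorded sync errors
radosgw-admin sync error list

# Re-drive sync for the affected bucket
radosgw-admin bucket sync run --bucket con2
```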
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Triaged</i></li><li><strong>Assignee</strong> set to <i>Mark Kogan</i></li></ul>