<p><strong>Ceph rgw - Bug #43756: An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown</strong><br /><a class="external" href="https://tracker.ceph.com/issues/43756">https://tracker.ceph.com/issues/43756</a></p>
<p><strong>Manuel Rios, 2020-01-22T17:07:56Z</strong></p>
<p>Hi,</p>
<p>With help we launched a standalone rgw instance on a non-public port and ran just 3 commands with the AWS CLI:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">aws --endpoint=http://XXXXXX:7481 --profile=ceph s3api list-multipart-uploads --bucket Evol6
aws --endpoint=http://XXXXXX:7481 --profile=ceph s3api list-parts --bucket Evol6 --key 'MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision' --upload-id 2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU
aws --endpoint=http://XXXXXX:7481 --profile=ceph s3api abort-multipart-upload --bucket Evol6 --key 'MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision' --upload-id 2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU
</span></code></pre>
<p>RGW log output is available at <a class="external" href="https://easydatahost.com/debugs/debug-rgw.zip">https://easydatahost.com/debugs/debug-rgw.zip</a></p>
<p>RGW daemon command line:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">/usr/bin/radosgw -d --cluster ceph --name client.rgw.ceph-rgw03 --setuser ceph --setgroup ceph --debug-rgw=20 --debug_ms=1 --rgw_frontends="beast port=7481" --rgw_enable_gc_threads=false --rgw_enable_lc_threads=false
</span></code></pre>
<p><strong>Manuel Rios, 2020-01-22T17:22:00Z</strong></p>
<p>Output of <code>radosgw-admin bi list --bucket Evol6 | jq '.[]|select(.idx | match("20191203010516/431.cbrevision"))'</code>:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">{
"type": "plain",
"idx": "_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~T7G76R09Pn-267VMbY8cjvZl_BHqfTx.meta",
"entry": {
"name": "_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~T7G76R09Pn-267VMbY8cjvZl_BHqfTx.meta",
"instance": "",
"ver": {
"pool": 40,
"epoch": 4848481
},
"locator": "",
"exists": "true",
"meta": {
"category": 3,
"size": 27,
"mtime": "2019-12-03 02:00:50.589889Z",
"etag": "",
"storage_class": "",
"owner": "catbackup",
"owner_display_name": "Catbackup",
"content_type": "application/octet-stream",
"accounted_size": 0,
"user_data": "",
"appendable": "false"
},
"tag": "_OQRXmFYGxL4JorOtTIVTgaWPP4Hciiu",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
}
{
"type": "plain",
"idx": "_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU.meta",
"entry": {
"name": "_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU.meta",
"instance": "",
"ver": {
"pool": 40,
"epoch": 4862265
},
"locator": "",
"exists": "true",
"meta": {
"category": 3,
"size": 27,
"mtime": "2019-12-03 01:23:06.007727Z",
"etag": "",
"storage_class": "",
"owner": "catbackup",
"owner_display_name": "Catbackup",
"content_type": "application/octet-stream",
"accounted_size": 0,
"user_data": "",
"appendable": "false"
},
"tag": "_ShAUoEzV6fSf9M5DGRAfIUnlN-bCwR4",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
}
{
"type": "plain",
"idx": "_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~9djvntf2OBzWT8VLMBixPjZMx6rSwI_.meta",
"entry": {
"name": "_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~9djvntf2OBzWT8VLMBixPjZMx6rSwI_.meta",
"instance": "",
"ver": {
"pool": 40,
"epoch": 4848897
},
"locator": "",
"exists": "true",
"meta": {
"category": 3,
"size": 27,
"mtime": "2019-12-03 03:00:19.076330Z",
"etag": "",
"storage_class": "",
"owner": "catbackup",
"owner_display_name": "Catbackup",
"content_type": "application/octet-stream",
"accounted_size": 0,
"user_data": "",
"appendable": "false"
},
"tag": "_dj5cX7yiIK3HxrLtWYol1ihSdkERdtL",
"flags": 0,
"pending_map": [],
"versioned_epoch": 0
}
}
</span></code></pre>
<p><strong>Robin Johnson (robbat2@gentoo.org), 2020-01-22T18:01:11Z</strong></p>
<p>cbodley:<br />I sat down and debugged this with mrf.</p>
<p>There are a few things here, all generally related:<br />1. MPU heads<br />1.1. MPU heads that are still in the index, but the .meta RADOS object is gone.<br />2. MPU parts<br />2.1. MPU parts that are still in the index but NOT in RADOS, and the MPU head is missing from the index.<br />2.2. MPU parts that are still in the index AND in RADOS, and the MPU head is missing from the index.</p>
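<p>Case 1.1 can be checked per entry by deriving the name of the RADOS object that should back the index entry and running <code>rados stat</code> on it. A minimal sketch, assuming the data pool is <code>default.rgw.buckets.data</code> and that head objects are named <code>&lt;bucket marker&gt;_&lt;index entry&gt;</code>, as the osd_op log lines later in this thread suggest; the actual <code>rados stat</code> call is left commented out since it needs a live cluster:</p>

```shell
# Bucket marker and index entry taken from this thread's bi list / log output.
marker='48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.52'
idx='_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU.meta'

# Derive the RADOS object name: "<marker>_<index entry>" (note the resulting
# double underscore before "multipart", matching the osd_op log lines).
rados_obj="${marker}_${idx}"
echo "$rados_obj"

# Pull the object key and upload id back out of the index entry name.
body=${idx#_multipart_}     # strip the "_multipart_" prefix
body=${body%.meta}          # strip the ".meta" suffix
upload_id="2~${body##*.2~}" # upload ids in this thread start with "2~"
object_key=${body%.2~*}
echo "$upload_id"

# rados -p default.rgw.buckets.data stat "$rados_obj"
# -> "(2) No such file or directory" would indicate case 1.1.
```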
<p>I think there was an issue for generalized MPU cleanup tooling, but I don't know the ticket number. This shows the immediate need for it. The leaked parts are eating ~2TB of storage in just this one bucket. DigitalOcean has seen the same issue as far back as Luminous.</p>
<p><strong>Robin Johnson (robbat2@gentoo.org), 2020-01-22T18:06:26Z</strong></p>
<p>Snippet of logs showing the MPU head without the RADOS object:</p>
<pre>
2020-01-22 17:45:06.358 7f197fc31700 2 req 2 0.002s s3:list_multipart recalculating target
2020-01-22 17:45:06.358 7f197fc31700 2 req 2 0.002s s3:list_multipart reading permissions
2020-01-22 17:45:06.358 7f197fc31700 20 get_obj_state: rctx=0x564250a2c0d0 obj=Evol6:_multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU.meta state=0x5642501820a0 s->prefetch_data=0
2020-01-22 17:45:06.358 7f197fc31700 1 -- 172.16.2.8:0/218001572 --> [v2:172.16.2.12:6852/524389,v1:172.16.2.12:6853/524389] -- osd_op(unknown.0.0:517 40.1 40:9d5d3eed:::48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.52__multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae%2fCBB_SRV2K12%2fCBB_VM%2f192.168.0.197%2fSRV2K12%2fHard disk 1$%2f20191203010516%2f431.cbrevision.2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU.meta:head [getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e1097995) v8 -- 0x56424ffeedc0 con 0x56424fcf8800
2020-01-22 17:45:06.359 7f19a5c7d700 1 -- 172.16.2.8:0/218001572 <== osd.73 v2:172.16.2.12:6852/524389 10 ==== osd_op_reply(517 48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.52__multipart_MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision.2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU.meta [getxattrs,stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 408+0+0 (crc 0 0 0) 0x56425037a280 con 0x56424fcf8800
2020-01-22 17:45:06.359 7f197fc31700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>[SENSITIVE DATA]</ID><DisplayName>Catbackup</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>[SENSITIVE DATA]</ID><DisplayName>Catbackup</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2020-01-22 17:45:06.359 7f197fc31700 10 req 2 0.003s s3:list_multipart read_permissions on Evol6[48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.52]:MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision only_bucket=0 ret=-2
2020-01-22 17:45:06.359 7f197fc31700 20 op->ERRORHANDLER: err_no=-2 new_err_no=-2
2020-01-22 17:45:06.359 7f197fc31700 2 req 2 0.003s s3:list_multipart op status=0
2020-01-22 17:45:06.359 7f197fc31700 2 req 2 0.003s s3:list_multipart http status=404
</pre>
<p><strong>J. Eric Ivancich (ivancich@redhat.com), 2020-01-22T21:36:45Z</strong></p>
<p>I wonder whether this is affected by the bug in this tracker issue / PR:</p>
<p><a class="external" href="https://tracker.ceph.com/issues/43583">https://tracker.ceph.com/issues/43583</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/32617">https://github.com/ceph/ceph/pull/32617</a></p>
<p>Resharding wasn't putting the MPU parts on the right shards. So the question is whether there has been a reshard since these multipart uploads were initiated.</p>
<p>Even once that PR has merged, the MPU parts would still be on the wrong shards, but a reshard would get them on the right shards.</p>
<p>cbodley, in an off-line discussion, suggested a possible work-around that could be done before the PR merges:</p>
<p>1. Reshard down to ONE single shard (then everything is inherently on the right shard).<br />2. Clean up the incomplete multipart uploads.<br />3. Reshard to the desired number of shards.</p>
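<p>Spelled out as commands, the sequence might look like the following dry-run sketch. It only prints the commands so they can be reviewed before touching a real bucket; the bucket name and final shard count are placeholders, and the use of <code>radosgw-admin bucket reshard</code> for an immediate reshard is an assumption, not something verified here:</p>

```shell
# Hypothetical helper: print (not run) the suggested workaround sequence.
mpu_reshard_workaround() {
    bucket=$1
    final_shards=$2
    # 1. reshard down to a single shard so every index entry lands on it
    echo "radosgw-admin bucket reshard --bucket ${bucket} --num-shards 1 --yes-i-really-mean-it"
    # 2. clean up the incomplete multipart uploads (abort them via the
    #    s3api list/abort loop shown elsewhere in this thread)
    echo "aws s3api list-multipart-uploads --bucket ${bucket}"
    # 3. reshard back up to the desired shard count
    echo "radosgw-admin bucket reshard --bucket ${bucket} --num-shards ${final_shards}"
}

mpu_reshard_workaround DMS 32
```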
<p>I don't know whether that process has been tested. If one were to test it, it would be worth trying first on a single, small bucket.</p>
<p><strong>Manuel Rios, 2020-01-22T23:21:42Z</strong></p>
<p>Hi Eric / Team,</p>
<p>I'm going to test your theory about resharding to a single shard -&gt; cleanup -&gt; reshard to XX shards.</p>
<p>I'm going to do it with a bucket from a terminated project that hit the same error.</p>
<p>Will report results in a couple of hours.</p>
<p>Regards</p>
<p><strong>Manuel Rios, 2020-01-22T23:42:52Z</strong></p>
<p>Well, I got the result, and it was not successful:</p>
<p>Bucket = DMS<br />Incomplete multiparts dated 2019-09-12:</p>
<pre><code class="text">
{
"Initiator": {
"DisplayName": "xxxxx",
"ID": "xxxxx"
},
"Initiated": "2019-09-12T01:38:03.921Z",
"UploadId": "2~Ge19DNi2OVDTu0fqZ7fgJJlh2CrIttJ",
"StorageClass": "STANDARD",
"Key": "MBS-8a3218ee-24a4-42aa-8535-fda31eb46a0d/CBB_MENENDEZ-TS/C$/copias_sql/Kmaleon 20190911 2230.sql$/20190911203244/Kmaleon 20190911 2230.sql",
"Owner": {
"DisplayName": "xxxxx",
"ID": "xxxxx"
}
},
{
"Initiator": {
"DisplayName": "xxxxx",
"ID": "xxxx"
},
"Initiated": "2019-09-11T22:22:55.136Z",
"UploadId": "2~zZOUOY1ewrhTH9CPfURkImjusiFFzkT",
"StorageClass": "STANDARD",
"Key": "MBS-8a3218ee-24a4-42aa-8535-fda31eb46a0d/CBB_MENENDEZ-TS/C$/copias_sql/Kmaleon 20190911 2230.sql$/20190911203244/Kmaleon 20190911 2230.sql",
"Owner": {
"DisplayName": "xxxxxx",
"ID": "xxxxxx"
}
}
]
}
</code></pre>
<p><code>radosgw-admin reshard add --bucket DMS --num-shards 1 --yes-i-really-mean-it</code></p>
<pre><code class="text">[
{
"time": "2020-01-22 23:22:32.807698Z",
"tenant": "",
"bucket_name": "DMS",
"bucket_id": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.130777415.4",
"new_instance_id": "",
"old_num_shards": 32,
"new_num_shards": 1
}
]
</code></pre>
<p><code>radosgw-admin reshard process</code></p>
<pre><code class="text syntaxhl"><span class="CodeRay">
2020-01-23 00:28:07.871 7f8edae1d6c0 1 execute INFO: reshard of bucket "DMS" from "DMS:48efb8c3-693c-4fe0-bbe4-fdc16f590a82.130777415.4" to "DMS:48efb8c3-693c-4fe0-bbe4-fdc16f590a82.134292855.1" completed successfully
</span></code></pre>
<p>Checking new bucket sharding:<br /><pre><code class="text syntaxhl"><span class="CodeRay">[root@ceph-rgw03 ~]# radosgw-admin bucket stats --bucket DMS
{
"bucket": "DMS",
"tenant": "",
"zonegroup": "4d8c7c5f-ca40-4ee3-b5bb-b2cad90bd007",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.134292855.1",
"marker": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.110976409.2",
"index_type": "Normal",
"owner": "xxxxxxx",
"ver": "0#28295",
"master_ver": "0#0",
"mtime": "2020-01-22 23:23:26.670489Z",
"max_marker": "0#",
"usage": {
"rgw.main": {
"size": 1566138931393,
"size_actual": 1569989439488,
"size_utilized": 1566138931393,
"size_kb": 1529432551,
"size_kb_actual": 1533192812,
"size_kb_utilized": 1529432551,
"num_objects": 1810738
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 459,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 1,
"num_objects": 17
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
</span></code></pre></p>
<p>Count the number of incomplete multipart uploads:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">aws --endpoint=http://xxxxxxxxxx:7480 --profile=ceph s3api list-multipart-uploads --bucket $BUCKETNAME \
| jq -r '.Uploads[] | "--key \"\(.Key)\" --upload-id \(.UploadId)"' | wc
</span></code></pre>
<p>Reports 17 multiparts.</p>
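<p>For reference, this is what that jq filter emits per upload; a run against a minimal inline sample (the key and upload ID are made up, and jq is assumed to be installed):</p>

```shell
# Run the same jq filter used above against a tiny inline sample of
# list-multipart-uploads output; each emitted line is ready to splice
# into an abort-multipart-upload call.
sample='{"Uploads":[{"Key":"dir/file one.bin","UploadId":"2~exampleUploadId123"}]}'
printf '%s' "$sample" \
  | jq -r '.Uploads[] | "--key \"\(.Key)\" --upload-id \(.UploadId)"'
# prints: --key "dir/file one.bin" --upload-id 2~exampleUploadId123
```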
<p>Now trying the abort again:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">BUCKETNAME=DMS
aws --endpoint=http://xxxxxxxx:7480 --profile=ceph s3api list-multipart-uploads --bucket $BUCKETNAME \
| jq -r '.Uploads[] | "--key \"\(.Key)\" --upload-id \(.UploadId)"' \
| while read -r line; do
eval "aws --endpoint=http://xxxxxxx:7480 --profile=ceph s3api abort-multipart-upload --bucket $BUCKETNAME $line";
done
# all 17 aborts failed with:
An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown
</span></code></pre>
<p><strong>Manuel Rios, 2020-01-22T23:45:55Z</strong></p>
<p>Just a note:</p>
<p>Once the reshard finishes, bucket stats no longer shows:</p>
<pre><code class="text syntaxhl"><span class="CodeRay">"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
</span></code></pre>
<p>Compared to other buckets:</p>
<pre><code class="text syntaxhl"><span class="CodeRay"> "explicit_placement": {
"data_pool": "default.rgw.buckets.data",
"data_extra_pool": "default.rgw.buckets.non-ec",
"index_pool": "default.rgw.buckets.index"
},
</span></code></pre>
<p><strong>Manuel Rios, 2020-01-27T17:30:47Z</strong></p>
<p>Any update or workaround from the developers?</p>
<p><strong>Or Friedmann, 2020-02-02T10:43:08Z</strong></p>
<p>I saw that you have "/" in your object name; have you tried using "\" as an escape character?</p>
<p>I would be happy to see just the output of the AbortMultipartUpload request in the rgw log (debug-ms=0, debug-rgw=20).</p>
<p>Thank you</p>
<p><strong>Casey Bodley (cbodley@redhat.com), 2020-02-06T15:08:53Z</strong></p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Triaged</i></li></ul>
<p><strong>Manuel Rios, 2020-02-11T07:54:25Z</strong></p>
<p>Hi Mr. Friedmann,</p>
<p>Here you can download the debug requested:</p>
<p><a class="external" href="https://file.io/u83gRj">https://file.io/u83gRj</a></p>
<p>Regards</p>
<p><strong>Casey Bodley (cbodley@redhat.com), 2020-02-13T15:18:17Z</strong></p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-5 priority-high3 closed" href="/issues/43583">Bug #43583</a>: rgw: unable to abort multipart upload after the bucket got resharded</i> added</li></ul>
<p><strong>Chris Jones, 2020-03-26T19:47:18Z</strong></p>
<p>Just an FYI: I know Jewel is EOL, but I am seeing unabortable multiparts in Jewel due to bucket resharding as well.</p>