Reproducing below a flow similar to what Quay performs with a geo-replication zone configuration, using the aws CLI:
1st multipart-complete:
dd if=/dev/zero of=container.img bs=1M count=4
aws --endpoint-url http://localhost:8000 s3 mb s3://alln-registry
aws --endpoint-url http://localhost:8000 s3api create-multipart-upload --bucket alln-registry --key 'datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4f' | jq
{
  "Bucket": "alln-registry",
  "Key": "datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4f",
  "UploadId": "2~zVVHC7y1pqXH0zbp2_bCcZa61teKt8w"
}
export UPID='2~zVVHC7y1pqXH0zbp2_bCcZa61teKt8w'
aws --endpoint-url http://localhost:8000 s3api upload-part --bucket alln-registry --key 'datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4f' --part-number 1 --body ./container.img --upload-id "${UPID}"
{
  "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\""
}
aws --endpoint-url http://localhost:8000 s3api list-parts --bucket alln-registry --key 'datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4f' --upload-id "${UPID}"
{
  "Parts": [
    {
      "PartNumber": 1,
      "LastModified": "2021-04-07T11:50:59.516Z",
      "ETag": "\"b5cfa9d6c8febd618f91ac2843d50a1c\"",
      "Size": 4194304
    }
  ],
  "Initiator": null,
  "Owner": {
    "DisplayName": "",
    "ID": ""
  },
  "StorageClass": "STANDARD"
}
cat <<_EOF > fileparts.json
{
  "Parts": [{
    "ETag": "b5cfa9d6c8febd618f91ac2843d50a1c",
    "PartNumber": 1
  }]
}
_EOF
aws --endpoint-url http://localhost:8000 s3api complete-multipart-upload --multipart-upload file://fileparts.json --bucket alln-registry --key 'datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4f' --upload-id "${UPID}"
In gdb, with a breakpoint in src/rgw/rgw_op.cc at RGWCompleteMultipart::execute(), right after:
serializer = meta_obj->get_serializer("RGWCompleteMultipart");
op_ret = serializer->try_lock(dur, y);
Thread 99 "civetweb-worker" hit Breakpoint 2, RGWCompleteMultipart::execute (this=0x7ffed0016640, y=...) at ../src/rgw/rgw_op.cc:5956
5956 if (op_ret < 0) {
(gdb) p op_ret
$1 = 0
On the 1st complete-multipart request, try_lock() succeeds (op_ret = 0), so execution flows as before and does not enter the new flow (it does not check for the existence of the first part object before completing the multipart upload).
2nd multipart-complete - when the 1st complete-multipart's 200 OK gets lost in transmission (a network problem occurs), Quay re-sends the complete-multipart request:
aws --endpoint-url http://localhost:8000 s3api complete-multipart-upload --multipart-upload file://fileparts.json --bucket alln-registry --key 'datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4f' --upload-id "${UPID}"
In gdb, with a breakpoint in src/rgw/rgw_op.cc at RGWCompleteMultipart::execute(), right after:
serializer = meta_obj->get_serializer("RGWCompleteMultipart");
op_ret = serializer->try_lock(dur, y);
Thread 99 "civetweb-worker" hit Breakpoint 2, RGWCompleteMultipart::execute (this=0x7ffed0016640, y=...) at ../src/rgw/rgw_op.cc:5956
5956 if (op_ret < 0) {
(gdb) p op_ret
$2 = -2
In this scenario (op_ret = -2, the lock cannot be acquired), the new flow is executed.
rgw log:
2021-04-07T16:32:12.592+0300 7ffedeffd700 0 req 47 3.161037683s s3:complete_multipart failed to acquire lock
2021-04-07T16:32:12.592+0300 7ffedeffd700 20 req 47 3.161037683s s3:complete_multipart check already complete multipart: 1c816972-ff50-47f6-9d88-88c87c8b9343.4183.1__multipart_datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4f.2~zVVHC7y1pqXH0zbp2_bCcZa61teKt8w.1
2021-04-07T16:32:12.593+0300 7ffedeffd700 2 req 47 3.162037611s s3:complete_multipart completing
2021-04-07T16:32:12.594+0300 7ffedeffd700 20 req 47 3.163037539s get_system_obj_state: rctx=0x7ffedeff8470 obj=default.rgw.log:script.postrequest. state=0x7fffe40e10a0 s->prefetch_data=0
2021-04-07T16:32:12.594+0300 7ffedeffd700 20 cache get: touching lru, lru_counter=59 promotion_ts=55
2021-04-07T16:32:12.594+0300 7ffedeffd700 10 moving default.rgw.log++script.postrequest. to cache LRU end
2021-04-07T16:32:12.594+0300 7ffedeffd700 10 cache get: name=default.rgw.log++script.postrequest. : hit (negative entry)
2021-04-07T16:32:12.594+0300 7ffedeffd700 2 req 47 3.163037539s s3:complete_multipart op status=1901
2021-04-07T16:32:12.594+0300 7ffedeffd700 2 req 47 3.163037539s s3:complete_multipart http status=202
2021-04-07T16:32:12.594+0300 7ffedeffd700 1 ====== req done req=0x7ffedeffa5a0 op status=1901 http_status=202 latency=3.163037539s ======
Potential false-positive scenarios that are handled correctly:
1. wrong key
2. wrong bucket
3. wrong UploadId
caught by:
ret = data_ioctx.stat(chk_oid, nullptr, nullptr);
if (ret < 0) {
  ldpp_dout(this, 0) << __func__ << ": ERROR: ioctx.stat(" << chk_oid << ") returned ret = " << ret << dendl;
  op_ret = -ERR_NO_SUCH_UPLOAD;
  return;
}
(ret will be -2, i.e. -ENOENT)
example:
2021-04-07T16:23:48.823+0300 7ffedeffd700 0 req 44 6.753080368s s3:complete_multipart failed to acquire lock
2021-04-07T16:23:48.823+0300 7ffedeffd700 20 req 44 6.753080368s s3:complete_multipart check already complete multipart: 1c816972-ff50-47f6-9d88-88c87c8b9343.4183.1__multipart_datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4fZ.2~zVVHC7y1pqXH0zbp2_bCcZa61teKt8w.1
2021-04-07T16:23:48.824+0300 7ffedeffd700 0 req 44 6.754080296s s3:complete_multipart execute: ERROR: ioctx.stat(1c816972-ff50-47f6-9d88-88c87c8b9343.4183.1__multipart_datastorage/registry/sha256/07/07680f8abcbce9cff4b73de2d6ce0aa9d65f35554da0f856fbd4cb4a4492dd4fZ.2~zVVHC7y1pqXH0zbp2_bCcZa61teKt8w.1) returned ret = -2
2021-04-07T16:23:48.824+0300 7ffedeffd700 2 req 44 6.754080296s s3:complete_multipart op status=-2009
2021-04-07T16:23:48.824+0300 7ffedeffd700 2 req 44 6.754080296s s3:complete_multipart http status=404
A 404 is returned above.
Potential false-positive scenarios that are NOT handled correctly yet (related to ETag verification):
1. specifying a wrong ETag returns 200 OK (as noted in the GitHub comments, ETags are not verified)
2. a complete-multipart request with any previously successful UploadId returns 200 OK
In tests with the Quay client, reproducing the network-error scenario that prevents the 200 OK transmission, the ETag-related false-positive scenarios above do not occur.