Bug #43756


An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown

Added by Manuel Rios over 4 years ago. Updated about 4 years ago.

Status: Triaged
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi RGW Team,

We have spent the last 7 days trying to solve a metering problem in the buckets.

Right now it looks like lifecycle (LC) processing is not able to purge/delete some objects, possibly due to a parsing problem.

Let me post some information:

radosgw-admin user stats --uid=XXXXX
{
    "stats": {
        "total_entries": 22817077,
        "total_bytes": 164278075532090,
        "total_bytes_rounded": 164325122670592
    },
    "last_stats_sync": "2020-01-21 19:32:02.231796Z",
    "last_stats_update": "2020-01-22 14:48:30.915696Z" 
}

Approximately 164 TB of usage.
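As a sanity check, the `total_bytes` figure above converts to roughly that number (a quick Python sketch using the values from the stats output):

```python
# Convert the total_bytes reported by `radosgw-admin user stats`
# into decimal TB and binary TiB for comparison with external tools.
total_bytes = 164_278_075_532_090  # "total_bytes" from the output above

tb = total_bytes / 10**12   # decimal terabytes (what most dashboards show)
tib = total_bytes / 2**40   # binary tebibytes

print(f"{tb:.2f} TB / {tib:.2f} TiB")  # ~164.28 TB / ~149.41 TiB
```

Note the ~15 TB spread between the decimal and binary readings; tools that mix the two units make the metering discrepancy below harder to reason about.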

The customer has about 57 buckets of different sizes; I'm going to post just one.

radosgw-admin bucket stats --bucket=Evol6
{
    "bucket": "Evol6",
    "tenant": "",
    "zonegroup": "4d8c7c5f-ca40-4ee3-b5bb-b2cad90bd007",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "default.rgw.buckets.data",
        "data_extra_pool": "default.rgw.buckets.non-ec",
        "index_pool": "default.rgw.buckets.index" 
    },
    "id": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.132873679.2",
    "marker": "48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.52",
    "index_type": "Normal",
    "owner": "xxxxxx",
    "ver": "0#91266,1#60635,2#80715,3#78528",
    "master_ver": "0#0,1#0,2#0,3#0",
    "mtime": "2020-01-21 22:38:31.437616Z",
    "max_marker": "0#,1#,2#,3#",
    "usage": {
        "rgw.main": {
            "size": 9107173119747,
            "size_actual": 9107345551360,
            "size_utilized": 9107173119747,
            "size_kb": 8893723750,
            "size_kb_actual": 8893892140,
            "size_kb_utilized": 8893723750,
            "num_objects": 180808
        },
        "rgw.multimeta": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 3807,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 4,
            "num_objects": 141
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1024,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

Current size is approximately 9 TB, while external tools like S3 Browser and CloudBerry Explorer for Amazon S3 report 7 TB.
That is a considerable difference, and it is not metadata overhead.
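The gap is consistent with unreferenced multipart parts: the bucket index counts uploaded parts, while plain S3 listings only show completed objects. A rough check (assuming the external tools' 7 TB figure is decimal):

```python
# rgw.main "size" from `radosgw-admin bucket stats --bucket=Evol6`
indexed_bytes = 9_107_173_119_747
# Approximate figure reported by S3 Browser / CloudBerry (assumed decimal TB)
reported_bytes = 7 * 10**12

gap_tb = (indexed_bytes - reported_bytes) / 10**12
print(f"unaccounted: ~{gap_tb:.2f} TB")  # ~2.11 TB, plausibly incomplete multipart parts
```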

Checking with the AWS CLI we found incomplete multipart uploads, which is normal because the customer backs up thousands of remote computers and uses Ceph as the backend.

I found a small script to abort all multipart uploads using the AWS CLI.

BUCKETNAME=Evol6
aws --endpoint=http://XXXXXX:7480 --profile=ceph s3api list-multipart-uploads --bucket $BUCKETNAME \
| jq -r '.Uploads[] | "--key \"\(.Key)\" --upload-id \(.UploadId)"' \
| while read -r line; do
    eval "aws --endpoint=http://XXXXXXXX:7480 --profile=ceph s3api abort-multipart-upload --bucket $BUCKETNAME $line";
done
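The jq filter above can also be reproduced in Python, which makes it easier to inspect exactly what gets passed to `abort-multipart-upload` (a sketch; the sample JSON mirrors the shape of the `list-multipart-uploads` response and is hypothetical data, not taken from the real bucket):

```python
import json

# Minimal sample mirroring `aws s3api list-multipart-uploads` output
# (hypothetical key and upload id, same shape as the real response).
listing = json.loads("""
{
  "Uploads": [
    {"Key": "backups/disk1/431.cbrevision",
     "UploadId": "2~T7G76R09Pn-267VMbY8cjvZl_BHqfTx"}
  ]
}
""")

# One (key, upload_id) pair per pending upload -- exactly what the
# jq filter feeds into `abort-multipart-upload`.
pending = [(u["Key"], u["UploadId"]) for u in listing["Uploads"]]

for key, upload_id in pending:
    print(f"abort-multipart-upload --key {key!r} --upload-id {upload_id}")
```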

Every abort attempt produces the same error:

An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown

(repeated for every upload in the list)

Checking the multipart list:

{
    "Initiator": {
        "DisplayName": "xxxxx",
        "ID": "xxxxx"
    },
    "Initiated": "2019-12-03T02:00:50.589Z",
    "UploadId": "2~T7G76R09Pn-267VMbY8cjvZl_BHqfTx",
    "StorageClass": "STANDARD",
    "Key": "MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision",
    "Owner": {
        "DisplayName": "xxxx",
        "ID": "xxxxx"
    }
}, {
    "Initiator": {
        "DisplayName": "xxxxx",
        "ID": "xxxx"
    },
    "Initiated": "2019-12-03T01:23:06.007Z",
    "UploadId": "2~r0BMPPs8CewVZ6Qheu1s9WzaBn7bBvU",
    "StorageClass": "STANDARD",
    "Key": "MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision",
    "Owner": {
        "DisplayName": "xxxxx",
        "ID": "xxxxx"
    }
}

Maybe the internal parsing of the "1$" characters is causing a problem in the LC processing that prevents these uploads from being purged.
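Independently of any server-side parsing, keys like this are also fragile on the client side: the `eval` in the loop above re-parses the key through the shell, so characters like `$`, backticks, or quotes can be mangled. Quoting each key defensively avoids that (a Python sketch using `shlex.quote`; this does not claim anything about how radosgw parses keys internally):

```python
import shlex

# The real key from the multipart listing above; note the "$" in "Hard disk 1$".
key = ("MBS-da43656f-2b8c-464f-b341-03fdbdf446ae/CBB_SRV2K12/CBB_VM/"
       "192.168.0.197/SRV2K12/Hard disk 1$/20191203010516/431.cbrevision")

# shlex.quote wraps the key in single quotes, so the shell passes it
# through verbatim -- no $-expansion, no word splitting.
safe = shlex.quote(key)
print(f"--key {safe}")
```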

The main problem with this issue is the huge difference between the completed objects that all the external tools show and the internal storage metering.

Additionally, to help this type of customer, we added an LC policy that for some reason fails but shows as completed.

s3cmd getlifecycle s3://Evol6 --no-ssl
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
        <Rule>
                <ID>Incomplete Multipart Uploads</ID>
                <Prefix/>
                <Status>Enabled</Status>
                <AbortIncompleteMultipartUpload>
                        <DaysAfterInitiation>1</DaysAfterInitiation>
                </AbortIncompleteMultipartUpload>
        </Rule>
</LifecycleConfiguration>
radosgw-admin lc list
[
    {
        "bucket": ":Evol6:48efb8c3-693c-4fe0-bbe4-fdc16f590a82.3886182.52",
        "status": "COMPLETE" 
    }
]

Obviously COMPLETE is not the correct status, because the multipart listing still shows about 157 incomplete uploads.

I would appreciate any help and ideas.

Best Regards


Related issues: 1 (0 open, 1 closed)

Related to rgw - Bug #43583: rgw: unable to abort multipart upload after the bucket got resharded (Resolved, J. Eric Ivancich)
