Bug #43583
closed
rgw: unable to abort multipart upload after the bucket got resharded
Added by dongdong tao over 4 years ago.
Updated about 4 years ago.
Description
There is a bug during resharding of multipart entries.
For all entries belonging to one multipart upload, the hash source should be the source object's name, so that all of those entries land on the same bucket index shard object.
Right now the code calculates the shard id from each entry's own name, which is wrong.
This can leave the bucket unable to abort the multipart upload, with stale multipart entries left behind.
I will open a pull request soon
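A minimal sketch of the distribution problem described above (hypothetical names; RGW uses its own string hash and entry formats, this only illustrates the idea):

```python
import hashlib

NUM_SHARDS = 16

def shard_for(hash_source: str, num_shards: int = NUM_SHARDS) -> int:
    # Stand-in for bucket index shard selection: hash a string and
    # reduce it modulo the shard count.  (RGW does not use MD5 here;
    # this only models "shard = hash(source) % num_shards".)
    h = int.from_bytes(hashlib.md5(hash_source.encode()).digest()[:4], "little")
    return h % num_shards

# Hypothetical index entries for one multipart upload of "movie.mp4":
entries = [
    "_multipart_movie.mp4.2~abc123.1",
    "_multipart_movie.mp4.2~abc123.2",
    "_multipart_movie.mp4.2~abc123.meta",
]

# Buggy behavior: hashing each entry's full name can scatter the
# entries of one upload across different shards after a reshard.
buggy_shards = {shard_for(e) for e in entries}

# Fixed behavior: hashing the source object name keeps every entry of
# the upload on one shard, so abort can find all of them.
fixed_shards = {shard_for("movie.mp4") for _ in entries}
assert len(fixed_shards) == 1
```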
- Priority changed from Normal to High
- Tags set to reshard multipart
- Backport set to nautilus
@Casey, will this also be backported to luminous?
May I know whether there is any plan for 12.2.13?
- Status changed from New to Fix Under Review
- Pull request ID set to 32617
- Backport changed from nautilus to nautilus,mimic
- Assignee set to J. Eric Ivancich
- Status changed from Fix Under Review to Pending Backport
- Copied to Backport #43846: nautilus: rgw: unable to abort multipart upload after the bucket got resharded added
- Copied to Backport #43847: mimic: rgw: unable to abort multipart upload after the bucket got resharded added
- Related to Bug #43756: An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown added
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
We updated the cluster today to 14.2.8, which includes this backport.
LC now shows more information, but also these new errors, and it is still unable to abort:
2020-03-03 18:13:19.361 7fb58bcfb6c0 5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~5nOv_6K_GZVwAJNqmEZ RrmE4lMs_-91.meta
2020-03-03 18:13:19.361 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 00:53:58.0.940346s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.0496e+07 cmp=86400
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.362 7fb58bcfb6c0 5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~67RyQVXdhT-g3Jp1V88 cNHCkv6ly_tt.meta
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 00:18:28.0.7263s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.04981e+07 cmp=86400
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.362 7fb58bcfb6c0 5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~EJaUoXHzAJikdRspX1H bpopE1ZbdCih.meta
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 01:41:18.0.929875s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.04931e+07 cmp=86400
2020-03-03 18:13:19.363 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.363 7fb58bcfb6c0 5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~IZ9dHhPZHmJivZSDqvG kJILjn_tDFZP.meta
Lifecycle applied.
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ID>Incomplete Multipart Uploads</ID>
    <Prefix/>
    <Status>Enabled</Status>
    <AbortIncompleteMultipartUpload>
      <DaysAfterInitiation>1</DaysAfterInitiation>
    </AbortIncompleteMultipartUpload>
  </Rule>
</LifecycleConfiguration>
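For reference, the XML rule above corresponds to this boto3 lifecycle structure (bucket name and endpoint are hypothetical; the client call needs a reachable RGW, so it is shown commented out):

```python
# boto3 equivalent of the XML lifecycle rule above.
lifecycle = {
    "Rules": [{
        "ID": "Incomplete Multipart Uploads",
        "Filter": {"Prefix": ""},
        "Status": "Enabled",
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
    }]
}

# To apply it (hypothetical endpoint/bucket):
# import boto3
# s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:7480")
# s3.put_bucket_lifecycle_configuration(Bucket="my-bucket",
#                                       LifecycleConfiguration=lifecycle)
```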
@Manuel Rios
You have list_multipart_parts returning -2, which means the .meta object in the non-EC pool has already been deleted.
Please note that this fix won't let you abort those multipart uploads that already failed to abort before (because the failed abort already deleted the .meta object).
Those old failed multipart aborts will have to be cleared manually.
This fix ensures that newly started, partially completed multipart uploads can be aborted successfully.
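One way to sweep stale uploads through the plain S3 API is sketched below (a hedged helper, not part of this fix; `abort_stale_uploads` is a hypothetical name, and uploads whose .meta is already gone will still fail server-side and are just reported):

```python
from datetime import datetime, timezone, timedelta

def abort_stale_uploads(s3, bucket, older_than_days=1):
    """Abort multipart uploads initiated more than `older_than_days` ago.

    `s3` is a boto3 S3 client (e.g. boto3.client("s3", endpoint_url=...)).
    Note: list_multipart_uploads is paginated; this sketch handles only
    the first page for brevity.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)
    aborted, failed = [], []
    resp = s3.list_multipart_uploads(Bucket=bucket)
    for up in resp.get("Uploads", []):
        if up["Initiated"] < cutoff:
            try:
                s3.abort_multipart_upload(Bucket=bucket, Key=up["Key"],
                                          UploadId=up["UploadId"])
                aborted.append(up["Key"])
            except Exception as e:  # e.g. NoSuchUpload for old stale entries
                failed.append((up["Key"], str(e)))
    return aborted, failed
```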