Bug #43583

rgw: unable to abort multipart upload after the bucket got resharded

Added by dongdong tao almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Target version: -
% Done: 0%
Source:
Tags: reshard multipart
Backport: nautilus,mimic
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There is a bug in resharding that affects multipart entries.
For multipart entries, the hash source should be the object name, so that all entries belonging to the same object are still placed in the same bucket index shard object.
Currently the code calculates the shard ID from each entry's own name, which is wrong.

As a result, the bucket can be left unable to abort the multipart upload, and stale multipart entries are left behind.
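The following minimal Python sketch contrasts the two hash sources described above. It illustrates the idea under stated assumptions and is not RGW's code. `zlib.crc32` stands in for RGW's own shard hash. The entry layout `_multipart_<object>.<upload_id>.<suffix>` (with suffix `meta` or a part number) is inferred from the entry names in the lifecycle log later in this issue. `hash_source` and `shard_of` are hypothetical helpers.

```python
import zlib

# Index-entry prefix for multipart uploads, as seen in the lifecycle log in this issue.
MULTIPART_PREFIX = "_multipart_"


def shard_of(source: str, num_shards: int) -> int:
    """Map a hash source to a shard index (hypothetical helper).

    ASSUMPTION: zlib.crc32 is only a stand-in; RGW uses its own hash function.
    """
    return zlib.crc32(source.encode("utf-8")) % num_shards


def hash_source(entry_name: str) -> str:
    """Return the string to hash for a bucket index entry (hypothetical helper).

    ASSUMPTION: multipart entries look like
    _multipart_<object>.<upload_id>.<suffix>, where <suffix> is "meta" or a
    part number and <upload_id> contains no dots.
    """
    if not entry_name.startswith(MULTIPART_PREFIX):
        return entry_name  # regular objects hash on their own name
    rest = entry_name[len(MULTIPART_PREFIX):]
    obj_and_upload, _suffix = rest.rsplit(".", 1)   # drop "meta" / part number
    obj, _upload_id = obj_and_upload.rsplit(".", 1)  # drop the upload ID
    return obj  # every entry of one upload shares this hash source


if __name__ == "__main__":
    num_shards = 11
    upload = "_multipart_photos/cat.jpg.2~abc123"
    entries = [upload + ".meta", upload + ".1", upload + ".2", upload + ".3"]

    # Buggy behaviour: each entry hashes its own full name, so entries of
    # one upload can end up scattered over several shards.
    buggy = {e: shard_of(e, num_shards) for e in entries}
    # Fixed behaviour: every entry hashes the object name, so they all land
    # on the same shard.
    fixed = {e: shard_of(hash_source(e), num_shards) for e in entries}
    print("buggy:", buggy)
    print("fixed:", fixed)
```

Because the object name is split off with `rsplit`, object names that contain dots (such as `photos/cat.jpg`) still parse correctly, provided the upload ID itself has no dots.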


Related issues

Related to rgw - Bug #43756: An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown Triaged
Copied to rgw - Backport #43846: nautilus: rgw: unable to abort multipart upload after the bucket got resharded Resolved
Copied to rgw - Backport #43847: mimic: rgw: unable to abort multipart upload after the bucket got resharded Resolved

History

#1 Updated by dongdong tao almost 2 years ago

I will open a pull request soon

#3 Updated by Casey Bodley almost 2 years ago

  • Priority changed from Normal to High
  • Tags set to reshard multipart
  • Backport set to nautilus

#4 Updated by dongdong tao almost 2 years ago

@Casey, will this also be backported to luminous?
May I know if there is any plan for 12.2.13?

#5 Updated by J. Eric Ivancich almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32617

#6 Updated by J. Eric Ivancich almost 2 years ago

  • Backport changed from nautilus to nautilus,mimic

#7 Updated by J. Eric Ivancich almost 2 years ago

  • Assignee set to J. Eric Ivancich

#8 Updated by Casey Bodley almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport

#9 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #43846: nautilus: rgw: unable to abort multipart upload after the bucket got resharded added

#10 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #43847: mimic: rgw: unable to abort multipart upload after the bucket got resharded added

#11 Updated by Casey Bodley almost 2 years ago

  • Related to Bug #43756: An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown added

#12 Updated by Nathan Cutler almost 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

#13 Updated by Manuel Rios over 1 year ago

We updated the cluster today to 14.2.8, which includes this backport.
Now LC shows more information, but also these new errors, and we are still unable to abort.

2020-03-03 18:13:19.361 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~5nOv_6K_GZVwAJNqmEZRrmE4lMs_-91.meta
2020-03-03 18:13:19.361 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 00:53:58.0.940346s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.0496e+07 cmp=86400
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.362 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~67RyQVXdhT-g3Jp1V88cNHCkv6ly_tt.meta
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 00:18:28.0.7263s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.04981e+07 cmp=86400
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.362 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~EJaUoXHzAJikdRspX1HbpopE1ZbdCih.meta
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 01:41:18.0.929875s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.04931e+07 cmp=86400
2020-03-03 18:13:19.363 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.363 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~IZ9dHhPZHmJivZSDqvGkJILjn_tDFZP.meta

Lifecycle configuration applied:


<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
        <Rule>
                <ID>Incomplete Multipart Uploads</ID>
                <Prefix/>
                <Status>Enabled</Status>
                <AbortIncompleteMultipartUpload>
                        <DaysAfterInitiation>1</DaysAfterInitiation>
                </AbortIncompleteMultipartUpload>
        </Rule>
</LifecycleConfiguration>
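The `obj_has_expired()` debug lines in the log above print two values. `timediff` is the number of seconds between the upload's `mtime` and `base_time`. `cmp` is `days` × 86400, which corresponds to `DaysAfterInitiation` in this configuration. A short Python sketch can check that arithmetic. The function name `has_expired` is hypothetical, and the `>=` comparison is an assumption about RGW's exact rule. The sketch only shows that these uploads are far past the one-day threshold, so the lifecycle run does reach the abort step, and that step is where it fails.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def has_expired(mtime, base_time, days):
    """Return (timediff, expired) for one multipart upload (hypothetical helper).

    timediff and cmp mirror the values printed by obj_has_expired() in the log.
    ASSUMPTION: an upload counts as expired once timediff >= cmp.
    """
    timediff = (base_time - mtime).total_seconds()
    cmp = days * SECONDS_PER_DAY  # DaysAfterInitiation expressed in seconds
    return timediff, timediff >= cmp


if __name__ == "__main__":
    base = datetime(2020, 3, 3)               # base_time is logged as midnight of the run date
    mtime = datetime(2019, 3, 16, 0, 53, 58)  # first logged mtime, fractional seconds dropped
    timediff, expired = has_expired(mtime, base, days=1)
    print(f"timediff={timediff:g} cmp={SECONDS_PER_DAY} expired={expired}")
    # prints: timediff=3.0496e+07 cmp=86400 expired=True
```

Dropping the fractional seconds does not change the six significant digits shown in the log.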

#14 Updated by dongdong tao over 1 year ago

@Manuel Rios

Your log shows list_multipart_parts returned -2 (ENOENT), which means the .meta object in your non-EC pool has already been deleted.

Please note that this fix won't let you abort multipart uploads that already failed to abort before, because the failed abort already deleted the .meta object.
For those old failed multipart aborts, you'll have to clear the entries manually.

This fix ensures that new, partially completed multipart uploads will abort successfully.
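Before clearing old failed aborts manually, one first has to find them. The following is a hypothetical Python sketch of that identification step, working purely on lists of names. It is not a radosgw-admin command and does not remove anything. It groups multipart index entries by object and upload ID, then reports the uploads whose .meta object is gone, which is the case where `list_multipart_parts` returns -2. The helper names, the entry layout `_multipart_<object>.<upload_id>.<suffix>`, and the assumption that .meta object names can be compared in the same form as index entries (real RADOS object names may differ) are all illustrative.

```python
from collections import defaultdict

MULTIPART_PREFIX = "_multipart_"


def split_multipart_entry(name):
    """Split a multipart index entry into (object, upload_id, suffix).

    ASSUMPTION: the layout is _multipart_<object>.<upload_id>.<suffix>, where
    <suffix> is "meta" or a part number and <upload_id> contains no dots.
    """
    rest = name[len(MULTIPART_PREFIX):]
    obj_and_upload, suffix = rest.rsplit(".", 1)
    obj, upload_id = obj_and_upload.rsplit(".", 1)
    return obj, upload_id, suffix


def find_stale_uploads(index_entries, existing_meta_objects):
    """Return {(object, upload_id): sorted index entries} for uploads whose
    .meta object no longer exists (hypothetical helper).

    index_entries: bucket index entry names, e.g. from a listing.
    existing_meta_objects: .meta names, in the same form as index entries,
    that still exist in the non-EC pool.
    """
    groups = defaultdict(list)
    for name in index_entries:
        if not name.startswith(MULTIPART_PREFIX):
            continue  # regular objects do not belong to any upload
        obj, upload_id, _suffix = split_multipart_entry(name)
        groups[(obj, upload_id)].append(name)

    stale = {}
    for (obj, upload_id), names in groups.items():
        meta_name = f"{MULTIPART_PREFIX}{obj}.{upload_id}.meta"
        if meta_name not in existing_meta_objects:
            # Abort cannot succeed here: listing the parts needs the .meta object.
            stale[(obj, upload_id)] = sorted(names)
    return stale


if __name__ == "__main__":
    index = [
        "photos/cat.jpg",
        "_multipart_photos/cat.jpg.2~aaa.meta",
        "_multipart_photos/cat.jpg.2~aaa.1",
        "_multipart_video.mp4.2~bbb.meta",
        "_multipart_video.mp4.2~bbb.1",
        "_multipart_video.mp4.2~bbb.2",
    ]
    still_present = {"_multipart_photos/cat.jpg.2~aaa.meta"}
    for key, names in find_stale_uploads(index, still_present).items():
        print(key, names)
```

In this example only the `video.mp4` upload is reported. Its .meta entry is still listed in the index, as in the lifecycle log above, but the .meta object itself is missing.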
