Bug #43583

closed

rgw: unable to abort multipart upload after the bucket got resharded

Added by dongdong tao over 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
High
Target version:
-
% Done:
0%

Source:
Tags:
reshard multipart
Backport:
nautilus,mimic
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Resharding mishandles multipart entries in the bucket index.
For all multipart entries, the hash source should be the object name, so that all entries of an upload are still placed in the same bucket index shard object.
Right now the code calculates the shard ID from each entry's own name, which is wrong.

As a result, the multipart upload cannot be aborted on the resharded bucket, and stale multipart entries are left behind.
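
The effect described above can be illustrated with a small sketch. It is not the RGW code: `zlib.crc32` stands in for the real hash, the shard count is arbitrary, and the entry naming (`_multipart_<object>.<upload id>.meta` plus one entry per part) is an assumption based on the names visible in the lifecycle log later in this thread. All function names are hypothetical.

```python
import zlib

NUM_SHARDS = 11  # arbitrary shard count, for illustration only


def shard_of(hash_source: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a hash source string to a shard id (stand-in hash, not Ceph's)."""
    return zlib.crc32(hash_source.encode("utf-8")) % num_shards


def multipart_entries(obj_name: str, upload_id: str, num_parts: int) -> list:
    """Build the assumed index entry names for one multipart upload."""
    prefix = f"_multipart_{obj_name}.{upload_id}"
    return [f"{prefix}.meta"] + [f"{prefix}.{n}" for n in range(1, num_parts + 1)]


def shards_by_entry_name(entries: list) -> set:
    """Buggy behaviour: every entry is hashed on its own name."""
    return {shard_of(entry) for entry in entries}


def shards_by_object_name(obj_name: str, entries: list) -> set:
    """Intended behaviour: every entry is hashed on the object name."""
    return {shard_of(obj_name) for _ in entries}


if __name__ == "__main__":
    obj = "backup/disk.img"        # hypothetical object name
    upload = "2~exampleuploadid"   # hypothetical upload id
    entries = multipart_entries(obj, upload, num_parts=50)
    print("shards when hashing entry names:", sorted(shards_by_entry_name(entries)))
    print("shards when hashing object name:", sorted(shards_by_object_name(obj, entries)))
```

Hashing on the object name always yields a single shard for the whole upload, so an abort can find the .meta entry and every part entry in one place. Hashing on each entry name will generally spread them over several shards.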


Related issues 3 (1 open, 2 closed)

Related to rgw - Bug #43756: An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown (Triaged)
Copied to rgw - Backport #43846: nautilus: rgw: unable to abort multipart upload after the bucket got resharded (Resolved, Nathan Cutler)
Copied to rgw - Backport #43847: mimic: rgw: unable to abort multipart upload after the bucket got resharded (Resolved, Nathan Cutler)

Actions #1

Updated by dongdong tao over 4 years ago

I will open a pull request soon

Actions #3

Updated by Casey Bodley over 4 years ago

  • Priority changed from Normal to High
  • Tags set to reshard multipart
  • Backport set to nautilus
Actions #4

Updated by dongdong tao over 4 years ago

@Casey, will this also be backported to luminous?
Is there any plan for 12.2.13?

Actions #5

Updated by J. Eric Ivancich over 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32617
Actions #6

Updated by J. Eric Ivancich over 4 years ago

  • Backport changed from nautilus to nautilus,mimic
Actions #7

Updated by J. Eric Ivancich about 4 years ago

  • Assignee set to J. Eric Ivancich
Actions #8

Updated by Casey Bodley about 4 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #9

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #43846: nautilus: rgw: unable to abort multipart upload after the bucket got resharded added
Actions #10

Updated by Nathan Cutler about 4 years ago

  • Copied to Backport #43847: mimic: rgw: unable to abort multipart upload after the bucket got resharded added
Actions #11

Updated by Casey Bodley about 4 years ago

  • Related to Bug #43756: An error occurred (NoSuchUpload) when calling the AbortMultipartUpload operation: Unknown added
Actions #12

Updated by Nathan Cutler about 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #13

Updated by Manuel Rios about 4 years ago

Today we updated the cluster to 14.2.8, which includes this backport.
LC now shows more information, but it also logs these new errors and is still unable to abort.

2020-03-03 18:13:19.361 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~5nOv_6K_GZVwAJNqmEZ                           RrmE4lMs_-91.meta
2020-03-03 18:13:19.361 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 00:53:58.0.940346s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.0496e+07 cmp=86400
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.362 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~67RyQVXdhT-g3Jp1V88                           cNHCkv6ly_tt.meta
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 00:18:28.0.7263s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.04981e+07 cmp=86400
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.362 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~EJaUoXHzAJikdRspX1H                           bpopE1ZbdCih.meta
2020-03-03 18:13:19.362 7fb58bcfb6c0 20 obj_has_expired(): mtime=2019-03-16 01:41:18.0.929875s days=1 base_time=2020-03-03 00:00:00.000000 timediff=3.04931e+07 cmp=86400
2020-03-03 18:13:19.363 7fb58bcfb6c0 20 abort_multipart_upload: list_multipart_parts returned -2
2020-03-03 18:13:19.363 7fb58bcfb6c0  5 lifecycle: ERROR: abort_multipart_upload failed, ret=-2009, meta:_multipart_MBS-0fc78b70-efa6-49ef-bdd2-fd3a4b4f2c84/CBB_BIM-AUTOLOG/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001$/20190315230135/160.cbrevision.2~IZ9dHhPZHmJivZSDqvG                           kJILjn_tDFZP.meta

Lifecycle configuration applied:


<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
        <Rule>
                <ID>Incomplete Multipart Uploads</ID>
                <Prefix/>
                <Status>Enabled</Status>
                <AbortIncompleteMultipartUpload>
                        <DaysAfterInitiation>1</DaysAfterInitiation>
                </AbortIncompleteMultipartUpload>
        </Rule>
</LifecycleConfiguration>

Actions #14

Updated by dongdong tao about 4 years ago

@Manuel Rios

You have list_multipart_parts returned -2, which means your .meta object in the non-EC pool should already have been deleted.

Please note that this fix won't let you abort multipart uploads that already failed to abort before (because the failed abort already deleted the .meta object).
For those old failed multipart aborts, you'll have to clear them manually.

This fix will make sure that new partially completed multipart uploads abort successfully.
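
As a starting point for that manual cleanup, a small sketch like the following could collect the meta entry names that the lifecycle thread failed to abort. It only parses log lines of the shape shown in comment #13; the function name, the regular expression, and the sample line (with a made-up object name and upload ID) are assumptions for illustration, and the script does not touch the cluster.

```python
import re
from collections import defaultdict

# Matches the "abort_multipart_upload failed" lines seen in the lifecycle log.
FAILED_ABORT = re.compile(r"abort_multipart_upload failed, ret=(-?\d+), meta:(.+)$")


def failed_aborts(log_lines):
    """Group the meta entry names of failed aborts by their return code."""
    by_ret = defaultdict(list)
    for line in log_lines:
        match = FAILED_ABORT.search(line.rstrip("\n"))
        if match:
            by_ret[int(match.group(1))].append(match.group(2).strip())
    return dict(by_ret)


# Sample input: same line format as the log above, but with a hypothetical
# object name and upload id.
sample = [
    "2020-03-03 18:13:19.361 7fb58bcfb6c0  5 lifecycle: ERROR: "
    "abort_multipart_upload failed, ret=-2009, "
    "meta:_multipart_example/object.2~EXAMPLEUPLOADID.meta",
    "2020-03-03 18:13:19.362 7fb58bcfb6c0 20 abort_multipart_upload: "
    "list_multipart_parts returned -2",
]

if __name__ == "__main__":
    for ret, metas in failed_aborts(sample).items():
        print(ret, len(metas), "failed abort(s)")
        for meta in metas:
            print("  ", meta)
```

The resulting list only tells you which uploads to look at. How to remove the leftover index entries is a separate, manual step.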
