Bug #16767 (closed)

RadosGW Multipart Cleanup Failure

Added by Brian Felton almost 8 years ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version:
% Done: 100%
Source: other
Tags: rgw multipart backport_processed
Backport: quincy pacific reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My current setup is a Ceph Hammer cluster running 0.94.6. The rest of the cluster details are irrelevant to this issue.

I've stumbled upon an issue whereby RGW is not cleaning up properly after a multipart upload is finished (either aborted or completed). If a client re-uploads a part during a multipart upload, Ceph will store both the original and the new part, but only the latter part will be valid when POSTing the CompleteMultipartUpload XML payload. When the multipart upload is finished (either through abort or complete), only the initial parts will be removed from the system. The remaining parts are orphaned and are not (easily) removable.

To reproduce:

First, create four 5MiB files with unique md5 sums:

dd if=/dev/urandom of=/tmp/part1.1 bs=1M count=5
dd if=/dev/urandom of=/tmp/part1.2 bs=1M count=5
dd if=/dev/urandom of=/tmp/part2.1 bs=1M count=5
dd if=/dev/urandom of=/tmp/part2.2 bs=1M count=5

Next, initiate a multipart upload:

s3curl --id test -- -X POST http://ceph.cluster/bucket/mpobject?uploads

Upload the parts:

s3curl --id test --put /tmp/part1.1 -- "http://ceph.cluster/bucket/mpobject?partNumber=1&uploadId=2~whateverid"
s3curl --id test --put /tmp/part1.2 -- "http://ceph.cluster/bucket/mpobject?partNumber=2&uploadId=2~whateverid"
s3curl --id test --put /tmp/part2.1 -- "http://ceph.cluster/bucket/mpobject?partNumber=1&uploadId=2~whateverid"
s3curl --id test --put /tmp/part2.2 -- "http://ceph.cluster/bucket/mpobject?partNumber=2&uploadId=2~whateverid"

Now, let's take a look at what RGW says about the bucket:

radosgw-admin bucket list --bucket=bucket | grep -A7 mptest | grep -v owner | grep -v instance

        "name": "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:15.000000Z",
        "etag": "785dec7eeb68366cca5c19cec86c508b",
--
        "name": "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.2",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:24.000000Z",
        "etag": "b11c15f456f17ba763d0fb900d22376c",
--
        "name": "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.meta",
        "namespace": "multipart",
        "size": 0,
        "mtime": "2016-07-21 18:43:00.000000Z",
        "etag": "",
--
        "name": "mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:39.000000Z",
        "etag": "2d26aa403bc759305d0ea61d29f17cd0",
--
        "name": "mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:31.000000Z",
        "etag": "a9fdb9efe0722f6e61d5d4ff3dfe0e81",

So we now have a meta object whose name contains the upload id, the first two uploaded parts with the upload id in their names, and the two re-uploaded parts without the upload id in their names.

Now, let's list the available parts associated with the id:

./s3curl --id test -- http://ceph.cluster/bucket/mpobject?uploadId=2~whateverid | xmlstarlet fo

<?xml version="1.0" encoding="UTF-8"?>
<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>mptest</Key>
  <UploadId>2~whateverid</UploadId>
...
  <Owner>
    <ID>7e1af43925cbef79334d2da290d602d586d04d7dd9aeb970c95ab93c0641c1f4</ID>
    <DisplayName>t3os_test</DisplayName>
  </Owner>
  <Part>
    <LastModified>2016-07-21T18:43:31.000Z</LastModified>
    <PartNumber>1</PartNumber>
    <ETag>a9fdb9efe0722f6e61d5d4ff3dfe0e81</ETag>
    <Size>5242880</Size>
  </Part>
  <Part>
    <LastModified>2016-07-21T18:43:39.000Z</LastModified>
    <PartNumber>2</PartNumber>
    <ETag>2d26aa403bc759305d0ea61d29f17cd0</ETag>
    <Size>5242880</Size>
  </Part>
</ListPartsResult>

We see here that the available parts are the last two uploaded. So far, so good.

Now, let's go ahead and complete this thing.

{builds valid CompleteMultipartUpload document}
./s3curl --id test --post mp.test -- http://ceph.cluster/bucket/mpobject?uploadId=2~whateverid
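
A minimal mp.test for this upload would look something like the following (part numbers and ETags taken from the ListParts response above):

cat > mp.test <<'EOF'
<CompleteMultipartUpload>
  <Part>
    <PartNumber>1</PartNumber>
    <ETag>a9fdb9efe0722f6e61d5d4ff3dfe0e81</ETag>
  </Part>
  <Part>
    <PartNumber>2</PartNumber>
    <ETag>2d26aa403bc759305d0ea61d29f17cd0</ETag>
  </Part>
</CompleteMultipartUpload>
EOF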

Great success! I can now download the object, and it is the valid combination of the last two parts I uploaded.

Now, however, let's take a look at our bucket:

radosgw-admin bucket list --bucket=bucket | grep -A7 mptest | grep -v owner | grep -v instance
        "name": "mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:39.000000Z",
        "etag": "2d26aa403bc759305d0ea61d29f17cd0",
--
        "name": "mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:31.000000Z",
        "etag": "a9fdb9efe0722f6e61d5d4ff3dfe0e81",
--
        "name": "mptest",
        "namespace": "",
        "size": 10485760,
        "mtime": "2016-07-21 18:52:23.000000Z",
        "etag": "39967388ccf40f9570e7f3154549e589-2",

Upon completing the request, only the two parts tagged with the upload id are removed from the bucket index. If I list the .rgw.buckets pool, I can confirm that all of the parts are still present:

rados -p .rgw.buckets ls | grep mptest

default.7754.6__shadow_mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2_1
default.7754.6_mptest
default.7754.6__multipart_mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2
default.7754.6__multipart_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.2
default.7754.6__shadow_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.2_1
default.7754.6__multipart_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1
default.7754.6__shadow_mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1_1
default.7754.6__shadow_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1_1
default.7754.6__multipart_mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1

Aborting the upload yields similar results, except in reverse. In the abort case, the files that contain the upload id in the name will be retained, but the other files will be properly removed.
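
For reference, the abort case is exercised with an S3 AbortMultipartUpload request against the same upload id, e.g. something like:

s3curl --id test -- -X DELETE "http://ceph.cluster/bucket/mpobject?uploadId=2~whateverid"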

For small multipart uploads like this, the additional space used is trivial. But in our actual cluster, we have clients that are uploading considerably larger files and are noticing that their bucket utilization is tens of TB larger than the sum of the objects they can list. The files are not removed by garbage collection, and are generally only removable through a very slow process of listing the omap contents of the bucket shards in .rgw.buckets.index and removing the omap keys that cannot be found.
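
As a sketch only, that manual cleanup looks roughly like the following; the index object name (.dir.<bucket marker>, here using the marker default.7754.6 visible in the rados listing above) and the keys/objects to remove are illustrative:

# list the index entries recorded in a bucket index (shard) object
rados -p .rgw.buckets.index listomapkeys .dir.default.7754.6

# drop an index entry that no longer corresponds to a live object
rados -p .rgw.buckets.index rmomapkey .dir.default.7754.6 '<stale key>'

# remove a leaked multipart part directly from the data pool
rados -p .rgw.buckets rm default.7754.6__multipart_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1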


Related issues: 7 (3 open, 4 closed)

Related to rgw - Bug #44660: Multipart re-uploads cause orphan data (Pending Backport)
Related to rgw - Bug #58369: When uploading parts in multipart upload, use the "AbortMultipartUpload" interface to end the upload, and there will be data that cannot be cleaned (New, J. Eric Ivancich)
Related to rgw-testing - Bug #58780: scan for orphaned rados objects and index entries in rgw suite (Pending Backport, J. Eric Ivancich)
Has duplicate rgw - Bug #57942: rgw leaks rados objects when a part is submitted multiple times in a multipart upload (Duplicate)
Copied to rgw - Backport #59064: reef: RadosGW Multipart Cleanup Failure (Resolved, Mykola Golub)
Copied to rgw - Backport #59065: quincy: RadosGW Multipart Cleanup Failure (Resolved, Mykola Golub)
Copied to rgw - Backport #59066: pacific: RadosGW Multipart Cleanup Failure (Rejected)
Actions #1

Updated by Brian Felton over 7 years ago

With apologies for pestering, is there anyone looking into this? This bug affects the viability of Ceph as a backing store for a commercial product, as it can't be relied upon as a canonical source of truth for reporting storage utilization.

Also, what I reported earlier about being able to clean up .rgw.buckets.index as a workaround was premature. While I can manually remove entries from .rgw.buckets and clean up orphans in .rgw.buckets.index, I've not found the secret sauce for actually getting the bucket's utilization to reflect the changes.

Actions #2

Updated by Samuel Just over 7 years ago

  • Project changed from Ceph to rgw
Actions #3

Updated by William Schroeder over 6 years ago

https://github.com/ceph/ceph/pull/17349 addresses the cleanup aspect of this bug; we wrote a tool that deletes the leaked multipart objects. The code is awaiting review, in case our change has unintended side-effects.

Actions #4

Updated by George Mihaiescu over 6 years ago

It would be great if this tool were reviewed and backported to Jewel.

Actions #5

Updated by Orit Wasserman over 6 years ago

  • Assignee set to Orit Wasserman
Actions #6

Updated by Chris Jones about 4 years ago

This is still an issue in Luminous 12.2.12 and even in the Nautilus versions we tested. Orphan find is impractical to run on large clusters. Has there been any work to address this issue?
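
For context, the scan in question is along the lines of the following (the data pool name is just an example); it has to enumerate every object in the pool, which is what makes it impractical at scale:

radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphan-scan-1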

Actions #7

Updated by Vicki Good over 3 years ago

I've encountered this bug in Ceph 14 and 15 and it's a pretty big problem for us for the same reason Brian Felton mentioned. It affects our storage utilization reporting.

I have been unable to manually remove these left-behind objects from the pools, but running radosgw-admin bucket check --fix --check-objects does clean them up for buckets that are not sharded. That command does not work on sharded buckets. Even if it did work for all buckets, we would have to run it constantly for all buckets--not at all practical.
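
For reference, the per-bucket form of that command looks roughly like this (the bucket name is a placeholder):

radosgw-admin bucket check --bucket=<bucket> --fix --check-objects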

Is it possible to increase the priority of this bug?

Actions #8

Updated by Casey Bodley about 3 years ago

  • Related to Bug #44660: Multipart re-uploads cause orphan data added
Actions #9

Updated by Rok Jaklic over 1 year ago

Vicki Good wrote:

I've encountered this bug in Ceph 14 and 15 and it's a pretty big problem for us for the same reason Brian Felton mentioned. It affects our storage utilization reporting.

I have been unable to manually remove these left-behind objects from the pools, but running radosgw-admin bucket check --fix --check-objects does clean them up for buckets that are not sharded. That command does not work on sharded buckets. Even if it did work for all buckets, we would have to run it constantly for all buckets--not at all practical.

Is it possible to increase the priority of this bug?

We've also encountered this bug in Ceph 16.

It is a pretty big problem for us as well, since we do provisioning for users based on size_actual.

Actions #10

Updated by Matt Benjamin over 1 year ago

  • Status changed from New to In Progress
  • Assignee changed from Orit Wasserman to Matt Benjamin
Actions #11

Updated by Casey Bodley over 1 year ago

  • Pull request ID set to 37260
Actions #12

Updated by Casey Bodley over 1 year ago

  • Has duplicate Bug #57942: rgw leaks rados objects when a part is submitted multiple times in a multipart upload added
Actions #13

Updated by Aleksandr Rudenko over 1 year ago

This is a very big problem for us.

We have a lot of big buckets with orphaned parts that use hundreds of TBs of space.

The second problem is that bucket check can't fix this on a sharded bucket.
We have to reshard big buckets to 0 shards and then run the fix, but we can't reshard very big buckets (200-500M objects) to 0 shards because it can lead to other problems, such as OSD crashes, and the fix itself consumes a lot of memory.

Actions #14

Updated by Matt Benjamin over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to quincy
Actions #15

Updated by Casey Bodley over 1 year ago

  • Related to Bug #58369: When uploading parts in multipart upload, use the "AbortMultipartUpload" interface to end the upload, and there will be data that cannot be cleaned added
Actions #16

Updated by J. Eric Ivancich about 1 year ago

  • Related to Bug #58780: scan for orphaned rados objects and index entries in rgw suite added
Actions #17

Updated by Konstantin Shalygin about 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee deleted (Matt Benjamin)
  • Target version set to v18.0.0
  • % Done changed from 0 to 80
  • Backport changed from quincy to quincy pacific reef
  • Pull request ID changed from 37260 to 49709
Actions #18

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59064: reef: RadosGW Multipart Cleanup Failure added
Actions #19

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59065: quincy: RadosGW Multipart Cleanup Failure added
Actions #20

Updated by Backport Bot about 1 year ago

  • Copied to Backport #59066: pacific: RadosGW Multipart Cleanup Failure added
Actions #21

Updated by Backport Bot about 1 year ago

  • Tags changed from rgw multipart to rgw multipart backport_processed
Actions #22

Updated by Konstantin Shalygin 4 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 80 to 100