Bug #16767
RadosGW Multipart Cleanup Failure
Status: Closed
Description
My current setup is a Ceph Hammer cluster running 0.94.6. The rest of the cluster details are irrelevant to this issue.
I've stumbled upon an issue whereby RGW does not clean up properly after a multipart upload is finished (whether aborted or completed). If a client re-uploads a part during a multipart upload, Ceph stores both the original and the new part, but only the latter is valid when POSTing the CompleteMultipartUpload XML payload. When the multipart upload is completed, only the initial parts are removed from the system. The remaining parts are orphaned and are not (easily) removable.
To reproduce:
First, create four 5MiB files with unique md5 sums:
dd if=/dev/urandom of=/tmp/part1.1 bs=1M count=5
dd if=/dev/urandom of=/tmp/part1.2 bs=1M count=5
dd if=/dev/urandom of=/tmp/part2.1 bs=1M count=5
dd if=/dev/urandom of=/tmp/part2.2 bs=1M count=5
Next, initiate a multipart upload:
s3curl --id test -- -X POST "http://ceph.cluster/bucket/mptest?uploads"
Upload the parts:
s3curl --id test --put /tmp/part1.1 -- "http://ceph.cluster/bucket/mptest?partNumber=1&uploadId=2~whateverid"
s3curl --id test --put /tmp/part1.2 -- "http://ceph.cluster/bucket/mptest?partNumber=2&uploadId=2~whateverid"
s3curl --id test --put /tmp/part2.1 -- "http://ceph.cluster/bucket/mptest?partNumber=1&uploadId=2~whateverid"
s3curl --id test --put /tmp/part2.2 -- "http://ceph.cluster/bucket/mptest?partNumber=2&uploadId=2~whateverid"
(Note that the URLs must be quoted so the shell does not interpret the & characters.)
Now, let's take a look at what RGW says about the bucket:
radosgw-admin bucket list --bucket=bucket | grep -A7 mptest | grep -v owner | grep -v instance
        "name": "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:15.000000Z",
        "etag": "785dec7eeb68366cca5c19cec86c508b",
--
        "name": "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.2",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:24.000000Z",
        "etag": "b11c15f456f17ba763d0fb900d22376c",
--
        "name": "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.meta",
        "namespace": "multipart",
        "size": 0,
        "mtime": "2016-07-21 18:43:00.000000Z",
        "etag": "",
--
        "name": "mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:39.000000Z",
        "etag": "2d26aa403bc759305d0ea61d29f17cd0",
--
        "name": "mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:31.000000Z",
        "etag": "a9fdb9efe0722f6e61d5d4ff3dfe0e81",
So we now have a .meta file whose name contains the upload id, the first two attempted parts whose names also contain the upload id, and the two subsequent (re-uploaded) parts whose names do not contain the upload id.
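To make the naming pattern concrete, here is a small Python sketch that splits the entry names from the listing above into key, tag, and part number. The "<key>.<tag>.<partnum>" layout is an assumption inferred from the observed names, not taken from the RGW source:

```python
# Entry names copied from the bucket listing above.
UPLOAD_ID = "2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S"

entries = [
    "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1",
    "mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.2",
    "mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2",
    "mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1",
]

def tag_of(name, key="mptest"):
    # Strip the "<key>." prefix and the ".<partnum>" suffix, leaving the tag.
    return name[len(key) + 1:].rsplit(".", 1)[0]

# Only the first attempt at each part carries the upload id as its tag;
# the re-uploaded parts get a fresh random tag.
first_attempts = [e for e in entries if tag_of(e) == UPLOAD_ID]
re_uploads = [e for e in entries if tag_of(e) != UPLOAD_ID]
print(len(first_attempts), len(re_uploads))  # prints: 2 2
```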
Now, let's list the available parts associated with the id:
./s3curl --id test -- "http://ceph.cluster/bucket/mptest?uploadId=2~whateverid" | xmlstarlet fo
<?xml version="1.0" encoding="UTF-8"?>
<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Bucket>bucket</Bucket>
  <Key>mptest</Key>
  <UploadId>2~whateverid</UploadId>
  ...
  <Owner>
    <ID>7e1af43925cbef79334d2da290d602d586d04d7dd9aeb970c95ab93c0641c1f4</ID>
    <DisplayName>t3os_test</DisplayName>
  </Owner>
  <Part>
    <LastModified>2016-07-21T18:43:31.000Z</LastModified>
    <PartNumber>1</PartNumber>
    <ETag>a9fdb9efe0722f6e61d5d4ff3dfe0e81</ETag>
    <Size>5242880</Size>
  </Part>
  <Part>
    <LastModified>2016-07-21T18:43:39.000Z</LastModified>
    <PartNumber>2</PartNumber>
    <ETag>2d26aa403bc759305d0ea61d29f17cd0</ETag>
    <Size>5242880</Size>
  </Part>
</ListPartsResult>
We see here that the available parts are the last two uploaded. So far, so good.
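That check can also be scripted. A minimal sketch using Python's ElementTree on an abbreviated copy of the response above (only the two Part elements are kept), confirming the listed ETags are those of the re-uploaded parts:

```python
import xml.etree.ElementTree as ET

# Abbreviated ListPartsResult from the response above.
NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"
body = """<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Part><PartNumber>1</PartNumber><ETag>a9fdb9efe0722f6e61d5d4ff3dfe0e81</ETag></Part>
  <Part><PartNumber>2</PartNumber><ETag>2d26aa403bc759305d0ea61d29f17cd0</ETag></Part>
</ListPartsResult>"""

root = ET.fromstring(body)
# Both ETags match the last-uploaded attempt of each part number
# (part2.1 and part2.2 in the reproduction steps above).
etags = [p.find(NS + "ETag").text for p in root.iter(NS + "Part")]
print(etags)
```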
Now, let's go ahead and complete this thing.
{builds valid CompleteMultipartUpload document as mp.test}
./s3curl --id test --post mp.test -- "http://ceph.cluster/bucket/mptest?uploadId=2~whateverid"
Great success! I can now download the object, and it is the valid concatenation of the last two parts I uploaded.
Now, however, let's take a look at our bucket:
radosgw-admin bucket list --bucket=bucket | grep -A7 mptest | grep -v owner | grep -v instance
        "name": "mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:39.000000Z",
        "etag": "2d26aa403bc759305d0ea61d29f17cd0",
--
        "name": "mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1",
        "namespace": "multipart",
        "size": 5242880,
        "mtime": "2016-07-21 18:43:31.000000Z",
        "etag": "a9fdb9efe0722f6e61d5d4ff3dfe0e81",
--
        "name": "mptest",
        "namespace": "",
        "size": 10485760,
        "mtime": "2016-07-21 18:52:23.000000Z",
        "etag": "39967388ccf40f9570e7f3154549e589-2",
Upon completing the request, only the two parts tagged with the upload id are removed from the bucket index. If I list the .rgw.buckets pool, I can confirm that all of the part objects are still present:
rados -p .rgw.buckets ls | grep mptest
default.7754.6__shadow_mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2_1
default.7754.6_mptest
default.7754.6__multipart_mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2
default.7754.6__multipart_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.2
default.7754.6__shadow_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.2_1
default.7754.6__multipart_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1
default.7754.6__shadow_mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1_1
default.7754.6__shadow_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1_1
default.7754.6__multipart_mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1
Aborting the upload yields similar results, except in reverse: in the abort case, the objects whose names contain the upload id are retained, while the other objects are properly removed.
For small multipart uploads like this, the additional space used is trivial. In our actual cluster, however, clients upload considerably larger files and are noticing that their bucket utilization is tens of TB larger than the sum of the objects they can list. The orphaned objects are not removed by garbage collection, and are generally only removable through a very slow process of listing the omap contents of the bucket shards in .rgw.buckets.index and removing the omap keys that cannot be found.
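As a rough illustration of the comparison that cleanup involves, here is a Python sketch: take the multipart/shadow objects visible in the data pool and keep those whose part name the bucket index no longer references. The inputs are copied from this report's listings; the "<marker>__multipart_<part>" and "<marker>__shadow_<part>_<n>" layout is inferred from those names, and a real tool would also have to handle index sharding and in-flight uploads:

```python
# Bucket marker and object names taken from the `rados ls` output above.
MARKER = "default.7754.6"

rados_objects = [
    MARKER + "__multipart_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1",
    MARKER + "__shadow_mptest.2~o2LrKVtYqA_cwHAypOprHT-ANmTeH4S.1_1",
    MARKER + "__multipart_mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1",
]

# Part names the bucket index still knows about (from the post-complete
# `bucket list` output above).
indexed_parts = {
    "mptest.feXQAxbcmjR1WdN_-b-jj1BKcObJ3Q6.2",
    "mptest.i0q6uZ-do4mYoW7z5z8JDAQitcGJ5No.1",
}

def part_name(obj):
    # Strip "<marker>__multipart_" or "<marker>__shadow_"; for shadow
    # objects also drop the trailing "_<n>" stripe suffix.
    for kind in ("__multipart_", "__shadow_"):
        prefix = MARKER + kind
        if obj.startswith(prefix):
            rest = obj[len(prefix):]
            if kind == "__shadow_":
                rest = rest.rsplit("_", 1)[0]
            return rest
    return None

# Anything in the pool not backed by an index entry is an orphan.
orphans = [o for o in rados_objects if part_name(o) not in indexed_parts]
```

Under these assumptions, the two objects tagged with the upload id come out as orphans, matching the post-complete state shown above.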