Bug #11749

closed

rgw: rados objects wrongly deleted

Added by xingyi wu almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 60%
Source: Community (dev)
Tags:
Backport: hammer
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While testing the rados gateway (giant v0.87), I found two bugs that are dangerous enough to cause data corruption. The first one, in which the first stripe of a part is lost, was fixed by Yehuda in https://github.com/ceph/ceph/pull/4661. I then tested with upstream code and found that data corruption still occurs. Here are the details of the second bug.
This bug only happens when multipart upload is used. When an object is uploaded in multiple parts, a part that is not completely uploaded is destroyed by calling dispose_processor. This occasionally causes a race condition: the first upload can delete objects that belong to the second upload, which ultimately causes data corruption.
I set the multipart size to 64MB and tested with s3cmd; I also reproduced it with Cyberduck. You can reproduce it with the following script:

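#!/bin/bash
# The file name doubles as the bucket name, as in the original report.
FILENAME=BREAKDOWN
# Create a 65MB file so the upload spans more than one 64MB part.
dd if=/dev/zero of="$FILENAME" bs=65M count=1
originalMD5=$(md5sum "./$FILENAME" | awk '{print $1}')
# Start the first upload in the background, then kill it mid-transfer.
s3cmd put "$FILENAME" "s3://BREAKDOWN/$FILENAME" &
sleep 2
kill -9 $(ps aux | grep "s3cmd put" | grep -v grep | awk '{print $2}')
# Upload the same object again, download it, and compare checksums.
s3cmd put "$FILENAME" "s3://BREAKDOWN/$FILENAME"
s3cmd get "s3://BREAKDOWN/$FILENAME" downloadedfile --force
downloadMD5=$(md5sum ./downloadedfile | awk '{print $1}')
if [[ "$originalMD5" != "$downloadMD5" ]]; then
    echo "bad MD5"
    exit 1
fi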
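
Because the race is intermittent, a single run of the reproducer usually passes. The following is a minimal sketch of a retry helper, not part of the original report: run_until_failure is a hypothetical name, and it assumes the reproducer has been saved as an executable file that exits non-zero on "bad MD5".

```shell
#!/bin/bash
# run_until_failure MAX CMD [ARGS...]   (hypothetical helper, not from the report)
# Runs CMD up to MAX times and stops at the first non-zero exit status.
run_until_failure() {
    local max=$1
    shift
    local i
    for ((i = 1; i <= max; i++)); do
        if ! "$@"; then
            echo "failed on attempt $i"
            return 1
        fi
    done
    echo "no failure in $max attempts"
    return 0
}
```

For example, "run_until_failure 50 ./reproduce.sh" would stop at the first corrupted download, where reproduce.sh is an assumed file name for the script above.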

After enough runs you will hit the "bad MD5" case. If you then list the rados objects with "rados ls -p .rgw.buckets | grep BREAKDOWN", you will find that some rados objects have already been deleted. Here is a sample:

[root@cephdev141 src]$./rados ls -p .rgw.buckets | grep 20150523164305 | sort
default.54105.4_20150523164305
default.54105.4__multipart_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1
default.54105.4__multipart_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.2
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_1
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_10
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_11
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_12
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_13
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_14
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_15
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_2
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_5
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_6
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_7
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_8
default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_9

In this sample, default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_3 and default.54105.4__shadow_20150523164305.2~JeapILhRaTUmiiLjBFOdYRiaBgOIUVo.1_4 were wrongly deleted.
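
Gaps like the one in the listing above can be spotted mechanically. The sketch below is an illustration rather than an official tool: find_stripe_gaps is a hypothetical name, and it assumes that the shadow stripes of a part are named <prefix>_<n> with n counting up from 1, as in the sample.

```shell
#!/bin/bash
# find_stripe_gaps   (hypothetical helper, not from the report)
# Reads RADOS object names on stdin and prints the names of __shadow_
# stripes missing from the range 1..highest-seen index for each prefix.
find_stripe_gaps() {
    awk '
        /__shadow_/ {
            n = $0; sub(/.*_/, "", n)          # index after the last "_"
            if (n !~ /^[0-9]+$/) next
            p = $0; sub(/_[0-9]+$/, "", p)     # name without the index
            seen[p, n + 0] = 1
            if (n + 0 > max[p]) max[p] = n + 0
        }
        END {
            for (p in max)
                for (i = 1; i <= max[p]; i++)
                    if (!((p, i) in seen)) print p "_" i
        }
    ' | sort
}
```

Piping the listing into it, as in "./rados ls -p .rgw.buckets | grep 20150523164305 | find_stripe_gaps", should print the two missing stripe names. It cannot detect stripes missing from the end of a part, because the highest index is inferred from the objects that still exist.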


Related issues 1 (0 open, 1 closed)

Copied to rgw - Backport #12099: rgw: rados objects wrongly deleted | Resolved | Abhishek Lekshmanan | 05/23/2015 | 05/23/2015
#1

Updated by Kefu Chai almost 9 years ago

  • Status changed from New to Fix Under Review
  • Source changed from Development to Community (dev)
#2

Updated by Yehuda Sadeh almost 9 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to hammer

Merged in commit:69989ffa3cabe209404504edd24b1d2a53e33e15.
Backporting to firefly will also require picking up the fix for #10311.

#4

Updated by Gleb Borisov almost 9 years ago

Is there any way to find the objects affected by this issue in a bucket?

#5

Updated by Loïc Dachary over 8 years ago

  • Status changed from Pending Backport to Resolved