Bug #20234
closedrgw: shadow objects are sometimes not removed
0%
Description
Testing on master, after copying an object, then deleting the copy and finally deleting the original object, the associated shadow objects to not get deleted because there is a mismatch in the tags involved. Here I have made a small patch to add the info.tag
to the log message:
out/rgw.0.log:2017-06-09 12:01:44.515851 7f171e7b2700 0 gc::process: removing default.rgw.buckets.data:3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.1__shadow_.TFa_fTQbzhmy_wEDOahQFDLPjlQwTUJ_1 tag: _4HrVnrN4fE7XtDh2Wc8mxc8-t7RKAeh out/rgw.0.log:2017-06-09 12:05:44.998398 7f171e7b2700 0 gc::process: removing default.rgw.buckets.data:3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.1__shadow_.TFa_fTQbzhmy_wEDOahQFDLPjlQwTUJ_1 tag: 3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.5
While the refcount xattr shows:
$ bin/rados -p default.rgw.buckets.data getxattr 3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.1__shadow_.TFa_fTQbzhmy_wEDOahQFDLPjlQwTUJ_1 refcount|od -x -a 0000000 0101 0034 0000 0001 0000 002b 0000 6333 soh soh 4 nul nul nul soh nul nul nul + nul nul nul 3 c 0000020 3035 3763 6461 642d 3935 2d33 6334 6130 5 0 c 7 a d - d 5 9 3 - 4 c 0 a 0000040 622d 3237 2d62 3239 6232 3363 3363 6161 - b 7 2 b - 9 2 2 b c 3 c 3 a a 0000060 3363 342e 3131 2e36 0137 c 3 . 4 1 1 6 . 7 soh 0000072
So 3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.7
is the tag that is written into the refcount for the copied object, but the tag stored with the object and being passed to the refcount_put is this random string instead: _4HrVnrN4fE7XtDh2Wc8mxc8-t7RKAeh
Updated by Jens Harbott almost 7 years ago
Verified that this also affects Jewel. But before committing a fix, better make sure that http://tracker.ceph.com/issues/20107 is fixed, too. Getting clogged up with undeleted object pieces is still better than having data loss.
Updated by Jens Harbott almost 7 years ago
Some more data:
This only seems to affect the S3 API, objects copied via the Swift API do not seem to be affected.
The issue also seems to be present in current Hammer.
If you want to reproduce it, lower the gc timers so you only need to wait two minutes instead of two hours:
rgw gc obj min wait = 120 rgw gc processor period = 120
Updated by Jens Harbott almost 7 years ago
More debugging shows that the pure copy operation produces the correct object, it is the put_acl call, which s3cmd does afterwards in order to preserve the ACL, which destroys the tag. Running this python code allows to show both cases:
import boto
access_key = 'string'
secret_key = 'string'
conn = boto.connect_s3(
aws_access_key_id = access_key,
aws_secret_access_key = secret_key,
host = 'localhost', port=8000,
is_secure=False
)
bucket = conn.create_bucket('bla')
key = bucket.new_key('test1')
key.set_contents_from_string('Hello World!\n'*400000)
key2 = bucket.copy_key('test02', 'bla', 'test1')
key3 = bucket.copy_key('test03', 'bla', 'test1', preserve_acl=True)
Object test02
afterwards still has the correct tag mentioned in the refcount
, while the tag for object test03
gets overwritten with a random string.
Updated by Jens Harbott almost 7 years ago
The cat to set_attrs still contains the correct parameters, but what gets written to the OSD contains the "bad" idtag
in addition:
2017-06-16 06:17:45.429337 7fb1d7fff700 15 Starting set_attrs {user.rgw.acl=buffer::list(len=149, buffer::ptr(0~149 0x7fb1c4031000 in raw 0x7fb1c4031000 len 4096 nref 3) ),user.rgw.content_type=buffer::list(len=25, buffer::ptr(202~25 0x7fb1dc01104a in raw 0x7fb1dc010f80 len 1116 nref 18) ),user.rgw.etag=buffer::list(len=33, buffer::ptr(248~33 0x7fb1dc011078 in raw 0x7fb1dc010f80 len 1116 nref 18) ),user.rgw.idtag=buffer::list(len=16, buffer::ptr(303~16 0x7fb1dc0110af in raw 0x7fb1dc010f80 len 1116 nref 18) ),user.rgw.manifest=buffer::list(len=378, buffer::ptr(344~378 0x7fb1dc0110d8 in raw 0x7fb1dc010f80 len 1116 nref 18) ),user.rgw.x-amz-content-sha256=buffer::list(len=65, buffer::ptr(759~65 0x7fb1dc011277 in raw 0x7fb1dc010f80 len 1116 nref 18) ),user.rgw.x-amz-date=buffer::list(len=32, buffer::ptr(851~32 0x7fb1dc0112d3 in raw 0x7fb1dc010f80 len 1116 nref 18) ),user.rgw.x-amz-meta-s3cmd-attrs=buffer::list(len=133, buffer::ptr(922~133 0x7fb1dc01131a in raw 0x7fb1dc010f80 len 1116 nref 18) ),user.rgw.x-amz-storage-class=buffer::list(len=9, buffer::ptr(1091~9 0x7fb1dc0113c3 in raw 0x7fb1dc010f80 len 1116 nref 18) )}
Using this patched code to generate the output:
diff --git a/src/rgw/rgw_op.cc b/src/rgw/rgw_op.cc
index de580e6..734d085 100644
--- a/src/rgw/rgw_op.cc
+++ b/src/rgw/rgw_op.cc
@@ -2624,6 +2624,7 @@ void RGWPutACLs::execute()
attrs[RGW_ATTR_ACL] = bl;
if (!s->object.empty()) {
+ ldout(s->cct, 15) << "Starting set_attrs " << attrs << dendl;
ret = store->set_attrs(s->obj_ctx, obj, attrs, NULL, ptracker);
} else {
ret = rgw_bucket_set_attrs(store, s->bucket_info, attrs, NULL, ptracker);
len=16
matches the correct original idtag
, while len=33
is the random one. This is the osd_op that is generated, wrapped and tagged for clarity:
2017-06-16 06:17:45.430857 7fb1d7fff700 1 -- 10.70.11.23:0/699179395 --> 10.70.11.23:6800/26370 -- osd_op(client.14121.0:148 default.14103.5_copy2 [ cmpxattr user.rgw.idtag (16) op 1 mode 1, setxattr user.rgw.acl (149),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33), setxattr user.rgw.idtag (16), <== correct tag setxattr user.rgw.manifest (378),setxattr user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (32),setxattr user.rgw.x-amz-meta-s3cmd-attrs (133),setxattr user.rgw.x-amz-storage-class (9), setxattr user.rgw.idtag (33) <== overwritten with random tag here ] 9.bc2509ae ondisk+write+known_if_redirected e32) v5 -- ?+0 0x7fb1c4038010 con 0x2d9c7b0
I'd also like to assign myself to this bug, but don't seem to have permissions to do so.
Updated by Jens Harbott almost 7 years ago
So when fixing http://tracker.ceph.com/issues/11256 commit edc0627a1dbeb66ea2f5f177f6ceca64559ff3d8 explictly added setting the new tag in an effort to avoid race conditions. Seems a better approach is needed, one idea would be updating all refcounts when changing the tag, but that might be pretty expensive.
Updated by Yehuda Sadeh almost 7 years ago
I think the way to go would be to add a new attr that would not change, and hold the 'tail tag'. We can either always set it, or only set it when we modify the original tag (e.g., when doing setacl, or other operations). Then when feeding the gc, if tail tag exists, use tail tag, otherwise use the other tag.
Updated by Jens Harbott almost 7 years ago
It seems that this issue could be exploited to DoS a cluster, a client can circumvent their quota and fill up all space with deleted-but-not-completely-deleted fragments, so maybe the priority should be raised a bit.
Updated by Yehuda Sadeh almost 7 years ago
Please see this PR:
https://github.com/ceph/ceph/pull/16145
Still needs more testing.
Updated by Matt Benjamin almost 7 years ago
- Status changed from New to In Progress
- Assignee set to Yehuda Sadeh
Updated by Yehuda Sadeh over 6 years ago
- Backport set to lumious, kraken, jewel
Updated by Nathan Cutler over 6 years ago
- Backport changed from lumious, kraken, jewel to luminous, kraken, jewel
Updated by Orit Wasserman over 6 years ago
- Status changed from In Progress to Pending Backport
Updated by Nathan Cutler over 6 years ago
- Backport changed from luminous, kraken, jewel to luminous, jewel
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #21270: luminous: rgw: shadow objects are sometimes not removed added
Updated by Nathan Cutler over 6 years ago
- Copied to Backport #21271: jewel: rgw: shadow objects are sometimes not removed added
Updated by Aleksei Gutikov over 6 years ago
bug reproducible for v12.2.1
s3cmd mb s3://1111
s3cmd put test5M.txt s3://1111/xxx
s3cmd cp s3://1111/xxx s3://1111/yyy
s3cmd rm s3://1111/xxx
s3cmd rm s3://1111/yyy
After gc 100% finished processing:
$ rados ls -p default.rgw.buckets.data
104be53b-17a5-4d86-92db-6e9eb686cebb.4173.1__shadow_.ekXz6LeTUzJ5gZWc_EWTOCSHz9qciR7_1
$ rados -p default.rgw.buckets.data getxattr 104be53b-17a5-4d86-92db-6e9eb686cebb.4173.1__shadow_.ekXz6LeTUzJ5gZWc_EWTOCSHz9qciR7_1 refcount | od -x -a
0000000 0102 0068 0000 0001 0000 002b 0000 3031
stx soh h nul nul nul soh nul nul nul + nul nul nul 1 0
0000020 6234 3565 6233 312d 6137 2d35 6434 3638
4 b e 5 3 b - 1 7 a 5 - 4 d 8 6
0000040 392d 6432 2d62 6536 6539 3662 3638 6563
- 9 2 d b - 6 e 9 e b 6 8 6 c e
0000060 6262 342e 3731 2e33 0137 0001 0000 002c
b b . 4 1 7 3 . 7 soh soh nul nul nul , nul
0000100 0000 3031 6234 3565 6233 312d 6137 2d35
nul nul 1 0 4 b e 5 3 b - 1 7 a 5 -
0000120 6434 3638 392d 6432 2d62 6536 6539 3662
4 d 8 6 - 9 2 d b - 6 e 9 e b 6
0000140 3638 6563 6262 342e 3731 2e33 0037
8 6 c e b b . 4 1 7 3 . 7 nul
0000156
Updated by Nathan Cutler over 5 years ago
- Status changed from Pending Backport to Resolved