Project

General

Profile

Bug #20234

rgw: shadow objects are sometimes not removed

Added by Jens Harbott about 2 years ago. Updated 8 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
Start date:
06/09/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
luminous, jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Testing on master, after copying an object, then deleting the copy and finally deleting the original object, the associated shadow objects to not get deleted because there is a mismatch in the tags involved. Here I have made a small patch to add the info.tag to the log message:

out/rgw.0.log:2017-06-09 12:01:44.515851 7f171e7b2700  0 gc::process: removing default.rgw.buckets.data:3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.1__shadow_.TFa_fTQbzhmy_wEDOahQFDLPjlQwTUJ_1 tag: _4HrVnrN4fE7XtDh2Wc8mxc8-t7RKAeh
out/rgw.0.log:2017-06-09 12:05:44.998398 7f171e7b2700  0 gc::process: removing default.rgw.buckets.data:3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.1__shadow_.TFa_fTQbzhmy_wEDOahQFDLPjlQwTUJ_1 tag: 3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.5

While the refcount xattr shows:

$ bin/rados -p default.rgw.buckets.data getxattr 3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.1__shadow_.TFa_fTQbzhmy_wEDOahQFDLPjlQwTUJ_1 refcount|od -x -a
0000000    0101    0034    0000    0001    0000    002b    0000    6333
        soh soh   4 nul nul nul soh nul nul nul   + nul nul nul   3   c
0000020    3035    3763    6461    642d    3935    2d33    6334    6130
          5   0   c   7   a   d   -   d   5   9   3   -   4   c   0   a
0000040    622d    3237    2d62    3239    6232    3363    3363    6161
          -   b   7   2   b   -   9   2   2   b   c   3   c   3   a   a
0000060    3363    342e    3131    2e36    0137
          c   3   .   4   1   1   6   .   7 soh
0000072

So 3c50c7ad-d593-4c0a-b72b-922bc3c3aac3.4116.7 is the tag that is written into the refcount for the copied object, but the tag stored with the object and being passed to the refcount_put is this random string instead: _4HrVnrN4fE7XtDh2Wc8mxc8-t7RKAeh


Related issues

Copied to rgw - Backport #21270: luminous: rgw: shadow objects are sometimes not removed Resolved
Copied to rgw - Backport #21271: jewel: rgw: shadow objects are sometimes not removed Rejected

History

#1 Updated by Jens Harbott about 2 years ago

Verified that this also affects Jewel. But before committing a fix, better make sure that http://tracker.ceph.com/issues/20107 is fixed, too. Getting clogged up with undeleted object pieces is still better than having data loss.

#2 Updated by Jens Harbott about 2 years ago

Some more data:

This only seems to affect the S3 API, objects copied via the Swift API do not seem to be affected.

The issue also seems to be present in current Hammer.

If you want to reproduce it, lower the gc timers so you only need to wait two minutes instead of two hours:

rgw gc obj min wait = 120
rgw gc processor period = 120

#3 Updated by Jens Harbott about 2 years ago

More debugging shows that the pure copy operation produces the correct object, it is the put_acl call, which s3cmd does afterwards in order to preserve the ACL, which destroys the tag. Running this python code allows to show both cases:

import boto
access_key = 'string'
secret_key = 'string'

conn = boto.connect_s3(
        aws_access_key_id = access_key,
        aws_secret_access_key = secret_key,
        host = 'localhost', port=8000,
        is_secure=False
        )

bucket = conn.create_bucket('bla')

key = bucket.new_key('test1')
key.set_contents_from_string('Hello World!\n'*400000)

key2 = bucket.copy_key('test02', 'bla', 'test1')
key3 = bucket.copy_key('test03', 'bla', 'test1', preserve_acl=True)

Object test02 afterwards still has the correct tag mentioned in the refcount, while the tag for object test03 gets overwritten with a random string.

#4 Updated by Jens Harbott about 2 years ago

The cat to set_attrs still contains the correct parameters, but what gets written to the OSD contains the "bad" idtag in addition:

2017-06-16 06:17:45.429337 7fb1d7fff700 15 Starting set_attrs {user.rgw.acl=buffer::list(len=149,
        buffer::ptr(0~149 0x7fb1c4031000 in raw 0x7fb1c4031000 len 4096 nref 3)
),user.rgw.content_type=buffer::list(len=25,
        buffer::ptr(202~25 0x7fb1dc01104a in raw 0x7fb1dc010f80 len 1116 nref 18)
),user.rgw.etag=buffer::list(len=33,
        buffer::ptr(248~33 0x7fb1dc011078 in raw 0x7fb1dc010f80 len 1116 nref 18)
),user.rgw.idtag=buffer::list(len=16,
        buffer::ptr(303~16 0x7fb1dc0110af in raw 0x7fb1dc010f80 len 1116 nref 18)
),user.rgw.manifest=buffer::list(len=378,
        buffer::ptr(344~378 0x7fb1dc0110d8 in raw 0x7fb1dc010f80 len 1116 nref 18)
),user.rgw.x-amz-content-sha256=buffer::list(len=65,
        buffer::ptr(759~65 0x7fb1dc011277 in raw 0x7fb1dc010f80 len 1116 nref 18)
),user.rgw.x-amz-date=buffer::list(len=32,
        buffer::ptr(851~32 0x7fb1dc0112d3 in raw 0x7fb1dc010f80 len 1116 nref 18)
),user.rgw.x-amz-meta-s3cmd-attrs=buffer::list(len=133,
        buffer::ptr(922~133 0x7fb1dc01131a in raw 0x7fb1dc010f80 len 1116 nref 18)
),user.rgw.x-amz-storage-class=buffer::list(len=9,
        buffer::ptr(1091~9 0x7fb1dc0113c3 in raw 0x7fb1dc010f80 len 1116 nref 18)
)}

Using this patched code to generate the output:

diff --git a/src/rgw/rgw_op.cc b/src/rgw/rgw_op.cc
index de580e6..734d085 100644
--- a/src/rgw/rgw_op.cc
+++ b/src/rgw/rgw_op.cc
@@ -2624,6 +2624,7 @@ void RGWPutACLs::execute()
   attrs[RGW_ATTR_ACL] = bl;

   if (!s->object.empty()) {
+    ldout(s->cct, 15) << "Starting set_attrs " << attrs << dendl;
     ret = store->set_attrs(s->obj_ctx, obj, attrs, NULL, ptracker);
   } else {
     ret = rgw_bucket_set_attrs(store, s->bucket_info, attrs, NULL, ptracker);

len=16 matches the correct original idtag, while len=33 is the random one. This is the osd_op that is generated, wrapped and tagged for clarity:

2017-06-16 06:17:45.430857 7fb1d7fff700  1 -- 10.70.11.23:0/699179395 --> 10.70.11.23:6800/26370 -- osd_op(client.14121.0:148 default.14103.5_copy2 [
cmpxattr user.rgw.idtag (16) op 1 mode 1,
setxattr user.rgw.acl (149),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33),
setxattr user.rgw.idtag (16), <== correct tag
setxattr user.rgw.manifest (378),setxattr user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (32),setxattr user.rgw.x-amz-meta-s3cmd-attrs (133),setxattr user.rgw.x-amz-storage-class (9),
setxattr user.rgw.idtag (33)  <== overwritten with random tag here
] 9.bc2509ae ondisk+write+known_if_redirected e32) v5 -- ?+0 0x7fb1c4038010 con 0x2d9c7b0

I'd also like to assign myself to this bug, but don't seem to have permissions to do so.

#5 Updated by Brad Hubbard about 2 years ago

  • Assignee set to Jens Harbott

#6 Updated by Jens Harbott about 2 years ago

So when fixing http://tracker.ceph.com/issues/11256 commit edc0627a1dbeb66ea2f5f177f6ceca64559ff3d8 explictly added setting the new tag in an effort to avoid race conditions. Seems a better approach is needed, one idea would be updating all refcounts when changing the tag, but that might be pretty expensive.

#7 Updated by Yehuda Sadeh about 2 years ago

I think the way to go would be to add a new attr that would not change, and hold the 'tail tag'. We can either always set it, or only set it when we modify the original tag (e.g., when doing setacl, or other operations). Then when feeding the gc, if tail tag exists, use tail tag, otherwise use the other tag.

#8 Updated by Brad Hubbard almost 2 years ago

  • Assignee deleted (Jens Harbott)

#9 Updated by Jens Harbott almost 2 years ago

It seems that this issue could be exploited to DoS a cluster, a client can circumvent their quota and fill up all space with deleted-but-not-completely-deleted fragments, so maybe the priority should be raised a bit.

#10 Updated by Yehuda Sadeh almost 2 years ago

Please see this PR:
https://github.com/ceph/ceph/pull/16145

Still needs more testing.

#11 Updated by Matt Benjamin almost 2 years ago

  • Status changed from New to In Progress
  • Assignee set to Yehuda Sadeh

#12 Updated by Yehuda Sadeh almost 2 years ago

  • Backport set to lumious, kraken, jewel

#13 Updated by Nathan Cutler almost 2 years ago

  • Backport changed from lumious, kraken, jewel to luminous, kraken, jewel

#14 Updated by Orit Wasserman almost 2 years ago

  • Status changed from In Progress to Pending Backport

#15 Updated by Nathan Cutler almost 2 years ago

  • Backport changed from luminous, kraken, jewel to luminous, jewel

#16 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #21270: luminous: rgw: shadow objects are sometimes not removed added

#17 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #21271: jewel: rgw: shadow objects are sometimes not removed added

#18 Updated by Aleksei Gutikov over 1 year ago

bug reproducible for v12.2.1

s3cmd mb s3://1111
s3cmd put test5M.txt s3://1111/xxx
s3cmd cp s3://1111/xxx s3://1111/yyy
s3cmd rm s3://1111/xxx
s3cmd rm s3://1111/yyy

After gc 100% finished processing:

$ rados ls -p default.rgw.buckets.data
104be53b-17a5-4d86-92db-6e9eb686cebb.4173.1__shadow_.ekXz6LeTUzJ5gZWc_EWTOCSHz9qciR7_1
$ rados -p default.rgw.buckets.data getxattr 104be53b-17a5-4d86-92db-6e9eb686cebb.4173.1__shadow_.ekXz6LeTUzJ5gZWc_EWTOCSHz9qciR7_1 refcount | od -x -a
0000000    0102    0068    0000    0001    0000    002b    0000    3031
        stx soh   h nul nul nul soh nul nul nul   + nul nul nul   1   0
0000020    6234    3565    6233    312d    6137    2d35    6434    3638
          4   b   e   5   3   b   -   1   7   a   5   -   4   d   8   6
0000040    392d    6432    2d62    6536    6539    3662    3638    6563
          -   9   2   d   b   -   6   e   9   e   b   6   8   6   c   e
0000060    6262    342e    3731    2e33    0137    0001    0000    002c
          b   b   .   4   1   7   3   .   7 soh soh nul nul nul   , nul
0000100    0000    3031    6234    3565    6233    312d    6137    2d35
        nul nul   1   0   4   b   e   5   3   b   -   1   7   a   5   -
0000120    6434    3638    392d    6432    2d62    6536    6539    3662
          4   d   8   6   -   9   2   d   b   -   6   e   9   e   b   6
0000140    3638    6563    6262    342e    3731    2e33    0037
          8   6   c   e   b   b   .   4   1   7   3   .   7 nul
0000156

#19 Updated by Aleksei Gutikov over 1 year ago

Fixed in 12.2.2

#20 Updated by Nathan Cutler 8 months ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF