Bug #41729

rgw: sync log trimming does not work on buckets associated with a tenant

Added by Ed Fisher about 1 month ago. Updated 28 days ago.

Status:
Triaged
Priority:
Normal
Assignee:
Target version:
-
Start date:
09/09/2019
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

I know there has been a major refactor of how bucket metadata is fetched in master, and I haven't confirmed whether the new code is affected. I have confirmed that the bug affects 14.2.2 and 14.2.3.

rgw_sync_log_trim calls RGWGetBucketInstanceInfoCR, passing the bucket_instance as a string. However, that string separates the tenant and bucket with '/' instead of ':', and it isn't converted before being sent to the objecter. The objecter therefore looks up a nonexistent object and the metadata fetch fails with ENOENT, preventing trimming from working as expected.

I was able to fix the metadata fetch in RGWAsyncGetBucketInstanceInfo with the patch below, but I'm not sure it's the best solution to the problem:

diff --git a/src/rgw/rgw_cr_rados.cc b/src/rgw/rgw_cr_rados.cc
index 7284c10dc4..1be5790724 100644
--- a/src/rgw/rgw_cr_rados.cc
+++ b/src/rgw/rgw_cr_rados.cc
@@ -4,6 +4,7 @@
 #include "include/compat.h" 
 #include "rgw_rados.h" 
 #include "rgw_zone.h" 
+#include "rgw_bucket.h" 
 #include "rgw_coroutine.h" 
 #include "rgw_cr_rados.h" 
 #include "rgw_sync_counters.h" 
@@ -529,6 +530,7 @@ bool RGWOmapAppend::finish() {
 int RGWAsyncGetBucketInstanceInfo::_send_request()
 {
   RGWSysObjectCtx obj_ctx = store->svc.sysobj->init_obj_ctx();
+  rgw_bucket_instance_key_to_oid(oid);
   int r = store->get_bucket_instance_from_oid(obj_ctx, oid, bucket_info, NULL, NULL);
   if (r < 0) {
     ldout(store->ctx(), 0) << "ERROR: failed to get bucket instance info for " 
diff --git a/src/rgw/rgw_cr_rados.h b/src/rgw/rgw_cr_rados.h
index e919217cba..b334259689 100644
--- a/src/rgw/rgw_cr_rados.h
+++ b/src/rgw/rgw_cr_rados.h
@@ -776,7 +776,7 @@ public:

 class RGWAsyncGetBucketInstanceInfo : public RGWAsyncRadosRequest {
   RGWRados *store;
-  const std::string oid;
+  std::string oid;

 protected:
   int _send_request() override;

Here is sample logging showing the objecter requesting the wrong oid (note the unconverted '/' after the tenant name):

2019-09-09 16:47:00.580 7f5cb5ffb700 20 client.206602668.objecter put_session s=0x55bf820eccf0 osd=679 5
2019-09-09 16:47:00.580 7f5cb5ffb700  5 client.206602668.objecter 2 in flight
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter ms_dispatch 0x55bf820c85e0 osd_op_reply(49 .bucket.meta.tenantname/320-53:9aab95bd-768a-4e01-b099-f9d5bf84447c.229627369.62 [call,getxattrs,stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter in handle_osd_op_reply
2019-09-09 16:47:00.580 7f5ce4dcd700  7 client.206602668.objecter handle_osd_op_reply 49 ondisk uv 0 in 21.7 attempt 0
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 cr:s=0x55bf8214b1a0:op=0x55bf8214a940:26RGWGetBucketInstanceInfoCR: operate()
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 cr:s=0x55bf8214b1a0:op=0x55bf8214a940:26RGWGetBucketInstanceInfoCR: operate() returned r=-2
2019-09-09 16:47:00.580 7f5cf9cb37c0 15 stack 0x55bf8214b1a0 end
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 stack->operate() returned ret=-2
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 run: stack=0x55bf8214b1a0 is done
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter  op 0 rval 0 len 0
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 cr:s=0x55bf820c3a40:op=0x55bf8234ca30:20BucketTrimInstanceCR: operate()
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter  op 0 handler 0x7f5c880027f0
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 collect(): s=0x55bf820c3a40 stack=0x55bf8214a7c0 is still running
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 collect(): s=0x55bf820c3a40 stack=0x55bf8214b1a0 encountered error (r=-2), skipping next stacks
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 run: stack=0x55bf820c3a40 is_blocked_by_stack()=0 is_sleeping=0 waiting_for_child()=1
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter  op 1 rval 0 len 0
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter  op 1 handler 0x7f5c88003bd0
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter  op 2 rval 0 len 0
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter  op 2 handler 0x7f5c88004060
2019-09-09 16:47:00.580 7f5ce4dcd700 15 client.206602668.objecter handle_osd_op_reply completed tid 49
2019-09-09 16:47:00.580 7f5ce4dcd700 15 client.206602668.objecter _finish_op 49
2019-09-09 16:47:00.580 7f5ce4dcd700 20 client.206602668.objecter put_session s=0x55bf820eccf0 osd=679 5
2019-09-09 16:47:00.580 7f5ce4dcd700 15 client.206602668.objecter _session_op_remove 679 49
2019-09-09 16:47:00.580 7f5ce4dcd700  5 client.206602668.objecter 1 in flight
2019-09-09 16:47:00.580 7f5cb67fc700 10 librados: Objecter returned from call r=-2
2019-09-09 16:47:00.580 7f5cb67fc700  0 ERROR: failed to get bucket instance info for .bucket.meta.tenantname/320-53:9aab95bd-768a-4e01-b099-f9d5bf84447c.229627369.62

History

#1 Updated by Greg Farnum about 1 month ago

  • Project changed from Ceph to rgw

#2 Updated by Casey Bodley 28 days ago

  • Status changed from New to Triaged
  • Assignee set to Casey Bodley
