Ceph: Issues
https://tracker.ceph.com/
Updated 2019-09-09T18:42:42Z

rgw - Bug #41729 (Triaged): rgw: sync log trimming does not work on buckets associated with a tenant
https://tracker.ceph.com/issues/41729
2019-09-09T18:42:42Z, Ed Fisher (ed@debacle.org)
<p>I know there's been a major refactor in how bucket metadata is fetched in master, and I haven't confirmed this behavior with the new code. I have confirmed it affects 14.2.2 and 14.2.3.</p>
<p>rgw_sync_log_trim calls RGWGetBucketInstanceInfoCR, passing the bucket_instance as a string. However, the bucket_instance string separates the tenant and bucket with / instead of :, and this isn't converted before being sent to the objecter. This causes errors fetching the bucket metadata, preventing trimming from working as expected.</p>
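<p>As a rough illustration of the separator mismatch, the Python sketch below rewrites the first <code>/</code> in an oid to <code>:</code>. The helper name <code>bucket_instance_key_to_oid</code> is hypothetical and only mimics what this report describes; it is not Ceph's implementation. The sample oid is the one from the error message in this report.</p>

```python
def bucket_instance_key_to_oid(key: str) -> str:
    """Replace the first '/' (the tenant separator) with ':'.

    Hypothetical helper for illustration only; keys without a tenant
    contain no '/' and are returned unchanged.
    """
    head, sep, tail = key.partition("/")
    return f"{head}:{tail}" if sep else key


# oid as it appears in the failing request (tenant separated by '/')
bad_oid = ".bucket.meta.tenantname/320-53:9aab95bd-768a-4e01-b099-f9d5bf84447c.229627369.62"

# oid after conversion (tenant separated by ':')
good_oid = bucket_instance_key_to_oid(bad_oid)
print(good_oid)
```

<p>Only the first <code>/</code> is rewritten, so the sketch leaves the rest of the key, including the <code>:</code> before the instance id, untouched.</p>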
<p>I was able to fix the metadata fetching in RGWAsyncGetBucketInstanceInfo with this patch, but I'm not sure if it's the best solution to the problem:</p>
<pre><code class="diff syntaxhl"><span class="CodeRay"><span class="line comment">diff --git a/src/rgw/rgw_cr_rados.cc b/src/rgw/rgw_cr_rados.cc</span>
<span class="line comment">index 7284c10dc4..1be5790724 100644</span>
<span class="line head"><span class="head">--- </span><span class="filename">a/src/rgw/rgw_cr_rados.cc</span></span>
<span class="line head"><span class="head">+++ </span><span class="filename">b/src/rgw/rgw_cr_rados.cc</span></span>
<span class="line change"><span class="change">@@</span> -4,6 +4,7 <span class="change">@@</span></span>
 <span class="preprocessor">#include</span> <span class="include">"include/compat.h"</span>
 <span class="preprocessor">#include</span> <span class="include">"rgw_rados.h"</span>
 <span class="preprocessor">#include</span> <span class="include">"rgw_zone.h"</span>
<span class="line insert"><span class="insert">+</span><span class="preprocessor">#include</span> <span class="include">"rgw_bucket.h"</span></span>
 <span class="preprocessor">#include</span> <span class="include">"rgw_coroutine.h"</span>
 <span class="preprocessor">#include</span> <span class="include">"rgw_cr_rados.h"</span>
 <span class="preprocessor">#include</span> <span class="include">"rgw_sync_counters.h"</span>
<span class="change"><span class="change">@@</span> -529,6 +530,7 <span class="change">@@</span></span> <span class="predefined-type">bool</span> RGWOmapAppend::finish() {
 <span class="predefined-type">int</span> RGWAsyncGetBucketInstanceInfo::_send_request()
 {
   RGWSysObjectCtx obj_ctx = store->svc.sysobj->init_obj_ctx();
<span class="line insert"><span class="insert">+</span>  rgw_bucket_instance_key_to_oid(oid);</span>
   <span class="predefined-type">int</span> r = store->get_bucket_instance_from_oid(obj_ctx, oid, bucket_info, <span class="predefined-constant">NULL</span>, <span class="predefined-constant">NULL</span>);
   <span class="keyword">if</span> (r < <span class="integer">0</span>) {
     ldout(store->ctx(), <span class="integer">0</span>) << <span class="string"><span class="delimiter">"</span><span class="content">ERROR: failed to get bucket instance info for </span><span class="delimiter">"</span></span>
<span class="line comment">diff --git a/src/rgw/rgw_cr_rados.h b/src/rgw/rgw_cr_rados.h</span>
<span class="line comment">index e919217cba..b334259689 100644</span>
<span class="line head"><span class="head">--- </span><span class="filename">a/src/rgw/rgw_cr_rados.h</span></span>
<span class="line head"><span class="head">+++ </span><span class="filename">b/src/rgw/rgw_cr_rados.h</span></span>
<span class="change"><span class="change">@@</span> -776,7 +776,7 <span class="change">@@</span></span> <span class="label">public:</span>
 class RGWAsyncGetBucketInstanceInfo : public RGWAsyncRadosRequest {
   RGWRados *store;
<span class="line delete"><span class="delete">-</span>  <span class="eyecatcher"><span class="directive">const</span> </span>std::string oid;</span>
<span class="line insert"><span class="insert">+</span>  std::string oid;</span>
 <span class="label">protected:</span>
   <span class="predefined-type">int</span> _send_request() override;
</span></code></pre>
<p>Here's some sample logging showing the objecter is requesting the wrong oid:</p>
<pre>
2019-09-09 16:47:00.580 7f5cb5ffb700 20 client.206602668.objecter put_session s=0x55bf820eccf0 osd=679 5
2019-09-09 16:47:00.580 7f5cb5ffb700 5 client.206602668.objecter 2 in flight
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter ms_dispatch 0x55bf820c85e0 osd_op_reply(49 .bucket.meta.tenantname/320-53:9aab95bd-768a-4e01-b099-f9d5bf84447c.229627369.62 [call,getxattrs,stat] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter in handle_osd_op_reply
2019-09-09 16:47:00.580 7f5ce4dcd700 7 client.206602668.objecter handle_osd_op_reply 49 ondisk uv 0 in 21.7 attempt 0
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 cr:s=0x55bf8214b1a0:op=0x55bf8214a940:26RGWGetBucketInstanceInfoCR: operate()
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 cr:s=0x55bf8214b1a0:op=0x55bf8214a940:26RGWGetBucketInstanceInfoCR: operate() returned r=-2
2019-09-09 16:47:00.580 7f5cf9cb37c0 15 stack 0x55bf8214b1a0 end
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 stack->operate() returned ret=-2
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 run: stack=0x55bf8214b1a0 is done
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter op 0 rval 0 len 0
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 cr:s=0x55bf820c3a40:op=0x55bf8234ca30:20BucketTrimInstanceCR: operate()
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter op 0 handler 0x7f5c880027f0
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 collect(): s=0x55bf820c3a40 stack=0x55bf8214a7c0 is still running
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 collect(): s=0x55bf820c3a40 stack=0x55bf8214b1a0 encountered error (r=-2), skipping next stacks
2019-09-09 16:47:00.580 7f5cf9cb37c0 20 run: stack=0x55bf820c3a40 is_blocked_by_stack()=0 is_sleeping=0 waiting_for_child()=1
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter op 1 rval 0 len 0
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter op 1 handler 0x7f5c88003bd0
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter op 2 rval 0 len 0
2019-09-09 16:47:00.580 7f5ce4dcd700 10 client.206602668.objecter op 2 handler 0x7f5c88004060
2019-09-09 16:47:00.580 7f5ce4dcd700 15 client.206602668.objecter handle_osd_op_reply completed tid 49
2019-09-09 16:47:00.580 7f5ce4dcd700 15 client.206602668.objecter _finish_op 49
2019-09-09 16:47:00.580 7f5ce4dcd700 20 client.206602668.objecter put_session s=0x55bf820eccf0 osd=679 5
2019-09-09 16:47:00.580 7f5ce4dcd700 15 client.206602668.objecter _session_op_remove 679 49
2019-09-09 16:47:00.580 7f5ce4dcd700 5 client.206602668.objecter 1 in flight
2019-09-09 16:47:00.580 7f5cb67fc700 10 librados: Objecter returned from call r=-2
2019-09-09 16:47:00.580 7f5cb67fc700 0 ERROR: failed to get bucket instance info for .bucket.meta.tenantname/320-53:9aab95bd-768a-4e01-b099-f9d5bf84447c.229627369.62</pre>

rgw - Fix #41540 (Fix Under Review): Multiple issues in the RGW AWS cloud sync module
https://tracker.ceph.com/issues/41540
2019-08-27T17:25:59Z, Ed Fisher (ed@debacle.org)
<ul>
<li>The default config profile is parsed before the acls/connections blocks, preventing the default config from using connection_id and potentially causing conflicts with explicit profiles.</li>
<li>Tracking of created_buckets doesn't work as intended.</li>
<li>Canonical resource generation for virtual-style endpoints omits the bucket name.</li>
<li>The multipart upload minimum chunk size is calculated incorrectly, resulting in 10001 parts instead of 10000.</li>
<li>Tracking cur_ofs separately during multipart uploads can let cur_ofs get out of sync with cur_part, causing underflows when content-length is calculated and resulting in EntityTooLarge errors from S3.</li>
</ul>
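<p>To make the 10001-parts item concrete, here is a small Python sketch of one way such an off-by-one can arise. It assumes the part size is derived by dividing the object size by the part limit; the actual calculation in the sync module is not shown in this report, so the function names and the floor-division bug are illustrative assumptions. The constants are the documented S3 limits (at most 10000 parts, 5 MiB minimum part size).</p>

```python
MAX_PARTS = 10_000               # S3 limit on parts per multipart upload
MIN_PART_SIZE = 5 * 1024 ** 2    # S3 minimum part size (5 MiB), last part excepted


def part_size_floor(obj_size: int) -> int:
    """Assumed buggy variant: floor division drops the remainder."""
    return max(MIN_PART_SIZE, obj_size // MAX_PARTS)


def part_size_ceil(obj_size: int) -> int:
    """Ceiling division, so MAX_PARTS parts always cover the object."""
    return max(MIN_PART_SIZE, -(-obj_size // MAX_PARTS))


def num_parts(obj_size: int, part_size: int) -> int:
    """Number of parts needed to upload obj_size bytes."""
    return -(-obj_size // part_size)


# One byte more than 10000 minimum-sized parts can hold.
obj_size = MAX_PARTS * MIN_PART_SIZE + 1

print(num_parts(obj_size, part_size_floor(obj_size)))  # one part too many
print(num_parts(obj_size, part_size_ceil(obj_size)))   # within the limit
```

<p>With floor division the leftover byte needs an extra part; rounding the part size up absorbs it.</p>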
<p>And an enhancement thrown in:</p>
<ul>
<li>Allow explicit profile matching based on bucket owner/bucket tenant, in addition to bucket name.</li>
</ul>
<p>PR coming. This is my first contribution, so please let me know if I screwed up any of the steps, more information is needed, or I should split any of these into separate issues/PRs.</p>

rgw - Bug #41011 (New): utf8 incompatibility in metadata added by rgw cloud sync module
https://tracker.ceph.com/issues/41011
2019-07-30T15:44:15Z, Ed Fisher (ed@debacle.org)
<p>Hi there,</p>
<p>The RGW cloud sync module seems to add the source key name as metadata when storing objects at the destination zone, by adding an "x-amz-meta-rgwx-source-key" header. However, object names can have characters in them that are illegal to use in S3 metadata.</p>
<p>Per <a class="external" href="https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata">https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata</a> all metadata submitted via the REST API must be ASCII. I've tested this, and both S3 and another S3-compatible storage provider return SignatureDoesNotMatch errors. Ceph seems to allow UTF-8 in metadata, so an rgw->rgw test likely wouldn't show this failure.</p>
<p>This likely affects every object with UTF-8 characters in its key name. It may also break syncing for objects that were stored with UTF-8 metadata, since rgw accepts it but S3 will not. I'm not sure of the best solution -- maybe just using url_encode on x-amz-meta-rgwx-source-key and any attrs kept with keep_attr?</p>
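<p>A minimal Python sketch of the URL-encoding idea, assuming percent-encoding is an acceptable representation for the metadata value. <code>encode_meta_value</code> is a hypothetical name, and Python's <code>urllib.parse.quote</code> stands in for rgw's url_encode, whose exact escaping rules may differ.</p>

```python
from urllib.parse import quote, unquote


def encode_meta_value(value: str) -> str:
    """Percent-encode a value so it is pure ASCII and reversible.

    Hypothetical helper; '/' is left unescaped to keep key paths readable.
    """
    return quote(value, safe="/")


# Object key containing non-ASCII characters ("résumé")
key = "reports/r\u00e9sum\u00e9-2019.txt"

encoded = encode_meta_value(key)
print(encoded)                  # ASCII-only form suitable for a header value
print(unquote(encoded) == key)  # decoding restores the original key
```

<p>The consumer of the metadata would need to decode the value again, so both sides have to agree on the encoding.</p>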
<p>Please let me know if you need any more details. The simplest test case would be to create a single bucket with a single object with a UTF-8 character in the key name and try to sync. The logging makes it hard to track down the issue otherwise.</p>

rgw - Bug #38214 (Resolved): unable to cancel reshard operations for buckets with tenants
https://tracker.ceph.com/issues/38214
2019-02-06T18:01:16Z, Ed Fisher (ed@debacle.org)
<p><a class="external" href="http://tracker.ceph.com/issues/22046">http://tracker.ceph.com/issues/22046</a> made it possible to submit a reshard operation for a tenant's bucket, but such operations can't be cancelled. Per <a class="external" href="https://github.com/ceph/ceph/pull/18811#issuecomment-378517787">https://github.com/ceph/ceph/pull/18811#issuecomment-378517787</a> --</p>
<pre>
root@osdnode03:~# radosgw-admin reshard cancel --uid='DB0220$elasticsearch' --tenant=DB0220 --bucket=backups
Error in getting bucket backups: (2) No such file or directory
2018-04-04 10:07:03.905049 7f105ee0fcc0 -1 ERROR: failed to get entry from reshard log, oid=reshard.0000000010 tenant= bucket=backups
</pre>
<p>I can verify that uncommenting this line: <a class="external" href="https://github.com/ceph/ceph/blob/de98f2e0d9783791436755246a3a12ce94ef088d/src/rgw/rgw_admin.cc#L6349">https://github.com/ceph/ceph/blob/de98f2e0d9783791436755246a3a12ce94ef088d/src/rgw/rgw_admin.cc#L6349</a> and rebuilding radosgw-admin allows the reshard operation to be successfully cancelled, but I haven't tested whether the change causes issues on buckets without tenants.</p>

RADOS - Bug #37968 (Resolved): maybe_remove_pg_upmaps incorrectly cancels valid pending upmaps
https://tracker.ceph.com/issues/37968
2019-01-18T22:43:29Z, Ed Fisher (ed@debacle.org)
<p>It appears that OSDMap::maybe_remove_pg_upmaps's sanity checks are overzealous. With some crush rules it is possible for osdmaptool to generate valid upmaps, but maybe_remove_pg_upmaps will cancel them.</p>
<p>It looks like it relies on get_rule_failure_domain and rejects any upmap that results in two osds sharing a parent of that type. However, with a custom crush rule like "choose indep 2 type host, choose indep 2 type osd" such an upmap would be valid. Is it possible to use CrushWrapper::try_remap_rule or something similar to more thoroughly validate the upmap?</p>
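<p>The difference between the two kinds of validation can be sketched in Python. This is a simplified model, not Ceph's code: the topology in <code>OSD_HOST</code>, the function names, and both checks are assumptions made for illustration. <code>naive_check</code> models the behaviour described above (no two OSDs may share a host), while <code>rule_aware_check</code> models a rule shaped like "choose indep 2 type host, choose indep 2 type osd".</p>

```python
from collections import Counter

# Hypothetical topology: OSD id -> host name
OSD_HOST = {0: "host-a", 1: "host-a", 2: "host-a",
            3: "host-b", 4: "host-b", 5: "host-b"}


def apply_upmap(osds, pairs):
    """Return the OSD list after applying (from_osd, to_osd) upmap pairs."""
    remap = dict(pairs)
    return [remap.get(osd, osd) for osd in osds]


def naive_check(osds):
    """Reject any mapping in which two OSDs share a host."""
    hosts = [OSD_HOST[o] for o in osds]
    return len(set(hosts)) == len(hosts)


def rule_aware_check(osds, hosts_wanted=2, osds_per_host=2):
    """Accept distinct OSDs spread as osds_per_host on each of hosts_wanted hosts."""
    if len(set(osds)) != len(osds):
        return False
    per_host = Counter(OSD_HOST[o] for o in osds)
    return (len(per_host) == hosts_wanted
            and all(n == osds_per_host for n in per_host.values()))


mapping = [0, 1, 3, 4]                       # two OSDs on each of two hosts
remapped = apply_upmap(mapping, [(1, 2)])    # move osd.1 -> osd.2, same host

print(naive_check(remapped))       # rejected by the host-uniqueness check
print(rule_aware_check(remapped))  # accepted when the rule shape is considered
```

<p>In this model the remapped set still satisfies the rule, yet the host-uniqueness check rejects it, which matches the cancellation seen in the mon log.</p>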
<p>To reproduce:</p>
<ol>
<li>ceph osd erasure-code-profile set upmaptest plugin=jerasure k=2 m=2 crush-device-class=hdd crush-failure-domain=osd</li>
<li>Create a crush rule for the pool:<br /><pre><code class="json syntaxhl"><span class="CodeRay">{
  <span class="key"><span class="delimiter">"</span><span class="content">rule_id</span><span class="delimiter">"</span></span>: <span class="integer">2</span>,
  <span class="key"><span class="delimiter">"</span><span class="content">rule_name</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">upmaptest</span><span class="delimiter">"</span></span>,
  <span class="key"><span class="delimiter">"</span><span class="content">ruleset</span><span class="delimiter">"</span></span>: <span class="integer">2</span>,
  <span class="key"><span class="delimiter">"</span><span class="content">type</span><span class="delimiter">"</span></span>: <span class="integer">3</span>,
  <span class="key"><span class="delimiter">"</span><span class="content">min_size</span><span class="delimiter">"</span></span>: <span class="integer">3</span>,
  <span class="key"><span class="delimiter">"</span><span class="content">max_size</span><span class="delimiter">"</span></span>: <span class="integer">4</span>,
  <span class="key"><span class="delimiter">"</span><span class="content">steps</span><span class="delimiter">"</span></span>: [
    {
      <span class="key"><span class="delimiter">"</span><span class="content">op</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">set_chooseleaf_tries</span><span class="delimiter">"</span></span>,
      <span class="key"><span class="delimiter">"</span><span class="content">num</span><span class="delimiter">"</span></span>: <span class="integer">5</span>
    },
    {
      <span class="key"><span class="delimiter">"</span><span class="content">op</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">set_choose_tries</span><span class="delimiter">"</span></span>,
      <span class="key"><span class="delimiter">"</span><span class="content">num</span><span class="delimiter">"</span></span>: <span class="integer">100</span>
    },
    {
      <span class="key"><span class="delimiter">"</span><span class="content">op</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">take</span><span class="delimiter">"</span></span>,
      <span class="key"><span class="delimiter">"</span><span class="content">item</span><span class="delimiter">"</span></span>: <span class="integer">-1</span>,
      <span class="key"><span class="delimiter">"</span><span class="content">item_name</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">default</span><span class="delimiter">"</span></span>
    },
    {
      <span class="key"><span class="delimiter">"</span><span class="content">op</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">choose_indep</span><span class="delimiter">"</span></span>,
      <span class="key"><span class="delimiter">"</span><span class="content">num</span><span class="delimiter">"</span></span>: <span class="integer">2</span>,
      <span class="key"><span class="delimiter">"</span><span class="content">type</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">host</span><span class="delimiter">"</span></span>
    },
    {
      <span class="key"><span class="delimiter">"</span><span class="content">op</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">choose_indep</span><span class="delimiter">"</span></span>,
      <span class="key"><span class="delimiter">"</span><span class="content">num</span><span class="delimiter">"</span></span>: <span class="integer">2</span>,
      <span class="key"><span class="delimiter">"</span><span class="content">type</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">osd</span><span class="delimiter">"</span></span>
    },
    {
      <span class="key"><span class="delimiter">"</span><span class="content">op</span><span class="delimiter">"</span></span>: <span class="string"><span class="delimiter">"</span><span class="content">emit</span><span class="delimiter">"</span></span>
    }
  ]
}
</span></code></pre></li>
<li>ceph osd pool create upmaptest 8 8 erasure upmaptest</li>
<li>Submit an upmap where the source+target osd are on the same host: ceph osd pg-upmap-items 2.7 1 2</li>
</ol>
<p>The mon's debug log will show "2019-01-18 19:16:32.044 7fdd4d0a2700 10 maybe_remove_pg_upmaps cancel invalid pending pg_upmap_items entry 2.7->[1,2]"</p>
<p>This is an edge case since it depends on using a custom crush rule, but it almost completely breaks the upmap functionality for affected pools.</p>