Ceph: rgw - Bug #22928: RGW will not list contents of Jewel-era buckets: reshard does NOT fix
Yehuda Sadeh (yehuda@redhat.com), 2018-02-07T18:41:33Z, https://tracker.ceph.com/issues/22928?journal_id=107077
<p>The problem is that we fail to go to the explicit placement pool for index operations even if one is set. One workaround would be to copy the objects from the old index pool to the new index pool. Another workaround would be to create a new placement target in the zone that reflects the old placement, and modify the bucket info placement_rule to point at that.</p>
Yehuda Sadeh (yehuda@redhat.com), 2018-02-07T21:21:59Z, https://tracker.ceph.com/issues/22928?journal_id=107081
<p><a class="external" href="https://github.com/ceph/ceph/pull/20352">https://github.com/ceph/ceph/pull/20352</a></p>
Casey Bodley (cbodley@redhat.com), 2018-02-15T18:44:34Z, https://tracker.ceph.com/issues/22928?journal_id=107461
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li></ul>
Casey Bodley (cbodley@redhat.com), 2018-02-22T19:05:58Z, https://tracker.ceph.com/issues/22928?journal_id=107881
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>7</i></li></ul>
Abhishek Lekshmanan (abhishek.lekshmanan@gmail.com), 2018-02-23T14:51:27Z, https://tracker.ceph.com/issues/22928?journal_id=107933
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-5 priority-high3 closed" href="/issues/23106">Backport #23106</a>: luminous: RGW will not list contents of Jewel-era buckets: reshard does NOT fix</i> added</li></ul>
Yuri Weinstein (yweinste@redhat.com), 2018-02-27T17:48:34Z, https://tracker.ceph.com/issues/22928?journal_id=108179
<p>Yehuda Sadeh wrote:</p>
<blockquote>
<p><a class="external" href="https://github.com/ceph/ceph/pull/20352">https://github.com/ceph/ceph/pull/20352</a></p>
</blockquote>
<p>merged</p>
Yehuda Sadeh (yehuda@redhat.com), 2018-03-01T18:50:58Z, https://tracker.ceph.com/issues/22928?journal_id=108348
<ul><li><strong>Status</strong> changed from <i>7</i> to <i>Resolved</i></li></ul>
Wido den Hollander (wido@42on.com), 2020-10-09T09:38:04Z, https://tracker.ceph.com/issues/22928?journal_id=177010
<p>While upgrading a cluster from Mimic to Nautilus I experienced this as well. This cluster was installed 7 years ago with what I think was Argonaut and has been upgraded ever since.</p>
<p>Some buckets (around 500) break when we start the Nautilus gateways, so we are currently keeping the gateways on Mimic.</p>
<pre>
root@mon01:~# ceph versions
{
    "mon": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)": 91
    },
    "mds": {},
    "rgw": {
        "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)": 2,
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)": 2,
        "ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)": 91,
        "ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)": 7
    }
}
root@mon01:~#
</pre>
<p>I started to investigate and I found one bucket called <strong>pbx</strong> which is experiencing these problems.</p>
<p>I spawned a Nautilus gateway on one of the Monitors and when I perform a query it fails:</p>
<pre>
curl -I -H 'Host: pbx.o.auroraobjects.eu' http://localhost:7480/
HTTP/1.1 400 Bad Request
Content-Length: 215
Bucket: pbx
x-amz-request-id: tx00000000000000000000f-005f802972-1a9fee9a-ams02
Accept-Ranges: bytes
Content-Type: application/xml
Date: Fri, 09 Oct 2020 09:12:18 GMT
Connection: Keep-Alive
</pre>
<p>This fails with the following message in the log:</p>
<pre>
2020-10-09 10:36:25.266 7f43c4a17700 0 req 14 0.000s NOTICE: invalid dest placement:
2020-10-09 10:36:25.266 7f43c4a17700 10 req 14 0.000s init_permissions on pbx[ams02.241978.4] failed, ret=-22
2020-10-09 10:36:25.266 7f43c4a17700 20 op->ERRORHANDLER: err_no=-22 new_err_no=-22
2020-10-09 10:36:25.266 7f43c4a17700 2 req 14 0.000s s3:stat_bucket op status=0
2020-10-09 10:36:25.266 7f43c4a17700 2 req 14 0.000s s3:stat_bucket http status=400
2020-10-09 10:36:25.266 7f43c4a17700 1 ====== req done req=0x560e46a70930 op status=0 http_status=400 latency=0s ======
</pre>
<p>The <strong>invalid dest placement</strong> comes from <strong>rgw_op.cc</strong>:</p>
<pre>
/* init dest placement -- only if bucket exists, otherwise request is either not relevant, or
 * it's a create_bucket request, in which case the op will deal with the placement later */
if (s->bucket_exists) {
  s->dest_placement.storage_class = s->info.storage_class;
  s->dest_placement.inherit_from(s->bucket_info.placement_rule);
  if (!store->svc.zone->get_zone_params().valid_placement(s->dest_placement)) {
    ldpp_dout(s, 0) << "NOTICE: invalid dest placement: " << s->dest_placement.to_str() << dendl;
    return -EINVAL;
  }
}
</pre>
<p>This bucket indeed has no placement rule attached and only has an explicit placement set:</p>
<pre>
root@mon01:~# radosgw-admin metadata get bucket.instance:pbx:ams02.241978.4
{
    "key": "bucket.instance:pbx:ams02.241978.4",
    "ver": {
        "tag": "_Vucn8pfwzF-qGg9bqip5lM_",
        "ver": 1
    },
    "mtime": "2019-04-16 21:07:48.841780Z",
    "data": {
        "bucket_info": {
            "bucket": {
                "name": "pbx",
                "marker": "ams02.241978.4",
                "bucket_id": "ams02.241978.4",
                "tenant": "",
                "explicit_placement": {
                    "data_pool": ".rgw.buckets",
                    "data_extra_pool": "",
                    "index_pool": ".rgw.buckets"
                }
            },
            "creation_time": "2014-02-16 12:32:15.000000Z",
            "owner": "vdvm",
            "flags": 0,
            "zonegroup": "eu",
            "placement_rule": "",
            "has_instance_obj": "true",
            "quota": {
                "enabled": false,
                "check_on_raw": false,
                "max_size": -1,
                "max_size_kb": 0,
                "max_objects": -1
            },
            "num_shards": 0,
            "bi_shard_hash_type": 0,
            "requester_pays": "false",
            "has_website": "false",
            "swift_versioning": "false",
            "swift_ver_location": "",
            "index_type": 0,
            "mdsearch_config": [],
            "reshard_status": 0,
            "new_bucket_instance_id": ""
        },
        "attrs": [
            {
                "key": "user.rgw.acl",
                "val": "AgJ3AAAAAgISAAAABAAAAHZkdm0GAAAAQXR0aWxhAwNZAAAAAQEAAAAEAAAAdmR2bQ8AAAABAAAABAAAAHZkdm0DAzIAAAACAgQAAAAAAAAABAAAAHZkdm0AAAAAAAAAAAICBAAAAA8AAAAGAAAAQXR0aWxhAAAAAAAAAAA="
            }
        ]
    }
}
</pre>
<p>Notice that <strong>placement_rule</strong> is empty.</p>
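<p>To illustrate why an empty <strong>placement_rule</strong> trips the check quoted from rgw_op.cc, here is a minimal, self-contained sketch. It is not the Ceph implementation: <code>PlacementRule</code>, <code>ZoneParams</code> and <code>make_example_zone</code> are simplified stand-ins, and the pool name <code>default.rgw.buckets.data</code> is an assumption. The point it models is that the rule name is looked up among the zone's configured placement targets, so an empty name can never match.</p>

```cpp
#include <map>
#include <string>

// Simplified stand-ins for the RGW types; NOT the actual Ceph classes.
struct PlacementRule {
    std::string name;           // e.g. "default-placement"; empty on legacy buckets
    std::string storage_class;  // e.g. "STANDARD"; empty means "use the default"
};

struct ZoneParams {
    // placement id -> (storage class -> data pool name)
    std::map<std::string, std::map<std::string, std::string>> placement_pools;

    // A rule is accepted only if its name matches a configured placement
    // target and the storage class (defaulting to STANDARD) exists there.
    bool valid_placement(const PlacementRule& rule) const {
        auto target = placement_pools.find(rule.name);
        if (target == placement_pools.end()) {
            return false;  // an empty name never matches a configured target
        }
        const std::string sc =
            rule.storage_class.empty() ? "STANDARD" : rule.storage_class;
        return target->second.count(sc) > 0;
    }
};

// Hypothetical helper: a zone that only knows "default-placement".
// The pool name is an assumption made for illustration.
inline ZoneParams make_example_zone() {
    ZoneParams zone;
    zone.placement_pools["default-placement"]["STANDARD"] =
        "default.rgw.buckets.data";
    return zone;
}
```

<p>With this model, a rule named <code>default-placement</code> passes, while the rule inherited from the <strong>pbx</strong> bucket (empty name) is rejected regardless of what <strong>explicit_placement</strong> contains, which matches the <code>-EINVAL</code> seen in the log.</p>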
<p>So I attempted to create a new placement rule called <strong>legacy-placement</strong>, but that fails:</p>
<pre>
{
    "index_pool": ".rgw.buckets",
    "storage_classes": {
        "STANDARD": {
            "data_pool": ".rgw.buckets"
        }
    },
    "data_extra_pool": ".rgw.buckets.non-ec",
    "index_type": 0
}
</pre>
<pre>
root@mon01:~# radosgw-admin zone placement add --placement-id=legacy-placement < legacy-placement.json
ERROR: index pool not configured, need to specify --index-pool
root@mon01:~#
</pre>
<p>I have also tried to reshard the bucket, but that did not resolve the issue.</p>
<p>On the mailing list there is a thread from 2019: <a class="external" href="https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ULKK5RU2VXLFXNUJMZBMUG7CQ5UCWJCB/#R6CPZ2TEWRFL2JJWP7TT5GX7DPSV5S7Z">https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ULKK5RU2VXLFXNUJMZBMUG7CQ5UCWJCB/#R6CPZ2TEWRFL2JJWP7TT5GX7DPSV5S7Z</a></p>
<p>In that thread people compiled RGW manually with some lines commented out. I'm trying to stay away from that, as other users will probably run into this as well.</p>
<p>Updating the bucket.instance metadata is not possible either (note that the ID changed due to the reshard):</p>
<pre>
radosgw-admin metadata put bucket.instance:pbx:ams02.446941181.1 < pbx_instance.json
</pre>
<p>In this JSON I manually set <strong>placement_rule</strong> to <strong>default-placement</strong>, but such an update is ignored and thus I can't fix it.</p>
Wido den Hollander (wido@42on.com), 2020-10-09T12:45:24Z, https://tracker.ceph.com/issues/22928?journal_id=177023
<p>I looked into the differences between Mimic and Nautilus and I found this commit from Yehuda: <a class="external" href="https://github.com/ceph/ceph/commit/2a8e8a98d8c56cc374ec671846a20e2b0484bc75">https://github.com/ceph/ceph/commit/2a8e8a98d8c56cc374ec671846a20e2b0484bc75</a></p>
<p>This breaks the bucket's ability to work, as the placement_rule (empty) for this bucket doesn't match any of the placement policies for the zone.</p>
<p>Now the question is:</p>
<p>- Do we modify this if-statement?<br />- Or do we enhance 'bucket check --fix' so that it fixes the placement policy?</p>
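<p>As a rough sketch of the first option, the check could be relaxed so that buckets whose pools are pinned through <strong>explicit_placement</strong> skip the placement rule validation. This is only one possible shape under stated assumptions, not a tested patch against rgw_op.cc and not an upstream change: <code>ExplicitPlacement</code>, <code>BucketInfo</code>, <code>Zone</code> and <code>init_dest_placement</code> are hypothetical simplified models of the real structures.</p>

```cpp
#include <cerrno>
#include <set>
#include <string>

// Simplified stand-ins; NOT the actual RGW types.
struct ExplicitPlacement {
    std::string data_pool;
    std::string data_extra_pool;
    std::string index_pool;
    // Assumption: placement counts as "explicit" if any pool is pinned.
    bool empty() const {
        return data_pool.empty() && data_extra_pool.empty() && index_pool.empty();
    }
};

struct BucketInfo {
    std::string placement_rule;            // empty on the legacy buckets
    ExplicitPlacement explicit_placement;  // set on the legacy buckets
};

// Zone model: just the set of configured placement target names.
using Zone = std::set<std::string>;

// Returns 0 when the bucket's placement can be resolved, -EINVAL otherwise.
inline int init_dest_placement(const Zone& zone, const BucketInfo& info) {
    // Hypothetical relaxation: pools are already known from
    // explicit_placement, so the rule name does not need to resolve.
    if (!info.explicit_placement.empty()) {
        return 0;
    }
    // Check modeled on the existing one: the rule must name a target in the zone.
    if (zone.count(info.placement_rule) == 0) {
        return -EINVAL;
    }
    return 0;
}
```

<p>In this model a bucket like <strong>pbx</strong> (empty rule, explicit pools set) is accepted, while a bucket with neither a valid rule nor explicit pools is still rejected. Whether later code paths can cope with an empty rule is a separate question this sketch does not answer.</p>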