Ceph : Issues https://tracker.ceph.com/ https://tracker.ceph.com/favicon.ico 2024-03-05T09:29:40Z Ceph
Redmine
rgw - Bug #64710 (Pending Backport): kafka: RGW hangs when broker is down for no persistent notif... https://tracker.ceph.com/issues/64710 2024-03-05T09:29:40Z Yuval Lifshitz yuvalif@yahoo.com
<p>based on this comment: <a class="external" href="https://github.com/ceph/ceph/pull/55051#issuecomment-1961950841">https://github.com/ceph/ceph/pull/55051#issuecomment-1961950841</a><br />the main issue is that the librdkafka timeout is set by default to 5 minutes.<br />due to the 30-second idle timeout, we release the pending coroutines after 30 seconds.<br />but this still causes the RGW to hang (as it exhausts all of its front-end connections).</p>
rgw - Feature #64251 (Resolved): allow AWS lifecycle event types to configure lifecycle notificat... https://tracker.ceph.com/issues/64251 2024-01-30T14:43:51Z Yuval Lifshitz yuvalif@yahoo.com
<p>in addition to the events we support, we should also allow the following AWS lifecycle event types:</p>
<pre>
s3:LifecycleExpiration:Delete -> s3:ObjectLifecycle:Expiration:Current
s3:LifecycleExpiration:DeleteMarkerCreated -> s3:ObjectLifecycle:Expiration:DeleteMarker
s3:LifecycleExpiration:* -> s3:LifecycleExpiration:Delete, s3:LifecycleExpiration:DeleteMarkerCreated
s3:LifecycleTransition -> s3:ObjectLifecycle:Transition:Current
</pre>
<p>note that the following don't have an AWS equivalent:<br /><pre>
s3:ObjectLifecycle:Expiration:NonCurrent
s3:ObjectLifecycle:Expiration:AbortMultipartUpload
s3:ObjectLifecycle:Transition:NonCurrent
</pre></p>
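<p>a minimal sketch of the mapping listed above, written as a lookup table. the table contents come from the lists in this issue; the helper name <code>to_ceph_events</code> and the pass-through behavior for existing event names are assumptions for illustration, not the RGW implementation:</p>

```python
# AWS lifecycle event type -> equivalent existing event type (from the list above)
AWS_TO_CEPH = {
    "s3:LifecycleExpiration:Delete": "s3:ObjectLifecycle:Expiration:Current",
    "s3:LifecycleExpiration:DeleteMarkerCreated": "s3:ObjectLifecycle:Expiration:DeleteMarker",
    "s3:LifecycleTransition": "s3:ObjectLifecycle:Transition:Current",
}

# AWS wildcard -> the AWS event types it covers (from the list above)
AWS_WILDCARDS = {
    "s3:LifecycleExpiration:*": [
        "s3:LifecycleExpiration:Delete",
        "s3:LifecycleExpiration:DeleteMarkerCreated",
    ],
}

# existing event types that have no AWS equivalent
NO_AWS_EQUIVALENT = {
    "s3:ObjectLifecycle:Expiration:NonCurrent",
    "s3:ObjectLifecycle:Expiration:AbortMultipartUpload",
    "s3:ObjectLifecycle:Transition:NonCurrent",
}


def to_ceph_events(event):
    """Hypothetical helper: translate one configured event type to existing names."""
    # expand an AWS wildcard into its members, then translate each member
    if event in AWS_WILDCARDS:
        return [AWS_TO_CEPH[member] for member in AWS_WILDCARDS[event]]
    # translate a single AWS name; any other name is passed through unchanged,
    # so all existing event types keep working
    return [AWS_TO_CEPH.get(event, event)]


if __name__ == "__main__":
    for name in ("s3:LifecycleExpiration:*", "s3:LifecycleTransition"):
        print(name, "->", to_ceph_events(name))
```

<p>with a table like this, the existing event names stay valid, and the three types without an AWS equivalent are simply never produced from an AWS name.</p>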
<p>note that we are still going to support all existing events</p>
rgw-testing - Bug #64026 (New): notifications: test_ps_s3_notification_push_amqp_on_master is fai... https://tracker.ceph.com/issues/64026 2024-01-15T11:39:21Z Yuval Lifshitz yuvalif@yahoo.com
<p>the test is using the wrong amqp receiver to get the deletion notifications</p>
rgw - Bug #63915 (Fix Under Review): kafka does not detect and propagate broker down state https://tracker.ceph.com/issues/63915 2024-01-02T10:01:09Z Yuval Lifshitz yuvalif@yahoo.com
when the broker is down, an error is expected to be returned immediately.<br />currently, the errors are reported back only after the connection is marked idle and deleted.<br />since this timeout is only used to "garbage collect" decommissioned connections, it is set to a high number, meaning that the reply that the notifications failed takes too long.
<ul>
<li>in case of persistent notifications, it will fill up memory with pending coroutines</li>
<li>in case of non-persistent notifications, it will block all frontend coroutines, making the RGW unusable</li>
</ul>
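<p>the non-persistent case can be pictured with a toy asyncio model. this is not RGW code: the frontend pool is modeled as a semaphore, the broker error is modeled as an event that is only set when the connection is marked idle, and all names (<code>simulate</code>, <code>frontend_slots</code>, <code>connection_idle</code>) are hypothetical:</p>

```python
import asyncio


async def simulate(frontend_slots=2, requests=3):
    """Toy model: each request holds a frontend slot until the broker error arrives."""
    frontends = asyncio.Semaphore(frontend_slots)  # bounded frontend pool
    connection_idle = asyncio.Event()              # set when the idle timer fires
    started = []

    async def handle(req_id):
        async with frontends:
            started.append(req_id)
            # non-persistent publish: the request waits for the broker reply,
            # which in this model only arrives once the connection is idle
            await connection_idle.wait()
            return "notification failed"

    tasks = [asyncio.create_task(handle(i)) for i in range(requests)]
    await asyncio.sleep(0)                 # let every task run until it blocks
    started_before_idle = list(started)    # requests that got a frontend slot
    connection_idle.set()                  # the idle timeout finally fires
    results = await asyncio.gather(*tasks)
    return started_before_idle, results


if __name__ == "__main__":
    before, results = asyncio.run(simulate())
    print("served before idle timeout:", before)
    print("replies:", results)
```

<p>in this model only as many requests as there are frontend slots make progress before the idle timer fires; every other request waits, which is the "unusable RGW" symptom described above.</p>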
<p>in addition, for non-persistent notifications, since this is detected in the "publish_commit()" stage, it will just be reported and there will be no feedback to the client that the notifications failed.<br />to solve that, the "publish_reserve()" step should be used to verify the connectivity to the broker</p>
rgw - Bug #63909 (Fix Under Review): persistent topic stats test fails https://tracker.ceph.com/issues/63909 2023-12-31T15:32:50Z Yuval Lifshitz yuvalif@yahoo.com
<p>see comment: <a class="external" href="https://github.com/ceph/ceph/pull/54868#issuecomment-1872238973">https://github.com/ceph/ceph/pull/54868#issuecomment-1872238973</a><br />on how to reproduce the test.<br />could be a timing issue, since it passes when only "test_ps_s3_persistent_topic_stats" is run, and sometimes fails when executed with the entire suite</p>
rgw - Feature #63901 (Fix Under Review): kafka: make connection idleness parameter configurable https://tracker.ceph.com/issues/63901 2023-12-28T18:24:43Z Yuval Lifshitz yuvalif@yahoo.com
<p>keep default value as 30 sec</p>
rgw-testing - Feature #63893 (New): support running notification tests on ubuntu https://tracker.ceph.com/issues/63893 2023-12-26T17:40:42Z Yuval Lifshitz yuvalif@yahoo.com
<p>currently amqp tests are only supported on centos/rhel/fedora since qa/tasks/rabbitmq.py is rpm oriented.<br />the file should add ubuntu support according to the instructions here:<br /><a class="external" href="https://www.rabbitmq.com/install-debian.html#apt-quick-start-cloudsmith">https://www.rabbitmq.com/install-debian.html#apt-quick-start-cloudsmith</a></p>
rgw - Bug #63859 (Fix Under Review): notifications/lifecycle: failure to commit a notification sh... https://tracker.ceph.com/issues/63859 2023-12-19T17:28:57Z Yuval Lifshitz yuvalif@yahoo.com
<p>since the actual lifecycle action cannot be rolled back, failure in "publish_commit()" should not be considered an error.<br />note that most issues with notifications (e.g. queue full) should be caught by "publish_reserve()" and fail the action before it happens</p>
rgw - Bug #63855 (Fix Under Review): notifications: notification will be sent even if op has failed https://tracker.ceph.com/issues/63855 2023-12-19T15:56:54Z Yuval Lifshitz yuvalif@yahoo.com
<p>publish_commit() should not be called if the actual operation has failed</p>
rgw - Feature #63744 (New): support new notification event types https://tracker.ceph.com/issues/63744 2023-12-06T18:50:36Z Yuval Lifshitz yuvalif@yahoo.com
<p>see: <a class="external" href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html">https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html</a></p>
<p>following are new notification types:<br />s3:TestEvent - done at conf time, not sure if we want that<br />s3:ObjectTagging:*<br />s3:ObjectTagging:Put<br />s3:ObjectTagging:Delete<br />s3:ObjectAcl:Put</p>
<p>s3:IntelligentTiering - do we have something similar?</p>
<p>also, try to align our lifecycle events:<br />s3:ObjectLifecycle:Expiration:Current<br />s3:ObjectLifecycle:Expiration:NonCurrent<br />s3:ObjectLifecycle:Expiration:DeleteMarker<br />s3:ObjectLifecycle:Expiration:AbortMultipartUpload<br />s3:ObjectLifecycle:Transition:Current<br />s3:ObjectLifecycle:Transition:NonCurrent</p>
<p>with the AWS ones:<br />s3:LifecycleExpiration:*<br />s3:LifecycleExpiration:Delete<br />s3:LifecycleExpiration:DeleteMarkerCreated<br />s3:LifecycleTransition</p>
rgw - Feature #63704 (Pending Backport): notifications: add observability to persistent notificat... https://tracker.ceph.com/issues/63704 2023-11-30T17:08:18Z Yuval Lifshitz yuvalif@yahoo.com
<p>following PRs are implementing the feature:<br /><a class="external" href="https://github.com/ceph/ceph/pull/52087">https://github.com/ceph/ceph/pull/52087</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/52439">https://github.com/ceph/ceph/pull/52439</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/53039">https://github.com/ceph/ceph/pull/53039</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/54459">https://github.com/ceph/ceph/pull/54459</a></p>
<p>to avoid issues and upgrade issues, all of the work above must be backported together</p>
rgw - Feature #63695 (New): kafka: allow more than one broker in the bootstrap list https://tracker.ceph.com/issues/63695 2023-11-30T10:51:11Z Yuval Lifshitz yuvalif@yahoo.com
<p>currently we provide a single kafka endpoint in the topic configuration, but kafka allows multiple bootstrap brokers to be used as the initial endpoint (in case one of them is down).<br />this will be an addition to the topic creation REST API, and the values will then be passed to the librdkafka connection creation process</p>
rgw - Feature #63641 (Closed): kafka: expose librdkafka retry parameters as conf parameters https://tracker.ceph.com/issues/63641 2023-11-26T14:55:49Z Yuval Lifshitz yuvalif@yahoo.com
<p>there are setups where sync notifications (non persistent) are required.<br />in these setups we need to detect that the kafka brokers are down much faster than by using the default parameters (that are tuned to overcome temporary network issues).<br />since the kafka connections are shared by multiple topics, these parameters cannot be per topic and have to be global.</p>
rgw-testing - Bug #63616 (Pending Backport): lua integration tests https://tracker.ceph.com/issues/63616 2023-11-23T16:36:10Z Yuval Lifshitz yuvalif@yahoo.com
rgw - Bug #63580 (Pending Backport): notifications: sending notifications with multidelete is cau... https://tracker.ceph.com/issues/63580 2023-11-19T11:17:26Z Yuval Lifshitz yuvalif@yahoo.com
<p>this is a regression from: <a class="external" href="https://github.com/ceph/ceph/commit/6b6592f50b6b81fa13a330bcb91273ba7f25c0c9">https://github.com/ceph/ceph/commit/6b6592f50b6b81fa13a330bcb91273ba7f25c0c9</a></p>