https://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2020-11-25T13:38:46ZCeph rgw - Bug #48358: rgw: qlen and qactive perf counters leakhttps://tracker.ceph.com/issues/48358?journal_id=1800852020-11-25T13:38:46ZDan van der Ster
<ul></ul><blockquote>
<p>We'll send a separate PR to expose the throttle `outstanding_requests` values in a new perf counter</p>
</blockquote>
<p><a class="external" href="https://github.com/ceph/ceph/pull/38283">https://github.com/ceph/ceph/pull/38283</a></p> rgw - Bug #48358: rgw: qlen and qactive perf counters leakhttps://tracker.ceph.com/issues/48358?journal_id=1810722020-12-10T15:28:52ZMark Koganmkogan@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Mark Kogan</i></li></ul> rgw - Bug #48358: rgw: qlen and qactive perf counters leakhttps://tracker.ceph.com/issues/48358?journal_id=1946942021-05-11T08:43:52ZAleksandr Rudenko
<ul><li><strong>File</strong> <a href="/attachments/download/5491/Ceph%20-%20RGW%20metrics%20-%20Grafana%202021-05-11%2011-27-50.png">Ceph - RGW metrics - Grafana 2021-05-11 11-27-50.png</a> <a class="icon-only icon-magnifier" title="View" href="/attachments/5491/Ceph%20-%20RGW%20metrics%20-%20Grafana%202021-05-11%2011-27-50.png">View</a> added</li></ul><p>We see the same behavior on 14.2.15.</p> rgw - Bug #48358: rgw: qlen and qactive perf counters leakhttps://tracker.ceph.com/issues/48358?journal_id=2538562024-01-31T14:29:52ZCasey Bodleycbodley@redhat.com
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-10 priority-4 priority-default closed" href="/issues/61338">Bug #61338</a>: rgw: qactive perf counter may leak on errors</i> added</li></ul> rgw - Bug #48358: rgw: qlen and qactive perf counters leakhttps://tracker.ceph.com/issues/48358?journal_id=2538572024-01-31T14:40:42ZCasey Bodleycbodley@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>i'm hearing reports that when these counters leak, rgw performance also degrades significantly until the process restarts. this is probably due to leaks of the counter associated with <code>rgw_max_concurrent_requests</code></p>
<p>to decrement the perf counters, we rely on a call to <code>ClientIO::complete_request()</code>: <a class="external" href="https://github.com/ceph/ceph/blob/f4758e5/src/rgw/rgw_asio_client.cc#L97-L100">https://github.com/ceph/ceph/blob/f4758e5/src/rgw/rgw_asio_client.cc#L97-L100</a></p>
<p>for <code>rgw_max_concurrent_requests</code>, we rely on a similar hook in <code>SimpleThrottler::request_complete()</code>: <a class="external" href="https://github.com/ceph/ceph/blob/f4758e5/src/rgw/rgw_dmclock_async_scheduler.h#L188-L193">https://github.com/ceph/ceph/blob/f4758e5/src/rgw/rgw_dmclock_async_scheduler.h#L188-L193</a></p>
<p>certain types of errors fail to call either function</p>
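<p>A minimal sketch of why a skipped completion hook would hurt throughput and not only the metrics. This is not Ceph code: <code>ToyThrottle</code>, <code>try_admit()</code> and <code>handle()</code> are hypothetical names, and the real <code>SimpleThrottler</code> is more involved. It only assumes a throttle that admits requests while an outstanding count is below a maximum.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a concurrency throttle; NOT Ceph's SimpleThrottler.
struct ToyThrottle {
  std::size_t max_requests;      // plays the role of rgw_max_concurrent_requests
  std::size_t outstanding = 0;   // requests admitted but not yet completed

  // Admit a request only while a slot is free.
  bool try_admit() {
    if (outstanding >= max_requests) return false;
    ++outstanding;
    return true;
  }

  // Completion hook: must run exactly once per admitted request.
  void request_complete() { --outstanding; }
};

// Handles one request; returns false if the throttle rejected it.
// When `fails` is true, the error path returns without calling the hook.
bool handle(ToyThrottle& t, bool fails) {
  if (!t.try_admit()) return false;
  if (fails) return true;        // bug being illustrated: the slot is never released
  t.request_complete();
  return true;
}
```

<p>In this toy model, each failing request permanently consumes one slot. Once the number of leaked slots reaches the maximum, every later request is rejected until the object is recreated, which would match the reported "degrades until the process restarts" pattern.</p>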
<p>raising priority since this affects more than just the output of metrics</p> rgw - Bug #48358: rgw: qlen and qactive perf counters leakhttps://tracker.ceph.com/issues/48358?journal_id=2538592024-01-31T14:42:37ZCasey Bodleycbodley@redhat.com
<ul></ul><p>from Andrea Bolzonella:</p>
<blockquote>
<p>In my analysis, I observed that whenever an error is raised in rgw_rest.cc (line 630 in 18.2.1), the connection is closed but qlen is not decremented.</p>
</blockquote>
<pre><code class="cpp syntaxhl"><span class="CodeRay">  <span class="keyword">try</span> {
    RESTFUL_IO(s)->complete_header();
  } <span class="keyword">catch</span> (rgw::io::Exception& e) {
    ldpp_dout(s, <span class="integer">0</span>) << <span class="string"><span class="delimiter">"</span><span class="content">ERROR: RESTFUL_IO(s)->complete_header() returned err=</span><span class="delimiter">"</span></span>
                    << e.what() << dendl;
  }
</span></code></pre>
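<p>The handler above catches the exception and only logs it, so any decrement that depends on a later explicit call can be skipped. One common C++ way to make a decrement unconditional is a scope guard. The sketch below is an illustration under assumptions, not the fix applied in Ceph: <code>qlen</code> here is a plain atomic, and <code>QlenGuard</code> and <code>process_request()</code> are hypothetical names. Whether this pattern fits the RGW frontend is not established in this thread.</p>

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical counter standing in for the qlen perf counter.
std::atomic<int> qlen{0};

// Scope guard: increments on construction, decrements on destruction,
// so the decrement runs on normal return, early return, and exception unwind.
class QlenGuard {
 public:
  QlenGuard() { ++qlen; }
  ~QlenGuard() { --qlen; }
  QlenGuard(const QlenGuard&) = delete;
  QlenGuard& operator=(const QlenGuard&) = delete;
};

// Hypothetical request handler. The exception is caught and dropped,
// mirroring the logged-and-ignored error in the snippet above.
void process_request(bool header_fails) {
  QlenGuard guard;  // counter is balanced when this scope exits
  try {
    if (header_fails) throw std::runtime_error("complete_header failed");
  } catch (const std::exception&) {
    return;  // early exit on error; the guard still decrements
  }
  // ... remaining request handling would go here ...
}
```

<p>With the guard in place, <code>qlen</code> returns to its starting value after <code>process_request(true)</code> as well as after <code>process_request(false)</code>, because the destructor does not depend on any error path remembering to call a completion hook.</p>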