Bug #48358

rgw: qlen and qactive perf counters leak

Added by Dan van der Ster over 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In our environment, the rgw qlen and qactive perf counters trend slowly upwards over time (see the attached plot).
I suspect there is a case where the client IO is completed without the qlen/qactive counters getting decremented.
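
To illustrate the kind of bug suspected here, the following is a minimal Python sketch and is not taken from the rgw code. `Gauge`, `handle_leaky`, `handle_guarded`, `tracked`, and `run` are all hypothetical names. It assumes the counter is incremented when a request starts and decremented when it finishes, and shows how a single early-return path that skips the decrement makes the gauge drift upwards, while a scoped guard decrements on every exit path.

```python
import threading
from contextlib import contextmanager


class Gauge:
    """Hypothetical stand-in for a perf counter such as qlen or qactive."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def inc(self):
        with self._lock:
            self._value += 1

    def dec(self):
        with self._lock:
            self._value -= 1

    @property
    def value(self):
        with self._lock:
            return self._value


def handle_leaky(gauge, fail):
    """Manual inc/dec: the error path returns before the decrement."""
    gauge.inc()
    if fail:
        return "error"  # leak: dec() is never reached on this path
    gauge.dec()
    return "ok"


@contextmanager
def tracked(gauge):
    """Scoped guard: the decrement runs on every exit path."""
    gauge.inc()
    try:
        yield
    finally:
        gauge.dec()


def handle_guarded(gauge, fail):
    """Same request handling, but the counter is managed by the guard."""
    with tracked(gauge):
        if fail:
            return "error"
        return "ok"


def run(handler, outcomes):
    """Call handler once per outcome and return the final gauge value."""
    gauge = Gauge()
    for fail in outcomes:
        handler(gauge, fail)
    return gauge.value
```

In this sketch, every failed request handled by `handle_leaky` leaves the gauge one higher than it should be, which would look like a slow upward trend on a busy gateway. Whether the real leak has this shape is an open question.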

For context, we are trying to determine whether rgw_max_concurrent_requests can be tuned down to limit peak rgw memory usage. To do that, we want to monitor how many concurrent IOs we actually have in production, but the qlen counter is clearly not reliable for that. We'll send a separate PR to expose the throttle `outstanding_requests` values in a new perf counter, which covers the monitoring need independently of this bug. But maybe the qlen leak is obvious to someone?
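
For anyone who wants to check for the same drift, here is a rough Python sketch of one way to do it. It assumes the counters are read from the JSON printed by `ceph daemon <admin-socket> perf dump`, and that `qlen` and `qactive` appear as plain numbers inside one of the top-level sections; the section name in the sample below is made up and varies by deployment. `read_rgw_counters` and `floor_drift` are hypothetical helpers. The idea is that a gauge of in-flight requests should fall back to roughly the same minimum during quiet periods, so a rising minimum hints at missed decrements.

```python
import json


def read_rgw_counters(perf_dump_text, names=("qlen", "qactive")):
    """Collect the named counters from any section of a perf dump JSON.

    The layout (section -> counter -> number) is an assumption.
    """
    dump = json.loads(perf_dump_text)
    found = {}
    for counters in dump.values():
        if not isinstance(counters, dict):
            continue
        for name in names:
            if name in counters:
                found[name] = counters[name]
    return found


def floor_drift(samples, window):
    """Minimum of the last `window` samples minus minimum of the first.

    A clearly positive result means the counter no longer returns to
    its earlier floor, which is consistent with a leak.
    """
    if window <= 0 or len(samples) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    return min(samples[-window:]) - min(samples[:window])
```

A possible usage is to sample `qlen` periodically, keep the values in a list, and call `floor_drift(samples, window)` with a window long enough to include at least one quiet period.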

Screenshot-20201125134722-782x407.png View (56.4 KB) Dan van der Ster, 11/25/2020 12:47 PM

Ceph - RGW metrics - Grafana 2021-05-11 11-27-50.png View (226 KB) Aleksandr Rudenko, 05/11/2021 08:43 AM

History

#1 Updated by Dan van der Ster over 2 years ago

We'll send a separate PR to expose the throttle `outstanding_requests` values in a new perf counter

https://github.com/ceph/ceph/pull/38283

#2 Updated by Mark Kogan over 2 years ago

  • Assignee set to Mark Kogan

#3 Updated by Aleksandr Rudenko almost 2 years ago

We see the same behavior on 14.2.15.
