Bug #24151: ceph-mgr have lost prio=0 perf counters? get_counter seem to ignore them - mgr - Ceph


Added by Peter Gervai almost 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Normal
Category: ceph-mgr
Target version: Ceph - v12.2.3
Source: Community (user)
Severity: 4 - irritation
Affected Versions: Ceph - v12.2.4

Description

This was observed in the prometheus module, but the problem seems to be a general mgr one.

I have lost a lot of perf counters, I believe in the upgrade to 12.2.2+ (but I can't pin it down since it's been a while); 'osd.n.bluestore.bluestore_compressed' is one that I had been monitoring continuously. It's still visible in both 'perf dump' and 'perf schema', and I see that its priority is 0 (debug).

However, I cannot retrieve these counters no matter what I try:

                c = self.get_counter( service['type'], service['id'], "bluestore.bluestore_compressed" )

returns an empty result, while, for example,

                c = self.get_counter( service['type'], service['id'], "bluestore.submit_lat" )


returns data.

I wasn't able to figure out whether counters get filtered by priority inside get_counter and, if they do, how to lift the filter. This breaks quite a lot of graphs (prometheus and others).

Updated by John Spray almost 6 years ago

Performance counters are indeed filtered by priority; this is controlled by a ceph-mgr setting called mgr_stats_threshold.

If you set it to zero then you'll get everything -- a pretty huge number of counters, but on a smaller cluster that won't hurt too badly.
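
To illustrate the filtering described above, here is a minimal Python sketch. It is a toy model, not Ceph source code: the helper `visible_counters`, the `schema` dictionary, and the priority assigned to `bluestore.submit_lat` are assumptions made for illustration, and the comparison assumes a counter is kept when its priority is at or above the threshold (consistent with a threshold of zero returning everything).

```python
# Toy model of priority-based filtering; NOT the actual ceph-mgr implementation.
PRIO_DEBUGONLY = 0  # the report states bluestore_compressed has priority 0 (debug)
PRIO_USEFUL = 5     # assumed here as an example non-zero threshold


def visible_counters(schema, threshold):
    """Return the names of counters whose priority is at or above the threshold."""
    return sorted(name for name, prio in schema.items() if prio >= threshold)


# Hypothetical schema: the priority for submit_lat is an assumed value.
schema = {
    "bluestore.bluestore_compressed": PRIO_DEBUGONLY,
    "bluestore.submit_lat": PRIO_USEFUL,
}

# With a non-zero threshold, the debug-priority counter is filtered out.
print(visible_counters(schema, threshold=PRIO_USEFUL))

# With the threshold at zero, every counter passes the filter.
print(visible_counters(schema, threshold=0))
```

In this model, lowering the threshold to zero is the only change needed for the priority 0 counter to reappear, which matches the behaviour described in this comment.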

Updated by John Spray almost 6 years ago

Status changed from New to Closed

Updated by Peter Gervai almost 6 years ago

Thanks!

This is dangerously underdocumented, to the point that I don't even have an immediate idea of how to set it (apart from guessing the global section of ceph.conf). I usually prefer issues to be converted into documentation problems when a good, working, but completely hidden answer exists. (Try googling "mgr_stats_threshold" or "ceph-mgr setting" and you will probably see what I mean: nothing at all.) The behaviour also changed between updates, and as a result broke lots of graphs whose data used to be collected but no longer is.

I am not sure whether it can be set from a mgr module, whether it is a global-only flag, or something else. So I would prefer that a few words about this enter the docs before this issue is closed into oblivion. (Until then I'll try to guess how it ought to work.)

Updated by John Spray almost 6 years ago

It might be a bit of an overstatement to call this dangerous -- data loss is dangerous, a hidden perf counter is annoying :-)

The reason you're not seeing the setting's documentation online is that it has a documentation string in the code, but unfortunately the work to generate the web docs from that metadata hasn't happened yet.

If you can work out a good place to add some words about this to the documentation then PRs are always welcome.
