Bug #59069

open

Prometheus module crash when pool is EC with technique=reed_sol_r6_op

Added by Benjamin Mare about 1 year ago. Updated 11 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Monitoring/Alerting
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Dear maintainer,

When I enable the prometheus module on a Quincy cluster, the following error appears in the MGR logs:


2023-03-14T02:25:12.420+0000 7fa79c883700  0 [prometheus ERROR root] failed to collect metrics:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/prometheus/module.py", line 508, in collect
    data = self.mod.collect()
  File "/usr/share/ceph/mgr/mgr_util.py", line 806, in wrapper
    result = f(*args, **kwargs)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1578, in collect
    self.get_metadata_and_osd_status()
  File "/usr/share/ceph/mgr/mgr_util.py", line 806, in wrapper
    result = f(*args, **kwargs)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1220, in get_metadata_and_osd_status
    pool_type, pool_description = _get_pool_info(pool)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1211, in _get_pool_info
    description = f"ec:{profile['k']}+{profile['m']}" 
KeyError: 'm'

When gathering information about the pools, the module reads the "k" and "m" values of the erasure code profile. However, with "technique=reed_sol_r6_op" the "m" value is implied and does not appear in the profile:


$ ceph osd erasure-code-profile get huitetdeux
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
k=8
plugin=jerasure
technique=reed_sol_r6_op
w=8

I think this missing "m" key is what makes the prometheus module fail.
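
To illustrate, here is a minimal sketch of a more tolerant lookup. It is not the actual module code or an upstream fix: the helper name describe_ec_profile and the IMPLIED_M table are hypothetical, and the value m=2 for reed_sol_r6_op is an assumption based on that technique being RAID6-style (two coding chunks).

```python
from typing import Mapping

# Assumption: techniques whose "m" is implied and therefore absent from the
# profile. reed_sol_r6_op is RAID6-style, so m is assumed to be 2 here.
IMPLIED_M = {"reed_sol_r6_op": "2"}


def describe_ec_profile(profile: Mapping[str, str]) -> str:
    """Build an "ec:k+m" description without raising KeyError.

    Falls back to the implied value for known techniques, and to "?"
    when a value cannot be determined.
    """
    k = profile.get("k", "?")
    m = profile.get("m")
    if m is None:
        m = IMPLIED_M.get(profile.get("technique", ""), "?")
    return f"ec:{k}+{m}"


# Profile as printed by "ceph osd erasure-code-profile get huitetdeux"
huitetdeux = {
    "crush-device-class": "hdd",
    "crush-failure-domain": "host",
    "crush-root": "default",
    "k": "8",
    "plugin": "jerasure",
    "technique": "reed_sol_r6_op",
    "w": "8",
}

print(describe_ec_profile(huitetdeux))
```

With a direct profile['m'] lookup, the same dictionary raises KeyError: 'm', as in the traceback above.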

In Octopus I did not hit this issue, and my monitoring stack was fully functional.

This is a real problem because I can't get any metrics about my cluster in Grafana, and those metrics are very helpful for cluster maintenance.

Thanks.

Actions #1

Updated by Ilya Dryomov 11 months ago

  • Target version deleted (v17.2.6)