Bug #59069: Prometheus module crash when pool is EC with technique=reed_sol_r6_op - Ceph

open

Added by Benjamin Mare about 1 year ago. Updated 11 months ago.

Status: New
Priority: Normal
Assignee:
Category: Monitoring/Alerting
Target version:
% Done:
Source: Community (user)
Tags:
Backport:
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions: v17.2.6
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Dear maintainer,

After enabling the prometheus module on a Quincy cluster, the following error appears in the MGR logs:


2023-03-14T02:25:12.420+0000 7fa79c883700  0 [prometheus ERROR root] failed to collect metrics:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/prometheus/module.py", line 508, in collect
    data = self.mod.collect()
  File "/usr/share/ceph/mgr/mgr_util.py", line 806, in wrapper
    result = f(*args, **kwargs)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1578, in collect
    self.get_metadata_and_osd_status()
  File "/usr/share/ceph/mgr/mgr_util.py", line 806, in wrapper
    result = f(*args, **kwargs)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1220, in get_metadata_and_osd_status
    pool_type, pool_description = _get_pool_info(pool)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1211, in _get_pool_info
    description = f"ec:{profile['k']}+{profile['m']}" 
KeyError: 'm'

To build the pool description, the module reads both "k" and "m" from the erasure-coding profile. With "technique=reed_sol_r6_op", however, "m" is implied and does not appear in the profile, so the lookup of profile['m'] raises the KeyError above:


$ ceph osd erasure-code-profile get huitetdeux
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
k=8
plugin=jerasure
technique=reed_sol_r6_op
w=8

I think this missing "m" is what the prometheus module is tripping over.
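A possible defensive lookup is sketched below. It is not the module's actual code: the helper name `describe_ec_profile` and the `IMPLIED_M` table are hypothetical, and the value m=2 for reed_sol_r6_op is an assumption based on it being a RAID6-style technique, which should be verified before relying on it.

```python
from typing import Mapping

# Assumption: techniques whose profile omits "m" because it is implied.
# reed_sol_r6_op is RAID6-style, so m is assumed to be 2 here.
IMPLIED_M = {"reed_sol_r6_op": "2"}


def describe_ec_profile(profile: Mapping[str, str]) -> str:
    """Return an 'ec:k+m' label without raising when 'k' or 'm' is absent."""
    k = profile.get("k")
    # Prefer an explicit "m"; otherwise fall back to the implied value, if any.
    m = profile.get("m") or IMPLIED_M.get(profile.get("technique", ""))
    if k is None or m is None:
        return "ec:unknown"
    return f"ec:{k}+{m}"


# The profile from this report, as a dict of strings.
profile = {
    "crush-device-class": "hdd",
    "crush-failure-domain": "host",
    "crush-root": "default",
    "k": "8",
    "plugin": "jerasure",
    "technique": "reed_sol_r6_op",
    "w": "8",
}
print(describe_ec_profile(profile))  # prints "ec:8+2"
```

With a lookup like this, a profile without "m" yields either the implied value or "ec:unknown" instead of aborting the whole metrics collection.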

In Octopus I did not hit this issue, and my monitoring stack was fully functional.

This is a real problem because I can't get any metrics about my cluster into Grafana, which is very helpful for cluster maintenance.

Thanks.

Updated by Ilya Dryomov 11 months ago

Target version deleted (~~v17.2.6~~)
