Bug #59069
openPrometheus module crash when pool is EC with technique=reed_sol_r6_op
Status:
New
Priority:
Normal
Assignee:
-
Category:
Monitoring/Alerting
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Description
Dear maintainer,
When trying to activate the prometheus module on a Quincy cluster, you get the following error in the MGR logs:
2023-03-14T02:25:12.420+0000 7fa79c883700 0 [prometheus ERROR root] failed to collect metrics:
Traceback (most recent call last):
File "/usr/share/ceph/mgr/prometheus/module.py", line 508, in collect
data = self.mod.collect()
File "/usr/share/ceph/mgr/mgr_util.py", line 806, in wrapper
result = f(*args, **kwargs)
File "/usr/share/ceph/mgr/prometheus/module.py", line 1578, in collect
self.get_metadata_and_osd_status()
File "/usr/share/ceph/mgr/mgr_util.py", line 806, in wrapper
result = f(*args, **kwargs)
File "/usr/share/ceph/mgr/prometheus/module.py", line 1220, in get_metadata_and_osd_status
pool_type, pool_description = _get_pool_info(pool)
File "/usr/share/ceph/mgr/prometheus/module.py", line 1211, in _get_pool_info
description = f"ec:{profile['k']}+{profile['m']}"
KeyError: 'm'
When getting information about the pools, the module tries to read the "m" and the "k" of the erasure-coding profile. But when using "technique=reed_sol_r6_op", the "m" isn't present in the profile output and is implied by the technique.
$ ceph osd erasure-code-profile get huitetdeux
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
k=8
plugin=jerasure
technique=reed_sol_r6_op
w=8
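A minimal sketch of the failure and a possible defensive workaround (this is not the actual Ceph patch, and describe_ec_profile is a hypothetical stand-in for the description-building line in prometheus/module.py): the real code does profile['m'], which raises KeyError when the profile has no explicit "m", while dict.get would degrade gracefully.

```python
def describe_ec_profile(profile: dict) -> str:
    # The real module does f"ec:{profile['k']}+{profile['m']}" and
    # crashes with KeyError: 'm' on profiles like the one above.
    # Using .get() with a fallback tolerates an implied "m".
    k = profile.get('k', '?')
    m = profile.get('m', '?')
    return f"ec:{k}+{m}"

# Profile as reported by "ceph osd erasure-code-profile get", with no "m" key:
profile = {
    'crush-device-class': 'hdd',
    'crush-failure-domain': 'host',
    'crush-root': 'default',
    'k': '8',
    'plugin': 'jerasure',
    'technique': 'reed_sol_r6_op',
    'w': '8',
}
print(describe_ec_profile(profile))  # "ec:8+?" instead of a KeyError
```

Whether the right fix is a placeholder like this or deriving the implied "m" from the technique is up to the maintainers; the sketch only illustrates where the crash happens.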
I think the prometheus module is hitting this issue.
In Octopus I wasn't facing this issue and my monitoring stack was fully functional.
It's really annoying because I can't get any metrics about my cluster into Grafana, which is very helpful for cluster maintenance.
Thanks.