Bug #46846
closed
Prometheus metrics contain stripped/incomplete ipv6 address
Description
curl --silent http://localhost:9283/metrics | grep ceph_mon_metadata{
ceph_mon_metadata{ceph_daemon="mon.mon2",hostname="mon2.example.net",public_addr="[2001",rank="0",ceph_version="ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)"} 1.0
ceph_mon_metadata{ceph_daemon="mon.mon3",hostname="mon3.example.net",public_addr="[2001",rank="1",ceph_version="ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)"} 1.0
ceph_mon_metadata{ceph_daemon="mon.mon1",hostname="mon1.example.net",public_addr="[2001",rank="2",ceph_version="ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)"} 1.0
The public_addr is truncated at the first colon.
The problem also exists in the `ceph_osd_metadata` metric, for both the `cluster_addr` and `public_addr` labels:
ceph_osd_metadata{back_iface="bond-storage",ceph_daemon="osd.0",cluster_addr="[2001",device_class="ssd",front_iface="bond-mgmt",hostname="node1.example.net",objectstore="bluestore",public_addr="[2001",ceph_version="ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)"} 1.0
ceph_osd_metadata{back_iface="bond-storage",ceph_daemon="osd.1",cluster_addr="[2001",device_class="ssd",front_iface="bond-mgmt",hostname="node1.example.net",objectstore="bluestore",public_addr="[2001",ceph_version="ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)"} 1.0
ceph_osd_metadata{back_iface="bond-storage",ceph_daemon="osd.2",cluster_addr="[2001",device_class="ssd",front_iface="bond-mgmt",hostname="node1.example.net",objectstore="bluestore",public_addr="[2001",ceph_version="ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)"} 1.0
Unfortunately I do not use the Ganesha NFS, CephFS or RGW features of Ceph, but it would not surprise me if those metrics contained the same bug on an IPv6 cluster.
Updated by Matthew Oliver over 3 years ago
Looks like a splitting issue to me. I'm not sure this is actually messenger related; it looks prometheus module related. After a quick look, we probably just need to use `rsplit` rather than the normal `split` in the Python code, i.e. something like:
diff --git a/src/pybind/mgr/prometheus/module.py b/src/pybind/mgr/prometheus/module.py
index 83fe6c3af0..e9c734b8f8 100644
--- a/src/pybind/mgr/prometheus/module.py
+++ b/src/pybind/mgr/prometheus/module.py
@@ -510,7 +510,7 @@ class Module(MgrModule):
             host_version = servers.get((id_, 'mon'), ('', ''))
             self.metrics['mon_metadata'].set(1, (
                 'mon.{}'.format(id_), host_version[0],
-                mon['public_addr'].split(':')[0], rank,
+                mon['public_addr'].rsplit(':', 1)[0], rank,
                 host_version[1]
             ))
             in_quorum = int(rank in mon_status['quorum'])
@@ -619,8 +619,8 @@ class Module(MgrModule):
             # id can be used to link osd metrics and metadata
             id_ = osd['osd']
             # collect osd metadata
-            p_addr = osd['public_addr'].split(':')[0]
-            c_addr = osd['cluster_addr'].split(':')[0]
+            p_addr = osd['public_addr'].rsplit(':', 1)[0]
+            c_addr = osd['cluster_addr'].rsplit(':', 1)[0]
             if p_addr == "-" or c_addr == "-":
                 self.log.info(
                     "Missing address metadata for osd {0}, skipping occupation"
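To illustrate why the change in the diff above should help, here is a minimal standalone sketch comparing the two calls. The sample addresses are assumptions (the IPv6 one uses the 2001:db8::/32 documentation prefix, and the `ip:port/nonce` layout is assumed rather than taken from this report), and the helper names `host_with_split` and `host_with_rsplit` are hypothetical, not part of the module.

```python
# Hypothetical sample addresses; the exact "addr:port/nonce" layout is an assumption.
samples = [
    "[2001:db8::1]:6789/0",  # bracketed IPv6 address with port
    "192.0.2.10:6789/0",     # IPv4 address with port
    "-",                     # placeholder value the OSD code path checks for
]


def host_with_split(addr):
    """Old behaviour: keep everything before the FIRST colon."""
    return addr.split(':')[0]


def host_with_rsplit(addr):
    """Proposed behaviour: drop only what follows the LAST colon."""
    return addr.rsplit(':', 1)[0]


for addr in samples:
    print(addr, '->', host_with_split(addr), '|', host_with_rsplit(addr))
```

With the first sample, `split` yields `[2001` (matching the truncated label in the report), while `rsplit` keeps the full bracketed address. IPv4 addresses and the `-` placeholder come out the same either way, so the existing `== "-"` check should be unaffected.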
Updated by Matthew Oliver over 3 years ago
- Pull request ID set to 36594
Pushed this diff up as a PR. It would be great if it could be tested, as I don't have a Prometheus environment set up. Though I guess I could go down that rabbit hole.
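For anyone who wants to test this on an IPv6 cluster, one possible check is to scan the text returned by the `/metrics` endpoint for address labels that open a bracket but never close it. The sketch below is only an illustration: `truncated_addr_labels` is a hypothetical helper, the regular expression is a simplification that ignores escaped quotes inside label values, and the "fixed" sample line uses an assumed documentation address.

```python
import re

# Matches label="value" pairs; simplified, does not handle escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def truncated_addr_labels(metrics_text,
                          label_names=("public_addr", "cluster_addr")):
    """Return (metric, label, value) tuples for address labels that start
    with '[' but do not end with ']', e.g. public_addr="[2001"."""
    hits = []
    for line in metrics_text.splitlines():
        # Skip comment lines and samples without labels.
        if line.startswith('#') or '{' not in line:
            continue
        metric = line.split('{', 1)[0]
        for name, value in LABEL_RE.findall(line):
            if (name in label_names and value.startswith('[')
                    and not value.endswith(']')):
                hits.append((metric, name, value))
    return hits


# Shortened line in the shape reported in this issue.
broken = 'ceph_mon_metadata{ceph_daemon="mon.mon2",public_addr="[2001",rank="0"} 1.0'
# Assumed shape of the same line once the address is no longer truncated.
fixed = 'ceph_mon_metadata{ceph_daemon="mon.mon2",public_addr="[2001:db8::1]",rank="0"} 1.0'

print(truncated_addr_labels(broken))
print(truncated_addr_labels(fixed))
```

The text could come from saving the `curl --silent http://localhost:9283/metrics` output to a file and reading it back; an empty result would suggest that no bracketed address is being cut off.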
Updated by Nathan Cutler over 3 years ago
- Status changed from New to Fix Under Review
- Assignee set to Matthew Oliver
Updated by Nathan Cutler over 3 years ago
- Project changed from Messengers to mgr
Assuming it was filed under "Messengers" by mistake.
Updated by Jan Fajerski over 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from octopus to octopus,nautilus
Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47281: nautilus: Prometheus metrics contain stripped/incomplete ipv6 address added
Updated by Nathan Cutler over 3 years ago
- Copied to Backport #47282: octopus: Prometheus metrics contain stripped/incomplete ipv6 address added
Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".