Bug #55333
mgr/dashboard: Prometheus alertmanager reports msg="Error on notify" Post https://xxxx:8444/api/prometheus_receiver: x509: cannot validate certificate for XXX because it doesn't contain any IP SANs"
Status: Closed
Priority: Normal
Assignee: -
Category: Component - Orchestrator
Target version: -
% Done: 0%
Source: other
Tags:
Backport: Pacific
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
In a cephadm-deployed environment, when the monitoring stack is added, we see a lot of:
Nov 05 11:49:53 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:49:53.686527006Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:49:53 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:49:53.686628876Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.686873915Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.686975115Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.687010806Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.687028376Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687223073Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687327773Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687328453Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687446324Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687606561Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687697712Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687645482Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687745412Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68792952Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68795854Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68798964Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.688091751Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688285838Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688392268Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688411258Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688428608Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
coming from the alertmanager.
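To see at a glance which endpoints fail and why, the log excerpt above can be tallied per endpoint. The following is a minimal sketch, assuming the lines have the shape quoted above; the function name `tally_notify_errors` and the regular expression are illustrative and not part of Ceph or Alertmanager.

```python
import re
from collections import Counter

# Matches one failed POST inside an Alertmanager error message.
# The pattern is an assumption derived from the log lines quoted above:
# "x509" marks a certificate validation failure, "dial" a connection failure.
POST_ERROR = re.compile(r"Post (https?://[^/\s]+)\S*: (x509|dial)")


def tally_notify_errors(lines):
    """Count failures per (endpoint, error kind) in 'Error on notify' lines."""
    counts = Counter()
    for line in lines:
        # Skip the aggregated "Notify for alerts failed" lines so that each
        # failed POST is counted only once.
        if 'msg="Error on notify"' not in line:
            continue
        for endpoint, kind in POST_ERROR.findall(line):
            counts[(endpoint, kind)] += 1
    return counts
```

Fed with the excerpt above, this separates the one endpoint that is reachable but fails certificate validation (172.16.11.72) from the two that refuse the connection outright (192.168.24.12 and 192.168.24.24).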
I see a problem with this line:
Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs
I think this happens because the per-network certificates are issued for hostnames, while the webhook URL addresses the dashboard by IP.
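The x509 error follows from how a TLS client checks the server identity when the URL contains an IP literal: the address has to appear among the certificate's IP SANs, and DNS SANs are not considered. Below is a simplified model of that check, not Alertmanager's or Go's actual code; the function name `check_server_identity` is hypothetical, and wildcard matching is deliberately left out.

```python
import ipaddress


def check_server_identity(host, dns_sans, ip_sans):
    """Return None if the certificate SANs cover `host`, else an error string.

    Simplified model: exact matches only, no wildcard names.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # not an IP literal, so treat it as a DNS name

    if addr is not None:
        # An IP literal is only compared against IP SANs.
        if not ip_sans:
            return (f"cannot validate certificate for {host} "
                    "because it doesn't contain any IP SANs")
        if addr not in {ipaddress.ip_address(ip) for ip in ip_sans}:
            return f"certificate is valid for {', '.join(ip_sans)}, not {host}"
        return None

    # A DNS name is only compared against DNS SANs (case-insensitive).
    if host.lower() not in {name.lower() for name in dns_sans}:
        return f"certificate is valid for {', '.join(dns_sans)}, not {host}"
    return None
```

With a certificate issued only for `oc0-controller-0.mydomain.tld`, calling `check_server_identity("172.16.11.72", ["oc0-controller-0.mydomain.tld"], [])` yields the same message as in the log, while checking the hostname itself passes.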
In addition I see:
[root@oc0-controller-0 specs]# cat alertmanager
---
networks:
- 172.16.11.0/24
placement:
  hosts:
  - oc0-controller-0.mydomain.tld
  - oc0-controller-1.mydomain.tld
  - oc0-controller-2.mydomain.tld
service_id: alertmanager
service_name: alertmanager
service_type: alertmanager
spec:
  port: 9093

# This file is generated by cephadm.
# See https://prometheus.io/docs/alerting/
global:
  resolve_timeout: 5m
route:
  receiver: 'default'
  routes:
    - group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'ceph-dashboard'
receivers:
- name: 'default'
  webhook_configs:
- name: 'ceph-dashboard'
  webhook_configs:
  - url: 'https://172.16.11.72:8444/api/prometheus_receiver'
  - url: 'https://192.168.24.12:8444/api/prometheus_receiver'
  - url: 'https://192.168.24.24:8444/api/prometheus_receiver'
The problem is in the webhook config: it points to three (wrong) Ceph Dashboard backend instances instead of the haproxy frontend.
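For illustration, the expected result can be sketched as a transformation of the receiver list: the per-backend URLs of the `ceph-dashboard` receiver collapse into a single webhook that targets the frontend. This is only a sketch of the desired outcome, not how cephadm generates the file; the helper name `point_receiver_at_frontend` and the frontend URL used below are hypothetical, and the config is assumed to be already loaded into a Python dict.

```python
import copy


def point_receiver_at_frontend(config, receiver_name, frontend_url):
    """Return a copy of `config` where `receiver_name` has a single webhook.

    `config` mirrors the Alertmanager config as a dict; the input is not
    modified.
    """
    updated = copy.deepcopy(config)
    for receiver in updated.get("receivers", []):
        if receiver.get("name") == receiver_name:
            # Replace all per-backend URLs with the one frontend URL.
            receiver["webhook_configs"] = [{"url": frontend_url}]
    return updated


# Receivers as shown in the generated config above.
config = {
    "receivers": [
        {"name": "default", "webhook_configs": None},
        {"name": "ceph-dashboard", "webhook_configs": [
            {"url": "https://172.16.11.72:8444/api/prometheus_receiver"},
            {"url": "https://192.168.24.12:8444/api/prometheus_receiver"},
            {"url": "https://192.168.24.24:8444/api/prometheus_receiver"},
        ]},
    ]
}

# Hypothetical frontend address; substitute the real haproxy endpoint.
fixed = point_receiver_at_frontend(
    config,
    "ceph-dashboard",
    "https://dashboard-frontend.example.com:8444/api/prometheus_receiver",
)
```

Since the file header says it is generated by cephadm, a manual edit of this kind would likely be overwritten on the next reconfiguration, so the real fix belongs in the generator.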
History
#1 Updated by Ernesto Puerta over 1 year ago
- Subject changed from mgr/dashboard: short_description to mgr/dashboard: Prometheus alertmanager reports msg="Error on notify" Post https://xxxx:8444/api/prometheus_receiver: x509: cannot validate certificate for XXX because it doesn't contain any IP SANs"
- Description updated (diff)
#2 Updated by Ernesto Puerta over 1 year ago
- Description updated (diff)
#3 Updated by Avan Thakkar about 1 year ago
- Status changed from New to Closed
It seems this issue is already resolved here: https://github.com/ceph/ceph/pull/45860