Bug #55333
mgr/dashboard: Prometheus alertmanager reports msg="Error on notify" Post https://xxxx:8444/api/prometheus_receiver: x509: cannot validate certificate for XXX because it doesn't contain any IP SANs"
Status: Closed
Priority: Normal
Assignee: -
Category: Component - Orchestrator
Target version: -
% Done: 0%
Source: other
Tags:
Backport: Pacific
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
In a cephadm-deployed environment, when the monitoring stack is added, we see a lot of:
Nov 05 11:49:53 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:49:53.686527006Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:49:53 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:49:53.686628876Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.686873915Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.686975115Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.687010806Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.687028376Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687223073Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687327773Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687328453Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687446324Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687606561Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687697712Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687645482Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687745412Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68792952Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68795854Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68798964Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.688091751Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688285838Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688392268Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688411258Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688428608Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
coming from the alertmanager.
I see a problem with this line:
Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs
and I think this happens when hostnames are used to issue the per-network certificates: the webhook URL addresses the dashboard by IP, while a certificate issued for a hostname carries no IP SAN for that address.
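The failing check can be illustrated with a minimal sketch. It assumes the certificate is available in the dict layout that Python's `ssl.SSLSocket.getpeercert()` returns. The function name `covers_host` and the sample certificate are hypothetical, and the matching is deliberately simplified (exact DNS comparison, no wildcards). It is not the code alertmanager runs, only the same rule: an IP literal in the URL can be satisfied only by an IP SAN.

```python
import ipaddress


def covers_host(cert: dict, host: str) -> bool:
    """Return True if the certificate's subjectAltName entries cover `host`.

    `cert` follows the dict layout of ssl.SSLSocket.getpeercert().
    Simplified on purpose: DNS names are compared exactly (no wildcards).
    """
    sans = cert.get("subjectAltName", ())
    try:
        target_ip = ipaddress.ip_address(host)
    except ValueError:
        target_ip = None  # host is a DNS name, not an IP literal

    if target_ip is not None:
        # An IP literal can only match an "IP Address" SAN, never a DNS SAN.
        return any(
            kind == "IP Address" and ipaddress.ip_address(value) == target_ip
            for kind, value in sans
        )
    return any(
        kind == "DNS" and value.lower() == host.lower() for kind, value in sans
    )


# Hypothetical certificate issued for a hostname only, as suspected in this report.
hostname_only_cert = {
    "subjectAltName": (("DNS", "oc0-controller-1.mydomain.tld"),)
}

# Hypothetical certificate that also lists the storage-network address.
cert_with_ip_san = {
    "subjectAltName": (
        ("DNS", "oc0-controller-1.mydomain.tld"),
        ("IP Address", "172.16.11.72"),
    )
}
```

With these samples, `covers_host(hostname_only_cert, "172.16.11.72")` is false, which corresponds to the x509 error above, while the same call against `cert_with_ip_san` succeeds.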
In addition I see:
[root@oc0-controller-0 specs]# cat alertmanager
---
networks:
- 172.16.11.0/24
placement:
  hosts:
  - oc0-controller-0.mydomain.tld
  - oc0-controller-1.mydomain.tld
  - oc0-controller-2.mydomain.tld
service_id: alertmanager
service_name: alertmanager
service_type: alertmanager
spec:
  port: 9093

# This file is generated by cephadm.
# See https://prometheus.io/docs/alerting/

global:
  resolve_timeout: 5m

route:
  receiver: 'default'
  routes:
    - group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'ceph-dashboard'

receivers:
- name: 'default'
  webhook_configs:
- name: 'ceph-dashboard'
  webhook_configs:
  - url: 'https://172.16.11.72:8444/api/prometheus_receiver'
  - url: 'https://192.168.24.12:8444/api/prometheus_receiver'
  - url: 'https://192.168.24.24:8444/api/prometheus_receiver'
and the problem is in the webhook config, which points to 3 (wrong) ceph_dashboard backend instances instead of using the haproxy frontend.
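A quick way to spot this kind of entry is to flag every webhook URL whose host part is an IP literal, since each of those requires a matching IP SAN on the dashboard certificate. The sketch below uses only the Python standard library. The helper name `ip_literal_webhooks` and the frontend hostname `dashboard.mydomain.tld` are assumptions for illustration and do not come from cephadm or from this report.

```python
import ipaddress
from urllib.parse import urlsplit


def ip_literal_webhooks(urls):
    """Return the URLs whose host part is an IP literal rather than a DNS name."""
    flagged = []
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # no host component at all, nothing to classify
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # a DNS name, validated against DNS SANs instead
        flagged.append(url)
    return flagged


# Webhook URLs copied from the generated alertmanager configuration above.
reported_urls = [
    "https://172.16.11.72:8444/api/prometheus_receiver",
    "https://192.168.24.12:8444/api/prometheus_receiver",
    "https://192.168.24.24:8444/api/prometheus_receiver",
]

# Hypothetical URL for a frontend addressed by name (assumed hostname).
frontend_url = "https://dashboard.mydomain.tld:8444/api/prometheus_receiver"

for url in ip_literal_webhooks(reported_urls + [frontend_url]):
    print("needs an IP SAN:", url)
```

All three reported URLs are flagged, while the name-based URL is not, because a hostname is validated against DNS SANs.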
History
#1 Updated by Ernesto Puerta over 1 year ago
- Subject changed from mgr/dashboard: short_description to mgr/dashboard: Prometheus alertmanager reports msg="Error on notify" Post https://xxxx:8444/api/prometheus_receiver: x509: cannot validate certificate for XXX because it doesn't contain any IP SANs"
- Description updated (diff)
#2 Updated by Ernesto Puerta over 1 year ago
- Description updated (diff)
#3 Updated by Avan Thakkar about 1 year ago
- Status changed from New to Closed
Seems like this issue is already resolved here https://github.com/ceph/ceph/pull/45860