Bug #55333

mgr/dashboard: Prometheus alertmanager reports msg="Error on notify" Post https://xxxx:8444/api/prometheus_receiver: x509: cannot validate certificate for XXX because it doesn't contain any IP SANs"

Added by Francesco Pantano almost 2 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Component - Orchestrator
Target version: -
% Done: 0%
Source: other
Tags:
Backport: Pacific
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In a cephadm-deployed environment, when the monitoring stack is added, we see a lot of:

Nov 05 11:49:53 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:49:53.686527006Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:49:53 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:49:53.686628876Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"                                                                                                                                           
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.686873915Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"                                                                                                                                                                                           
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.686975115Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"                                                                                                                                                           
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.687010806Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"                                                                                                                                                                                           
Nov 05 11:50:18 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:18.687028376Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687223073Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687327773Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687328453Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"                                                                                                                                                                                           
Nov 05 11:50:43 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:50:43.687446324Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"                                                                                                                                           
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687606561Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687697712Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687645482Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:08 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:08.687745412Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68792952Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68795854Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"                                                                                                                                                                                            
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.68798964Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"                                                                                                                                                                                            
Nov 05 11:51:33 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:33.688091751Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs; Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused" 
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688285838Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688392268Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688411258Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs" 
Nov 05 11:51:58 oc0-controller-1.mydomain.tld conmon[137764]: level=error ts=2021-11-05T11:51:58.688428608Z caller=dispatch.go:177 component=dispatcher msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused; Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused; Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs" 

coming from the alertmanager.
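
The log above mixes two different failures, which is easier to see when the entries are grouped by target and failure type. The following is a minimal Python sketch, not part of cephadm or Alertmanager; the `summarize` helper, the `tls`/`connect` labels, and the shortened sample lines are assumptions made for illustration.

```python
import re
from collections import Counter

# One failed POST: the target host:port and the kind of failure that follows it.
FAILURE = re.compile(
    r"Post https://(?P<target>[\d.]+:\d+)/api/prometheus_receiver: "
    r"(?P<reason>x509|dial\s?tcp)"
)


def summarize(lines):
    """Count per-target failures, labelled 'tls' or 'connect' (hypothetical labels)."""
    counts = Counter()
    for line in lines:
        # "Notify for alerts failed" lines only repeat the per-target errors
        # in aggregate, so they are skipped to avoid double counting.
        if 'msg="Error on notify"' not in line:
            continue
        match = FAILURE.search(line)
        if match:
            kind = "tls" if match["reason"] == "x509" else "connect"
            counts[(match["target"], kind)] += 1
    return counts


# Shortened copies of entries from the log above (timestamps and host prefix dropped).
sample = """\
level=error caller=notify.go:332 msg="Error on notify" err="Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs"
level=error caller=notify.go:332 msg="Error on notify" err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
level=error caller=notify.go:332 msg="Error on notify" err="Post https://192.168.24.12:8444/api/prometheus_receiver: dial tcp 192.168.24.12:8444: connect: connection refused"
level=error caller=dispatch.go:177 msg="Notify for alerts failed" num_alerts=27 err="Post https://192.168.24.24:8444/api/prometheus_receiver: dial tcp 192.168.24.24:8444: connect: connection refused"
""".splitlines()

for (target, kind), count in sorted(summarize(sample).items()):
    print(target, kind, count)
```

Grouped this way, only 172.16.11.72 is reachable and fails on certificate validation, while the two 192.168.24.x targets refuse the connection outright.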

I see a problem with this line:

Post https://172.16.11.72:8444/api/prometheus_receiver: x509: cannot validate certificate for 172.16.11.72 because it doesn't contain any IP SANs

and I think this happens because hostnames are used to issue the per-network certificates, while the webhook URL addresses the dashboard by IP.
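
To illustrate why a hostname-only certificate fails here, below is a simplified Python model of the check a TLS client performs. It is a sketch, not the Go `crypto/x509` code that Alertmanager actually runs: wildcard matching is left out, the `host_matches_cert` helper is hypothetical, and so is the assumption that the certificate was issued for `oc0-controller-0.mydomain.tld`.

```python
import ipaddress


def host_matches_cert(host, dns_sans, ip_sans):
    """Simplified hostname verification (no wildcard handling)."""
    try:
        target_ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: compare against the DNS SAN entries only.
        return host.lower() in (name.lower() for name in dns_sans)
    # IP literal: only IP SAN entries count; DNS names are never consulted.
    return any(ipaddress.ip_address(san) == target_ip for san in ip_sans)


# Hypothetical certificate issued for a hostname, with no IP SANs.
dns_sans = ["oc0-controller-0.mydomain.tld"]
ip_sans = []

print(host_matches_cert("172.16.11.72", dns_sans, ip_sans))
print(host_matches_cert("oc0-controller-0.mydomain.tld", dns_sans, ip_sans))
print(host_matches_cert("172.16.11.72", dns_sans, ["172.16.11.72"]))
```

Under these assumptions the first call fails and the other two succeed, so either the webhook URL would need to use a name the certificate covers, or the certificate would need an IP SAN for the address.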
In addition I see:

[root@oc0-controller-0 specs]# cat alertmanager
---                                       
networks:                                 
- 172.16.11.0/24                          
placement:                                
  hosts:                                  
  - oc0-controller-0.mydomain.tld         
  - oc0-controller-1.mydomain.tld         
  - oc0-controller-2.mydomain.tld         
service_id: alertmanager                  
service_name: alertmanager                
service_type: alertmanager                
spec:                                     
  port: 9093

# This file is generated by cephadm.      
# See https://prometheus.io/docs/alerting/
global:                                   
  resolve_timeout: 5m                     
route:                                    
  receiver: 'default'                     
  routes:                                 
    - group_by: ['alertname']             
      group_wait: 10s                     
      group_interval: 10s                 
      repeat_interval: 1h                 
      receiver: 'ceph-dashboard'          
receivers:                                
- name: 'default'
  webhook_configs:
- name: 'ceph-dashboard'
  webhook_configs:
  - url: 'https://172.16.11.72:8444/api/prometheus_receiver'
  - url: 'https://192.168.24.12:8444/api/prometheus_receiver'
  - url: 'https://192.168.24.24:8444/api/prometheus_receiver'

The problem is in the webhook config, which points to three (wrong) ceph_dashboard backend instances instead of using the haproxy frontend.

History

#1 Updated by Ernesto Puerta almost 2 years ago

  • Subject changed from mgr/dashboard: short_description to mgr/dashboard: Prometheus alertmanager reports msg="Error on notify" Post https://xxxx:8444/api/prometheus_receiver: x509: cannot validate certificate for XXX because it doesn't contain any IP SANs"
  • Description updated (diff)

#2 Updated by Ernesto Puerta almost 2 years ago

  • Description updated (diff)

#3 Updated by Avan Thakkar over 1 year ago

  • Status changed from New to Closed

It seems this issue is already resolved in https://github.com/ceph/ceph/pull/45860
