Bug #56508

closed

haproxy check fails for ceph-grafana service

Added by Francesco Pantano almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Category: cephadm
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy, pacific
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If OSP is deployed with ceph-dashboard, multiple ceph-dashboard services are deployed and placed behind haproxy; one of these services is grafana.

The following haproxy configuration is generated for grafana on OSP:

listen ceph_grafana
bind 192.168.24.71:3100 transparent ssl crt /etc/pki/tls/certs/haproxy/overcloud-haproxy-storage.pem
mode http
balance source
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
http-request set-header X-Forwarded-Port %[dst_port]
option httpchk HEAD /
option httplog
option forwardfor
server central-controller-0.storage.redhat.local 172.23.1.55:3100 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost central-controller-0.storage.redhat.local
server central-controller-1.storage.redhat.local 172.23.1.124:3100 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost central-controller-1.storage.redhat.local
server central-controller-2.storage.redhat.local 172.23.1.243:3100 ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify required verifyhost central-controller-2.storage.redhat.local
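
To make explicit what each backend is checked against, the `server` lines above can be split into backend address and expected certificate name. The following is a minimal sketch, not part of haproxy or cephadm; the helper name `parse_server_line` is hypothetical and only the keywords that appear in the configuration above are handled.

```python
def parse_server_line(line):
    """Extract name, address and verifyhost from one haproxy `server` line.

    Hypothetical helper for illustration; it only understands the layout
    used in the configuration shown in this report.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "server":
        raise ValueError("not a haproxy server line")
    options = tokens[3:]
    info = {
        "name": tokens[1],
        "address": tokens[2],
        "check": "check" in options,
        "verifyhost": None,
    }
    # `verifyhost <name>` is the name the backend certificate must carry.
    if "verifyhost" in options:
        idx = options.index("verifyhost")
        if idx + 1 < len(options):
            info["verifyhost"] = options[idx + 1]
    return info


line = (
    "server central-controller-1.storage.redhat.local 172.23.1.124:3100 "
    "ca-file /etc/ipa/ca.crt check fall 5 inter 2000 rise 2 ssl verify "
    "required verifyhost central-controller-1.storage.redhat.local"
)
backend = parse_server_line(line)
print(backend["address"], "must present a certificate for", backend["verifyhost"])
```

Each backend is therefore expected to present a certificate valid for its own FQDN, which matters for the failure described below.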

The haproxy configuration for the grafana service seems to be correct, and haproxy checks the service backends regularly.

The problem seems to be that the check fails: every 2 seconds the grafana service logs the following errors:

2022/06/21 12:36:00 http: TLS handshake error from 172.23.1.243:56364: remote error: tls: internal error
2022/06/21 12:36:00 http: TLS handshake error from 172.23.1.55:52296: remote error: tls: internal error
2022/06/21 12:36:01 http: TLS handshake error from 172.23.1.124:52898: remote error: tls: internal error

The reason is that the grafana server containers on all the controller nodes (in my case grafana is deployed on the controllers) have the same SSL certificate and key deployed in /etc/grafana/certs/cert_file|key.

The haproxy check against grafana on controller-0 succeeds but fails against the other grafana backends, because every grafana container has the certificate generated for controller-0 deployed in /etc/grafana/certs/cert_file|key.
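
The mismatch can be illustrated with a simplified name check. This is only a sketch: it assumes the shared certificate carries a single DNS name, the FQDN of controller-0, and the matching function is a reduced approximation of certificate hostname verification, not haproxy's actual implementation. The function names are hypothetical.

```python
def dns_name_matches(pattern, hostname):
    """Simplified certificate name match: exact, or one leftmost wildcard label."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        # "*.example.com" covers exactly one label in front of ".example.com".
        suffix = pattern[1:]
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and "." + tail == suffix
    return pattern == hostname


def backends_failing_verifyhost(cert_dns_names, verifyhosts):
    """Return the verifyhost names that none of the certificate names cover."""
    return [
        host
        for host in verifyhosts
        if not any(dns_name_matches(name, host) for name in cert_dns_names)
    ]


# Assumption: the certificate deployed everywhere names only controller-0.
shared_cert_names = ["central-controller-0.storage.redhat.local"]
verifyhosts = [
    "central-controller-0.storage.redhat.local",
    "central-controller-1.storage.redhat.local",
    "central-controller-2.storage.redhat.local",
]
for host in backends_failing_verifyhost(shared_cert_names, verifyhosts):
    print("check would fail for", host)
```

Under that assumption only controller-0 passes, which is consistent with the handshake errors coming from the haproxy instances checking the other two backends.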

The container's file /etc/grafana/certs/cert_file is bind-mounted from /var/lib/ceph/d5c621ae-ec54-5b9d-910d-b8dba8e6b5ba/grafana.central-controller-*/etc/grafana/certs/cert_key on the hosts, and its content is identical on all hosts, whereas the certificates in /etc/pki/tls/certs/ceph_grafana.crt differ and are correctly generated for each host.

If I copy /etc/pki/tls/certs/ceph_grafana.crt to /var/lib/ceph/d5c621ae-ec54-5b9d-910d-b8dba8e6b5ba/grafana.central-controller-*/etc/grafana/certs/cert_file and restart the grafana containers on all hosts, the haproxy check starts to succeed.
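
The manual workaround above could be scripted per host roughly as follows. This is a sketch under stated assumptions, not a supported procedure: the function name and its parameters are hypothetical, it only copies the certificate (the report does not say whether the key was replaced as well), and the grafana container still has to be restarted separately.

```python
import glob
import os
import shutil


def install_host_cert(host_cert, fsid, ceph_root="/var/lib/ceph",
                      daemon_glob="grafana.*"):
    """Copy the per-host certificate over cert_file of each local grafana daemon.

    Returns the list of files that were overwritten. Restarting the
    grafana container is intentionally left to the operator.
    """
    pattern = os.path.join(
        ceph_root, fsid, daemon_glob, "etc", "grafana", "certs", "cert_file"
    )
    targets = sorted(glob.glob(pattern))
    for target in targets:
        shutil.copyfile(host_cert, target)
    return targets


if __name__ == "__main__":
    # Paths and fsid taken from this report; run on each controller as root.
    updated = install_host_cert(
        "/etc/pki/tls/certs/ceph_grafana.crt",
        "d5c621ae-ec54-5b9d-910d-b8dba8e6b5ba",
    )
    print("updated:", updated)
```

Because the certificate is a cluster-wide config-key, cephadm may overwrite these files again when it reconfigures or redeploys the daemon, so this is at best a temporary fix.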

This seems to be a side effect of the transition from ceph-ansible to cephadm: ceph-ansible used to configure the grafana containers via [1], and the template [2] references the certificate generated for that node; the certificate was also copied through [3], and /etc/grafana is mounted (-v /etc/grafana) when the container starts.
This ensures that the right certificate is always present on the node where grafana is started.
However, cephadm is spec driven, and there is no logic to reference a different certificate per instance, because the certificate is a config-key within the cluster [4] and is global for all the grafana instances.
This is something that should be addressed by cephadm, since it lets you deploy multiple grafana instances on multiple nodes, but I am not sure whether this is currently supported.

[1] https://github.com/ceph/ceph-ansible/tree/main/roles/ceph-grafana
[2] https://github.com/ceph/ceph-ansible/blob/main/roles/ceph-grafana/templates/grafana.ini.j2#L19-L20
[3] https://github.com/ceph/ceph-ansible/blob/main/roles/ceph-grafana/tasks/configure_grafana.yml#L73-L95
[4] https://docs.ceph.com/en/latest/cephadm/services/monitoring/#configuring-ssl-tls-for-grafana


Related issues 3 (0 open, 3 closed)

Related to Orchestrator - Documentation #47637: mgr/cephadm: document how to configure custom TLS certificate for Grafana (Resolved)
Copied to Orchestrator - Backport #57383: quincy: haproxy check fails for ceph-grafana service (Resolved, Adam King)
Copied to Orchestrator - Backport #57384: pacific: haproxy check fails for ceph-grafana service (Resolved, Adam King)