Bug #56401: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly - mgr - Ceph

closed

mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly

Added by Tatjana Dehler almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee: Tatjana Dehler
Category: prometheus module
Target version:
% Done:
Source:
Tags:
Backport: pacific quincy
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 47011
Crash signature (v1):
Crash signature (v2):

Description

Let's consider a cluster with 3 mgr nodes (node1, node2, node3) and the following Alertmanager configuration:

- name: ceph-dashboard
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: https://node1:8443/api/prometheus_receiver
    max_alerts: 0
  - send_resolved: true
    http_config: {}
    url: https://node2:8443/api/prometheus_receiver
    max_alerts: 0
  - send_resolved: true
    http_config: {}
    url: https://node3:8443/api/prometheus_receiver
    max_alerts: 0

All three dashboard endpoints (node1, node2, node3) need to be listed in the configuration, because the Alertmanager can't know which of them is the active instance.
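For illustration, the receiver above can be generated from a list of mgr hosts, so that no endpoint is forgotten when a node is added. This is only a sketch: the helper name `webhook_configs` and its parameters are hypothetical and not part of Ceph or the Alertmanager; the port and path are taken from the configuration in this report.

```python
def webhook_configs(hosts, port=8443, path="/api/prometheus_receiver"):
    """Build one Alertmanager webhook entry per mgr host (hypothetical helper)."""
    return [
        {
            "send_resolved": True,
            "http_config": {},
            "url": f"https://{host}:{port}{path}",
            "max_alerts": 0,
        }
        for host in hosts
    ]


# Every mgr node is listed, because any of them may become the active one.
receiver = {
    "name": "ceph-dashboard",
    "webhook_configs": webhook_configs(["node1", "node2", "node3"]),
}

for entry in receiver["webhook_configs"]:
    print(entry["url"])
```

The resulting dictionary mirrors the structure of the configuration above and could be serialized with any YAML library.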

node1 is the active one, node2 and node3 are both passive.

When an alert fires, the Alertmanager sends it to node1, node2 and node3. As node1 is the active instance, its dashboard receives and displays the alert. node2 and node3 try to redirect the alert to the active instance. ~~Unfortunately they're going to redirect the alert (from https://node2:8443/api/prometheus_receiver and https://node3:8443/api/prometheus_receiver) to https://node1:8443.~~ While investigating the issue, it turned out that the redirection uses the correct URL (including /api/prometheus_receiver).
However, the redirect does not seem to use the hostnames (node1, node2, node3); it uses the IP addresses instead.
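The observed redirect target can be sketched as follows. This is not the dashboard's actual code, only a model of the behaviour described here: the request path is kept and the scheme and host are replaced with those of the active instance. The function `redirect_target` is hypothetical, and `192.0.2.10` is a documentation-range placeholder standing in for node1's real IP address.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(request_url, active_base):
    """Keep the request's path and query, swap in the active instance's scheme and host."""
    req = urlsplit(request_url)
    active = urlsplit(active_base)
    return urlunsplit((active.scheme, active.netloc, req.path, req.query, ""))


request_url = "https://node2:8443/api/prometheus_receiver"

# What was observed: the active instance is addressed by IP (placeholder address).
by_ip = redirect_target(request_url, "https://192.0.2.10:8443/")

# What one might expect: the active instance addressed by hostname.
by_name = redirect_target(request_url, "https://node1:8443/")

print(by_ip)
print(by_name)
```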

~~Possibly we can make use of `follow_redirects` https://prometheus.io/docs/alerting/latest/configuration/#http_config to improve the situation.~~ While investigating the issue, it turned out that `follow_redirects` doesn't help: the Alertmanager then writes "notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 303" to its log file.
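To make the 303 response concrete, the exchange can be reproduced locally with a stub. The sketch below is an assumption-laden simulation, not Ceph or Alertmanager code: `PassiveStub` stands in for a passive mgr instance and always answers a POST with 303 and a `Location` header, `post_alert` is a hypothetical sender that does not follow redirects and counts only 2xx as delivered, and `ACTIVE_URL` uses a placeholder IP address. It runs over plain HTTP on localhost purely for simplicity.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder for the active instance's URL (documentation-range IP, not a real node).
ACTIVE_URL = "https://192.0.2.10:8443/api/prometheus_receiver"


class PassiveStub(BaseHTTPRequestHandler):
    """Stand-in for a passive mgr: answers every POST with a 303 redirect."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the alert payload
        self.send_response(303)
        self.send_header("Location", ACTIVE_URL)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet


def post_alert(host, port, body=b"{}"):
    """POST once without following redirects; return (status, Location header)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            "/api/prometheus_receiver",
            body=body,
            headers={"Content-Type": "application/json"},
        )
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), PassiveStub)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

status, location = post_alert("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()

# A sender that accepts only 2xx treats the passive instance's answer as a failure.
delivered = 200 <= status < 300
print(status, location, delivered)
```

A sender behaving like `post_alert` would report the notification to the passive instance as failed, even though the same alert reached the active instance directly.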

Related issues 2 (0 open — 2 closed)