Bug #56401

mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly

Added by Tatjana Dehler almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Category: prometheus module
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Let's consider a cluster with 3 mgr nodes (node1, node2, node3) and the following Alertmanager configuration:

- name: ceph-dashboard
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: https://node1:8443/api/prometheus_receiver
    max_alerts: 0
  - send_resolved: true
    http_config: {}
    url: https://node2:8443/api/prometheus_receiver
    max_alerts: 0
  - send_resolved: true
    http_config: {}
    url: https://node3:8443/api/prometheus_receiver
    max_alerts: 0

All three dashboard endpoints (node1, node2, node3) need to be listed in the configuration, because the Alertmanager can't know which of them is the active instance.
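
Because every mgr host needs its own webhook entry, the receiver can be generated from a host list instead of being maintained by hand. The following is only a sketch: the helper name `webhook_configs` is hypothetical, and the port and path defaults are simply copied from the configuration above.

```python
def webhook_configs(hosts, port=8443, path="/api/prometheus_receiver"):
    """Build one webhook entry per mgr host, mirroring the fields used above.

    Hypothetical helper for illustration; not part of Ceph or Alertmanager.
    """
    return [
        {
            "send_resolved": True,
            "http_config": {},
            "url": f"https://{host}:{port}{path}",
            "max_alerts": 0,
        }
        for host in hosts
    ]


# One receiver listing all three mgr nodes, since any of them may be active.
receiver = {
    "name": "ceph-dashboard",
    "webhook_configs": webhook_configs(["node1", "node2", "node3"]),
}
```

The resulting dictionary can be serialized with any YAML library and placed in the `receivers` section of the Alertmanager configuration.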

node1 is the active one, node2 and node3 are both passive.

In case of an alert, the Alertmanager sends it to node1, node2 and node3. As node1 is the active one, its dashboard receives and displays the alert. node2 and node3 try to redirect the alert to the active instance. The original assumption was that they redirect it (from https://node2:8443/api/prometheus_receiver and https://node3:8443/api/prometheus_receiver) to https://node1:8443, dropping the path. Investigation showed, however, that the redirection uses the correct URL (including /api/prometheus_receiver).
Also, the redirect doesn't appear to use the hostnames (node1, node2, node3); it uses IP addresses instead.

We considered using `follow_redirects` (https://prometheus.io/docs/alerting/latest/configuration/#http_config) to improve the situation. Investigation showed that `follow_redirects` doesn't help: the Alertmanager then writes "notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 303" to the log file.
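
To look at the 303 response in isolation, the exchange can be imitated locally. This is a minimal sketch and not the dashboard's code: `PassiveStandIn` is a hypothetical stand-in for a passive mgr, the `Location` value is an assumed example (the report notes that the real redirect carries an IP address), and the client deliberately does not follow redirects so that the raw status is visible.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed redirect target for illustration only.
ACTIVE_URL = "https://node1:8443/api/prometheus_receiver"


class PassiveStandIn(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a passive mgr: answers POSTs with 303."""

    def do_POST(self):
        # Drain the request body before answering.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(303)
        self.send_header("Location", ACTIVE_URL)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def post_alert(port, body=b'{"alerts": []}'):
    """POST once without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/api/prometheus_receiver", body=body,
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def demo():
    """Start the stand-in on a free local port and send one alert to it."""
    server = HTTPServer(("127.0.0.1", 0), PassiveStandIn)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return post_alert(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the status code and `Location` header exactly as a webhook client that does not follow the redirect would see them, which is the situation the Alertmanager log message describes.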


Related issues

Copied to mgr - Backport #56593: quincy: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly Resolved
Copied to mgr - Backport #56594: pacific: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly Resolved

History

#1 Updated by Tatjana Dehler almost 2 years ago

  • Description updated (diff)

#2 Updated by Tatjana Dehler over 1 year ago

  • Description updated (diff)

#3 Updated by Tatjana Dehler over 1 year ago

  • Pull request ID set to 47011

#4 Updated by Volker Theile over 1 year ago

  • Status changed from New to Fix Under Review

#5 Updated by Tatjana Dehler over 1 year ago

  • Backport set to pacific quincy

#6 Updated by Tatjana Dehler over 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Backport Bot over 1 year ago

  • Copied to Backport #56593: quincy: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly added

#8 Updated by Backport Bot over 1 year ago

  • Copied to Backport #56594: pacific: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly added

#9 Updated by Nizamudeen A over 1 year ago

  • Status changed from Pending Backport to Resolved
  • Assignee set to Tatjana Dehler
