Bug #56401

closed

mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly

Added by Tatjana Dehler almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Category: prometheus module
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Let's consider a cluster with 3 mgr nodes (node1, node2, node3) and the following Alertmanager configuration:

- name: ceph-dashboard
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: https://node1:8443/api/prometheus_receiver
    max_alerts: 0
  - send_resolved: true
    http_config: {}
    url: https://node2:8443/api/prometheus_receiver
    max_alerts: 0
  - send_resolved: true
    http_config: {}
    url: https://node3:8443/api/prometheus_receiver
    max_alerts: 0

All three dashboard endpoints (node1, node2, node3) need to be listed in the configuration, because the Alertmanager can't know which of them is the active instance.
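
Because every mgr host has to appear as its own webhook, the receiver can be generated from a host list instead of being written by hand. The following is a minimal sketch, not part of Ceph or Alertmanager; the helper name `build_receiver` and its parameters are hypothetical, and the port and path are taken from the configuration above.

```python
def build_receiver(hosts, port=8443, path="/api/prometheus_receiver"):
    """Return an Alertmanager receiver dict with one webhook per mgr host.

    Hypothetical helper for illustration; it only mirrors the fields
    shown in the configuration above.
    """
    return {
        "name": "ceph-dashboard",
        "webhook_configs": [
            {
                "send_resolved": True,
                "http_config": {},
                "url": f"https://{host}:{port}{path}",
                "max_alerts": 0,
            }
            for host in hosts
        ],
    }


# One webhook entry per mgr node, active or passive.
receiver = build_receiver(["node1", "node2", "node3"])
for webhook in receiver["webhook_configs"]:
    print(webhook["url"])
```

Serializing the returned dict to YAML would give an entry equivalent to the one listed above.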

node1 is the active one, node2 and node3 are both passive.

In case of an alert, it will be sent to node1, node2 and node3 by the Alertmanager. As node1 is the active one, the dashboard will receive and display it. node2 and node3 are going to try to redirect the alert to the active instance. At first it looked as if they redirect the alert (from https://node2:8443/api/prometheus_receiver and https://node3:8443/api/prometheus_receiver) to https://node1:8443. While investigating the issue, it turned out that the redirection uses the correct URL (including /api/prometheus_receiver).
Also, the redirect does not seem to use the hostnames (node1, node2, node3). Unfortunately, it uses the IP addresses instead.
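
To illustrate what a correct redirect target looks like, here is a minimal sketch that combines the active instance's scheme and host with the path of the original request. It is not the dashboard's actual redirect code; the function name `redirect_target` is hypothetical, and `192.0.2.11` is a placeholder documentation address standing in for the IP of node1.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(active_base_url, request_url):
    """Build a redirect URL that keeps the path and query of the request.

    Hypothetical helper: scheme and host come from the active instance,
    path and query come from the request sent to the passive instance.
    """
    active = urlsplit(active_base_url)
    request = urlsplit(request_url)
    return urlunsplit(
        (active.scheme, active.netloc, request.path, request.query, "")
    )


# Redirect expressed with the hostname of the active instance.
print(redirect_target("https://node1:8443/",
                      "https://node2:8443/api/prometheus_receiver"))

# Same redirect expressed with an IP address (placeholder value).
print(redirect_target("https://192.0.2.11:8443/",
                      "https://node3:8443/api/prometheus_receiver"))
```

In both cases the path /api/prometheus_receiver is preserved; only the host part differs between the hostname and the IP address variant.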

Possibly we can make use of `follow_redirects` (https://prometheus.io/docs/alerting/latest/configuration/#http_config) to improve the situation. While investigating the issue, it turned out that `follow_redirects` doesn't help. The Alertmanager will then write "notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 303" into the log file.
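
The log message indicates that the 303 response was treated as a final failure after a single attempt. The sketch below is a simplified model of that outcome, not Alertmanager's actual code: it assumes that 2xx responses count as delivered, 5xx responses are retried, and every other status (including 303) ends the notification. The function name `classify_webhook_response` is hypothetical.

```python
def classify_webhook_response(status_code):
    """Classify a webhook response under the assumptions stated above.

    Assumed rules (illustrative only):
      2xx   -> "success"
      5xx   -> "retry"
      other -> "unrecoverable"
    """
    if 200 <= status_code < 300:
        return "success"
    if 500 <= status_code < 600:
        return "retry"
    return "unrecoverable"


# The redirect answer of a passive mgr instance, as seen in the log message.
print(classify_webhook_response(303))
```

Under this model, an alert delivered only through a redirecting passive instance is dropped, which matches the single attempt reported in the log.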


Related issues 2 (0 open, 2 closed)

Copied to mgr - Backport #56593: quincy: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly (Resolved, Tatjana Dehler)
Copied to mgr - Backport #56594: pacific: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly (Resolved, Tatjana Dehler)
Actions #1

Updated by Tatjana Dehler almost 2 years ago

  • Description updated (diff)
Actions #2

Updated by Tatjana Dehler almost 2 years ago

  • Description updated (diff)
Actions #3

Updated by Tatjana Dehler almost 2 years ago

  • Pull request ID set to 47011
Actions #4

Updated by Volker Theile almost 2 years ago

  • Status changed from New to Fix Under Review
Actions #5

Updated by Tatjana Dehler almost 2 years ago

  • Backport set to pacific quincy
Actions #6

Updated by Tatjana Dehler almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56593: quincy: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly added
Actions #8

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56594: pacific: mgr/dashboard: alert redirect from passive to active mgr instance doesn't work properly added
Actions #9

Updated by Nizamudeen A almost 2 years ago

  • Status changed from Pending Backport to Resolved
  • Assignee set to Tatjana Dehler