Bug #55837

open

mgr/dashboard: After several days of not being used, Dashboard HTTPS website hangs during loading, with no errors

Added by Zach Heise almost 2 years ago. Updated over 1 year ago.

Status: Need More Info
Priority: Normal
Category: General - Back-end
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific quincy
Regression: No
Severity: 4 - irritation
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of problem

I have noticed that every 2-3 days, in the morning when I start work, our Ceph dashboard page does not respond in the browser. It works fine throughout the day, but it seems to stop responding after some unknown number of hours without anyone accessing it. Something must be going wrong with the dashboard module or the mgr daemon, because when I try to load the Ceph dashboard site (or refresh it when it's already loaded), the browser just shows the "throbber", the spinning/loading indicator on the page's favicon. No content ever appears on the page, and there are no errors. None of the buttons on the page load their content, nor do they time out and show a 404. For example, Block\Images or Cluster\Hosts in the left sidebar will open but show empty pages. And the throbber never stops.

I can easily fix it by running ceph mgr module disable dashboard, waiting 10 seconds, and then running ceph mgr module enable dashboard. This makes it start working again until the next time I go a few days without using the dashboard, at which point I need to repeat the same process. So this is an 'irritation' level bug.
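The workaround above could be wrapped in a small helper so it can be run in one step. This is only a sketch: the function name `restart_dashboard` and its `run`/`sleep` parameters are hypothetical (added here so the sequence can be checked without a cluster), and the only commands it issues are the two `ceph mgr module` commands quoted in this report.

```python
import subprocess
import time


def restart_dashboard(run=subprocess.run, sleep=time.sleep, wait_seconds=10):
    """Disable the dashboard mgr module, wait, then enable it again.

    `run` and `sleep` are injectable (hypothetical convenience) so the
    command sequence can be exercised without a real Ceph cluster.
    """
    # Same command as in the report: turn the dashboard module off.
    run(["ceph", "mgr", "module", "disable", "dashboard"], check=True)
    # The report waits about 10 seconds before re-enabling.
    sleep(wait_seconds)
    # Same command as in the report: turn the dashboard module back on.
    run(["ceph", "mgr", "module", "enable", "dashboard"], check=True)


if __name__ == "__main__":
    restart_dashboard()
```

Run on a host where the `ceph` CLI is available and authorized, this should reproduce the manual fix; it does not address the underlying hang.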

Environment

  • ceph version string: 16.2.4 through 16.2.7
  • Platform (OS/distro/release): Linux/Alma/8.5; kernel is 5.4.176-1.el8.elrepo.x86_64.
  • Cluster details (nodes, monitors, OSDs): 5 nodes, 4 monitors, 33 OSDs
  • Did it happen on a stable environment or after a migration/upgrade?: this has been an issue for the entirety of this cluster's existence
  • Browser used (e.g.: Version 86.0.4240.198 (Official Build) (64-bit)): Tested in Firefox and MS Edge on Windows, and in the Brave browser. The lack of loading can also be confirmed with curl from Linux:
    At the request of Red Hat employee Ernesto Puerta, I also ran curl against the HTTPS port, both on the actual IP of the mgr daemon and on its local address, 127.0.0.1:8443. Both curl attempts just result in the CLI cursor blinking forever, with no output. When running curl against an unused port (127.0.0.1:8444), "connection refused" occurs immediately, as expected.
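The curl observations above separate two cases: a port that refuses the connection immediately, and a port that accepts the TCP connection but never answers. A minimal sketch of a probe that makes the same distinction with a bounded wait is shown below. The function name `probe_https` and the returned labels are hypothetical; certificate verification is disabled only on the assumption that the dashboard uses a self-signed certificate, as mentioned in this report.

```python
import socket
import ssl


def probe_https(host, port, timeout=5.0):
    """Return 'ok', 'closed', 'refused', 'hang', or 'error' for an HTTPS port.

    'hang' means the TCP connection was accepted, but the TLS handshake or
    the first byte of the HTTP response did not arrive within `timeout`.
    """
    # Assumption: self-signed certificate, so verification is turned off.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"  # nothing is listening, like the 8444 test
    except (socket.timeout, TimeoutError):
        return "hang"

    try:
        with raw:
            raw.settimeout(timeout)
            with context.wrap_socket(raw, server_hostname=host) as tls:
                request = (
                    b"GET / HTTP/1.1\r\nHost: " + host.encode()
                    + b"\r\nConnection: close\r\n\r\n"
                )
                tls.sendall(request)
                # Any response byte at all counts as a live server.
                return "ok" if tls.recv(1) else "closed"
    except (socket.timeout, TimeoutError):
        return "hang"  # accepted, but silent: the symptom in this report
    except (ssl.SSLError, OSError):
        return "error"


if __name__ == "__main__":
    print(probe_https("127.0.0.1", 8443))
```

Unlike a bare curl call, this returns after `timeout` seconds instead of blocking forever, which would make it usable from a periodic check.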

How reproducible

Steps:

  1. Activate HTTPS mode for dashboard
  2. Don't have any user activity (HTTP GET) for 48+ hours
  3. Attempt to load the dashboard page

Actual results

When the dashboard is in this hanging state, I check the cephadm logs with cephadm logs --name mgr.ceph01.fblojp -- -f, but there is nothing obvious (to my untrained eyes, at least). When the dashboard is functional, I can see my own navigation around the dashboard in the logs, so I know that logging is working:

Nov 01 15:46:32 ceph01.domain conmon5814: debug 2021-11-01T20:46:32.601+0000 7f7cbb42e700 0 [dashboard INFO request] [10.130.50.252:52267] [GET] [200] [0.013s] [admin] [1.0K] /api/summary
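To check when the dashboard last served a request before a hang, request lines like the one above could be pulled out of the mgr log with a small parser. This is a sketch only: the field names (`client`, `method`, `status`, `seconds`, `user`, `size`, `path`) are my reading of the bracketed fields in that single sample line, not a documented log format, and `parse_request` is a hypothetical helper.

```python
import re

# Pattern inferred from the one sample line in this report (an assumption).
REQUEST_RE = re.compile(
    r"\[dashboard INFO request\] "
    r"\[(?P<client>[^\]]+)\] "
    r"\[(?P<method>[A-Z]+)\] "
    r"\[(?P<status>\d{3})\] "
    r"\[(?P<seconds>[\d.]+)s\] "
    r"\[(?P<user>[^\]]*)\] "
    r"\[(?P<size>[^\]]*)\] "
    r"(?P<path>\S+)"
)


def parse_request(line):
    """Return a dict of request fields, or None if the line does not match."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["seconds"] = float(record["seconds"])
    return record


if __name__ == "__main__":
    sample = (
        "Nov 01 15:46:32 ceph01.domain conmon5814: debug "
        "2021-11-01T20:46:32.601+0000 7f7cbb42e700 0 "
        "[dashboard INFO request] [10.130.50.252:52267] [GET] [200] "
        "[0.013s] [admin] [1.0K] /api/summary"
    )
    print(parse_request(sample))
```

Feeding the output of the cephadm logs command through this parser and keeping the last matching record would show the time of the final successful request before the dashboard stopped responding.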

Expected results

Functioning dashboard

Additional info

I confirmed that with the dashboard switched to HTTP-only mode on port 8080 for the past several months, this issue does NOT occur, so it appears to be related to HTTPS.
All nodes have 2x NICs that are currently bonded, but this issue with the dashboard started before we performed the bonding.
Final note: as part of my testing, I replaced the built-in self-signed SSL certs with ones generated from my Windows PKI infrastructure. It made no difference whatsoever.
