Bug #55837: mgr/dashboard: After several days of not being used, Dashboard HTTPS website hangs during loading, with no errors - Dashboard - Ceph

open

Added by Zach Heise almost 2 years ago. Updated over 1 year ago.

Status: Need More Info
Priority: Normal
Assignee: Ernesto Puerta
Category: General - Back-end
Target version: Ceph - v18.0.0
Source: Community (user)
Backport: pacific, quincy
Severity: 4 - irritation
Affected Versions: Ceph - v16.2.4, Ceph - v16.2.5, Ceph - v16.2.6, Ceph - v16.2.7, Ceph - v16.2.8, Ceph - v17.0.0, Ceph - v18.0.0

Description

Description of problem

Every 2-3 days, when I start work in the morning, our Ceph dashboard page does not respond in the browser. It works fine throughout the day, but it seems to stop responding after some unknown number of hours without anyone accessing it. Something must be going wrong with the dashboard module or the mgr daemon, because when I try to load the Ceph dashboard site (or refresh it when it is already loaded), the browser just shows the "throbber", the spinning/loading indicator in place of the page's favicon. No content ever appears on the page, and there are no errors. None of the buttons on the page load, nor do they time out and show a 404. For example, Block\Images or Cluster\Hosts in the left sidebar will load but show empty. The throbber never stops.

I can easily fix it by running ceph mgr module disable dashboard, waiting 10 seconds, and then running ceph mgr module enable dashboard. This makes it start working again until the next time I go a few days without using the dashboard, at which point I need to repeat the same process. So this is an 'irritation' level bug.
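The workaround above can be scripted. The following is a minimal sketch, not a fix: it only wraps the two ceph commands quoted in this report, with the 10-second pause in between. The function name restart_dashboard and its run/sleep parameters are hypothetical and exist only so the sequence can be exercised without a cluster.

```python
import subprocess
import time


def restart_dashboard(run=subprocess.run, sleep=time.sleep, wait_seconds=10):
    """Disable the dashboard mgr module, wait, then enable it again.

    `run` and `sleep` are injectable (hypothetical parameters) so the
    command sequence can be checked without a real Ceph cluster.
    """
    # Same commands as in the report; check=True raises if ceph fails.
    run(["ceph", "mgr", "module", "disable", "dashboard"], check=True)
    # The report waits roughly 10 seconds before re-enabling.
    sleep(wait_seconds)
    run(["ceph", "mgr", "module", "enable", "dashboard"], check=True)


# Usage on a node with the ceph CLI and admin keyring available:
#   restart_dashboard()
```

This only automates the manual recovery; it does nothing about whatever causes the hang.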

Environment

ceph version string: 16.2.4 through 16.2.7
Platform (OS/distro/release): Linux/Alma/8.5 Kernel is 5.4.176-1.el8.elrepo.x86_64.
Cluster details (nodes, monitors, OSDs): 5 nodes, 4 monitors, 33 OSDs
Did it happen on a stable environment or after a migration/upgrade?: this has been an issue for the entirety of this cluster's existence
Browser used: tested in Firefox (Windows), MS Edge (Windows), and Brave. The failure to load can also be confirmed with curl from Linux:
At the request of Red Hat employee Ernesto Puerta, I also ran curl against the HTTPS port, both at the actual IP of the mgr daemon and at its local address, 127.0.0.1:8443. Both curl attempts just leave the CLI cursor blinking forever, with no output. Running curl against an unused port, 127.0.0.1:8444, immediately returns "connection refused" (as expected).
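The curl observations distinguish two cases: a port where nothing listens (refused immediately) and a port that accepts the connection but never answers (hangs). A rough Python equivalent with an explicit timeout is sketched below. It is not what was actually run in this report; the function name probe_https and its result labels are hypothetical, and certificate verification is disabled on the assumption that the dashboard may be using a self-signed certificate.

```python
import socket
import ssl


def probe_https(host, port, timeout=5.0):
    """Return "ok", "refused", "hang", "closed", or "error" for an HTTPS port."""
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"  # nothing is listening, e.g. the unused port 8444
    except (socket.timeout, TimeoutError):
        return "hang"     # even the TCP connect did not complete in time

    # Assumption: self-signed certificates are in use, so skip verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            request = (
                "GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n"
            )
            tls.sendall(request.encode("ascii"))
            # Any response byte at all means the server is answering.
            return "ok" if tls.recv(1) else "closed"
    except (socket.timeout, TimeoutError):
        return "hang"     # TCP accepted, but TLS handshake or reply stalled
    except OSError:
        return "error"    # includes ssl.SSLError
    finally:
        raw.close()


# Usage, with the addresses from this report:
#   probe_https("127.0.0.1", 8443)  # the symptom described would give "hang"
#   probe_https("127.0.0.1", 8444)  # unused port, expected "refused"
```

A result of "hang" on 8443 together with "refused" on 8444 would match the curl behaviour described above: the listener still accepts connections but the server never replies.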

How reproducible

Steps:

1. Activate HTTPS mode for the dashboard.
2. Don't have any user activity (HTTP GET) for 48+ hours.
3. Attempt to load the dashboard page.

Actual results

When the dashboard is in this hanging state, I check the cephadm logs with cephadm logs --name mgr.ceph01.fblojp -- -f, but there's nothing obvious (to my untrained eyes at least). When the dashboard is functional, I can see my own navigation around the dashboard in the logs, so I know that logging is working:

Nov 01 15:46:32 ceph01.domain conmon[5814]: debug 2021-11-01T20:46:32.601+0000 7f7cbb42e700 0 [dashboard INFO request] [10.130.50.252:52267] [GET] [200] [0.013s] [admin] [1.0K] /api/summary
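Because the hang seems tied to idle time, it may help to measure how long it has been since the last logged dashboard request. The sketch below parses request lines shaped like the sample above. The pattern is inferred from that single line only and may not match other Ceph versions or log settings; the names REQUEST_RE, parse_request, and hours_since_last_request are hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the one sample line in this report (an assumption).
REQUEST_RE = re.compile(
    r"debug (?P<ts>\S+) \S+\s+\d+ \[dashboard INFO request\] "
    r"\[(?P<client>[^\]]+)\] \[(?P<method>[A-Z]+)\] \[(?P<status>\d{3})\] "
    r"\[(?P<seconds>[\d.]+)s\] \[(?P<user>[^\]]*)\] \[(?P<size>[^\]]*)\] "
    r"(?P<path>\S+)"
)


def parse_request(line):
    """Return a dict of request fields, or None if the line does not match."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 2021-11-01T20:46:32.601+0000 in the sample.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%dT%H:%M:%S.%f%z")
    fields["status"] = int(fields["status"])
    fields["seconds"] = float(fields["seconds"])
    return fields


def hours_since_last_request(lines, now=None):
    """Hours between the newest logged request and `now` (default: current UTC)."""
    now = now or datetime.now(timezone.utc)
    stamps = [r["ts"] for r in map(parse_request, lines) if r is not None]
    if not stamps:
        return None
    return (now - max(stamps)).total_seconds() / 3600.0
```

Feeding it the output of the cephadm logs command quoted above would give the idle time at the moment the hang is first noticed, which could help narrow down the "48+ hours" estimate.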

Expected results

Functioning dashboard

Additional info

I confirmed that after switching my dashboard to HTTP-only mode on port 8080 for the past several months, this issue does NOT occur, so it must be something related to HTTPS.
All nodes have 2x NICs that are currently bonded, but this issue with the dashboard started before we performed the bonding.
Final note: as part of my testing, I replaced the built-in self-signed SSL certs with ones generated from my Windows PKI. It made no difference whatsoever.
