Bug #48973

closed

mgr/dashboard: dashboard hangs when accessing it

Added by Ernesto Puerta over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Immediate
Category: General
% Done: 0%
Source: Development
Regression: No
Severity: 1 - critical

Description

Description of problem

On the first attempt to access the dashboard (typing https://<dashboard_url> and pressing ENTER), the browser (tested in Chrome Version 87.0.4280.88 (Official Build) (64-bit) and Firefox 84.0.1 (64-bit)) doesn't show any page and simply keeps waiting.

Environment

  • ceph version string:
    • Initially identified in master on Jan 24th (97480142a69e7ff5bd2abaceb42cffe4b749d00c)
    • Reproduced in Pacific (16.1.0)
    • Also reproduced one month earlier, at Dec 24th (6793756f45f669240de952edd92946541385d090). This rules out the latest changes to the ceph-mgr C++ code related to GIL/locks.
    • Also reproduced at Dec 16th (63a5cd41c8b4e1ff5ee01854b4aa1425fe2da1bf). This rules out the CVE-related changes, including JWT and account lockout.
  • Platform (OS/distro/release):
    • CentOS 8.3 / Fedora 32
    • python3-cherrypy-18.4.0-1.el8.noarch
    • NOT REPRODUCED in openSUSE Tumbleweed (CherryPy 18.6.0-2.1, Cheroot 8.3.0)
  • Cluster details (nodes, monitors, OSDs): minimal vstart cluster with 1 mon + 1 mgr + 3 OSDs. It also happens in cephadm deployments.
  • Browser used:
    • Chrome Version 87.0.4280.88 (Official Build) (64-bit)
    • Firefox 84.0.1 (64-bit)
  • Other:
    • Initially NOT REPRODUCED with plain HTTP (HTTPS seemed to be required), but it turned out to happen with HTTP too. The likelihood of the issue appearing seems to grow with the time needed to establish the connection (HTTP < static assets over HTTPS < HTTPS + Auth).

How reproducible

From a freshly launched dashboard (or an immediately restarted mgr), wait until the initialization finishes (curl -kv https://<dashboard_url> returns the index.html). Then switch to a browser (either Chrome or Firefox) and type the dashboard URL in the navigation bar and press ENTER. That's enough to trigger the issue.

Occasional requests via `curl` don't trigger the issue; it happens when multiple requests are issued concurrently. It can be reproduced from the CLI with ApacheBench (`ab`):

> ab -c20 -n1000 "https://<dashboard_url>/docs" 

Benchmarking <dashboard_url> (be patient)
Completed 1000 requests
Completed 2000 requests
SSL handshake failed (5).
Completed 3000 requests
SSL handshake failed (5).
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
...

Complete requests:      10000
Failed requests:        2
   (Connect: 0, Receive: 0, Length: 2, Exceptions: 0)
Total transferred:      13387322 bytes
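For scripted runs, a similar load pattern can be approximated in Python. This is a minimal sketch, not part of Ceph: `hammer` and `fetch` are hypothetical helper names, the defaults simply mirror `-c20 -n1000`, and certificate verification is disabled on the assumption that the dashboard serves a self-signed certificate (as the curl log below shows). Requests that never get a response end up in the `timeout` bucket instead of stalling the run.

```python
import socket
import ssl
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str, timeout: float, ctx: ssl.SSLContext) -> str:
    """Issue one GET and classify it as ok / http_<code> / timeout / error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            resp.read()  # drain the body
            return "ok"
    except urllib.error.HTTPError as exc:  # non-2xx status code
        return f"http_{exc.code}"
    except urllib.error.URLError as exc:
        # Timeouts while connecting or sending the request arrive wrapped in URLError.
        return "timeout" if isinstance(exc.reason, socket.timeout) else "error"
    except socket.timeout:  # timed out waiting for or reading the response
        return "timeout"
    except OSError:  # connection resets, SSL errors, etc.
        return "error"


def hammer(url: str, concurrency: int = 20, total: int = 1000,
           timeout: float = 10.0) -> Counter:
    """Send `total` GETs to `url` using `concurrency` worker threads."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # hypothetical: self-signed dashboard cert
    ctx.verify_mode = ssl.CERT_NONE  # equivalent of curl -k
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(lambda _: fetch(url, timeout, ctx), range(total)))
```

For example, `print(hammer("https://<dashboard_url>/docs"))` prints a `Counter` of outcomes. The per-request timeout is a deliberate choice: a hung server then shows up as a count of timeouts rather than a benchmark that never finishes.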

Actual results

The dashboard login page is not displayed, and the browser keeps loading until manually stopped (after several minutes). After that, curl requests no longer work either: the TLS handshake completes and the request is sent, but no HTTP response arrives:

curl -kv https://localhost:11000
* Rebuilt URL to: https://localhost:11000/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 11000 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: O=IT; CN=ceph-dashboard
*  start date: Jan 20 16:42:29 2021 GMT
*  expire date: Jan 18 16:42:29 2031 GMT
*  issuer: O=IT; CN=ceph-dashboard
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: localhost:11000
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
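In the log above, the TCP connection and the TLS 1.3 handshake complete and the GET is sent; the only traffic afterwards is two NewSessionTicket messages, and no HTTP response ever arrives. The following is a minimal Python sketch for telling these stages apart automatically. It assumes a hypothetical `probe` helper, a bare `GET /`, and disabled certificate verification for the self-signed certificate.

```python
import socket
import ssl


def probe(host: str, port: int, use_tls: bool = True, timeout: float = 5.0) -> str:
    """Return the first stage that stalls or fails: 'connect', 'handshake',
    'response', or 'ok' once the first byte of an HTTP response arrives."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect"
    try:
        if use_tls:
            ctx = ssl.create_default_context()
            ctx.check_hostname = False       # self-signed certificate (assumption)
            ctx.verify_mode = ssl.CERT_NONE  # same as curl -k
            try:
                sock = ctx.wrap_socket(sock, server_hostname=host)
            except OSError:  # ssl.SSLError and socket.timeout are both OSErrors
                return "handshake"
        request = f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        try:
            # TLS 1.3 session tickets are handled inside the ssl module, so
            # recv() only returns once HTTP bytes (or EOF) arrive.
            first = sock.recv(1)
        except OSError:  # includes socket.timeout
            return "response"
        return "ok" if first else "response"  # b"" means closed without a reply
    finally:
        sock.close()
```

For example, `probe("localhost", 11000)` would be expected to return `"response"` while the dashboard is in the hung state described here, and `"ok"` once it serves requests again.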

Expected results

Dashboard login page loads normally.

Additional info

Current efforts are focused on finding where this issue originates:
  • Dashboard Python code
  • Ceph-mgr Python
  • Ceph-mgr C++
  • CherryPy: reproduced with both builtin and PyOpenSSL transport wrappers.