Bug #48973: mgr/dashboard: dashboard hangs when accessing it - Dashboard - Ceph

closed

Added by Ernesto Puerta about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Immediate
Assignee: Ernesto Puerta
Category: General
Target version: Ceph - v17.0.0
Source: Development
Severity: 1 - critical
Affected Versions: Ceph - v16.1.0, Ceph - v17.0.0

Description

Description of problem

On the first attempt to access the dashboard (typing https://<dashboard_url> and pressing ENTER), the browser shows no page and simply keeps waiting. Tested in Chrome Version 87.0.4280.88 (Official Build) (64-bit) and Firefox 84.0.1 (64-bit).

Environment

ceph version string:
- Initially identified in master on Jan 24th (97480142a69e7ff5bd2abaceb42cffe4b749d00c)
- Reproduced in Pacific (16.1.0)
- Also reproduced with a build from 1 month earlier, Dec 24th (6793756f45f669240de952edd92946541385d090). This rules out the latest changes to the ceph-mgr C++ code related to GIL/locks.
- Also reproduced with a build from Dec 16 (63a5cd41c8b4e1ff5ee01854b4aa1425fe2da1bf). This rules out the CVE changes, including JWT and account lock-out.
Platform (OS/distro/release):
- CentOS 8.3 / Fedora 32
- python3-cherrypy-18.4.0-1.el8.noarch
- NOT REPRODUCED in openSUSE Tumbleweed (CherryPy 18.6.0-2.1, Cheroot 8.3.0)
Cluster details (nodes, monitors, OSDs): minimal vstart cluster with 1 mon + 1 mgr + 3 OSDs. It also happens in Cephadm deployments.
Browser used:
- Chrome Version 87.0.4280.88 (Official Build) (64-bit)
- Firefox 84.0.1 (64-bit)
Other:
- ~~NOT REPRODUCED with plain HTTP (HTTPS is required)~~ It happens with plain HTTP too, so the likelihood of hitting the issue seems to grow with the time needed to establish the connection (HTTP < static assets over HTTPS < HTTPS + Auth).

How reproducible

From a freshly launched dashboard (or an immediately restarted mgr), wait until the initialization finishes (`curl -kv https://<dashboard_url>` returns the index.html). Then switch to a browser (either Chrome or Firefox), type the dashboard URL in the address bar and press ENTER. That's enough to trigger the issue.

Sporadic requests via `curl` don't trigger the issue. It happens when multiple requests are issued at the same time. It can be reproduced from the CLI with ApacheBench (`ab`):

> ab -c20 -n1000 "https://<dashboard_url>/docs" 

Benchmarking <dashboard_url> (be patient)
Completed 1000 requests
Completed 2000 requests
SSL handshake failed (5).
Completed 3000 requests
SSL handshake failed (5).
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
...

Complete requests:      10000
Failed requests:        2
   (Connect: 0, Receive: 0, Length: 2, Exceptions: 0)
Total transferred:      13387322 bytes
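
For environments where `ab` is not available, the same kind of concurrent load can be approximated from Python. This is only a rough sketch using the standard library, not the tooling used in this ticket: the names `fetch_status` and `probe` are made up for illustration, the defaults mirror the `-c20 -n1000` flags above, and certificate checks are skipped the way `curl -k` does because the dashboard certificate is self-signed.

```python
import ssl
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=10.0):
    """Return the HTTP status for url, skipping certificate checks (like curl -k)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.status


def probe(url, concurrency=20, total=1000, fetch=fetch_status):
    """Issue `total` requests using `concurrency` worker threads and count outcomes.

    `fetch` is injectable (hypothetical hook) so the counting logic can be
    exercised without a running dashboard.
    """
    def attempt(_):
        try:
            return "ok" if fetch(url) == 200 else "failed"
        except Exception:  # timeouts, TLS handshake failures, HTTP errors, resets
            return "failed"

    counts = {"ok": 0, "failed": 0}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for outcome in pool.map(attempt, range(total)):
            counts[outcome] += 1
    return counts
```

Calling `probe("https://<dashboard_url>/docs")` with the real dashboard URL substituted returns the number of successful and failed requests. With an affected server the requests may simply time out, so expect the run to take a while.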

Actual results

The dashboard login page is not displayed and the browser keeps loading/waiting until manually stopped (minutes). After that, `curl` requests no longer get a response either:

curl -kv https://localhost:11000
* Rebuilt URL to: https://localhost:11000/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 11000 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: O=IT; CN=ceph-dashboard
*  start date: Jan 20 16:42:29 2021 GMT
*  expire date: Jan 18 16:42:29 2031 GMT
*  issuer: O=IT; CN=ceph-dashboard
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: localhost:11000
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):

Expected results

The dashboard login page loads normally.

Additional info

Current efforts are focused on finding which component this issue comes from:

- Dashboard Python code
- Ceph-mgr Python
- Ceph-mgr C++
- CherryPy: reproduced with both builtin and PyOpenSSL transport wrappers.

Updated by Ken Dreyer about 3 years ago

Ernesto, I updated CherryPy to 18.6.0 today in Rawhide and I published el8 RPMs at https://fedorapeople.org/~ktdreyer/bz1777494/ , if it helps for testing.

Updated by Ernesto Puerta about 3 years ago

Status changed from Need More Info to In Progress

Thanks a lot, Ken!

In fact, it seems that the issue doesn't come from CherryPy but from Cheroot, the CherryPy web server engine. We've seen that EPEL 8 provides Cheroot v8.5.1 (a non-stable version from this Dec 16th, while the last one labeled as stable is v8.4.5 from August).

I manually applied this patch to 8.5.1 and the issue vanished... So, how can we ask the EPEL maintainers to update this package to 8.5.2 or keep it at the latest stable version (8.4.5, which I already tested and which doesn't exhibit this issue)?

Updated by Ernesto Puerta about 3 years ago

BZ opened to EPEL project: https://bugzilla.redhat.com/show_bug.cgi?id=1920461

Updated by Ken Dreyer about 3 years ago

Yeah, we brought Cheroot v8.5.1 to EPEL 8 for #47875.

I built Cheroot v8.5.2 at https://fedorapeople.org/~ktdreyer/bz1920461/ , want to test it? It seems to fix this issue for me.

Updated by Ken Dreyer about 3 years ago

Justin pushed v8.5.2 to Bodhi at https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-848a87b9dc, so this will go to the epel-testing Yum repository in the next day or so when the Fedora admins push that to testing.

Updated by Alfonso Martínez about 3 years ago

Ken Dreyer wrote:

> Yeah, we brought Cheroot v8.5.1 to EPEL 8 for #47875.
>
> I built Cheroot v8.5.2 at https://fedorapeople.org/~ktdreyer/bz1920461/ , want to test it? It seems to fix this issue for me.

Hi Ken,

I tested v8.5.2 (by adding that repo to the centos8 container from https://github.com/rhcs-dashboard/ceph-dev/ and upgrading the package): it fixes the problem.

Updated by Ken Dreyer about 3 years ago

Thank you for adding karma in Bodhi. This should go out to EPEL's stable repo this week.

Updated by Ernesto Puerta about 3 years ago

Status changed from In Progress to Resolved
Backport deleted (~~pacific~~)

No need to backport as this came from an external dependency and was fixed there (EPEL 8).
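
Since the fix lives in the packaged dependency, it can help to check which Cheroot version a given host actually has. The sketch below is an illustration, not part of the fix: the helper names are made up, and the two version sets only encode what was observed in this ticket (8.5.1 reproduced the hang; 8.3.0, 8.4.5 and 8.5.2 did not). Any other version is reported as unknown rather than guessed. It should be run with the same Python interpreter that ceph-mgr uses.

```python
from importlib import metadata

# Observations taken from this ticket only; other versions were not tested here.
REPRODUCED = {"8.5.1"}
NOT_REPRODUCED = {"8.3.0", "8.4.5", "8.5.2"}


def classify_cheroot(version):
    """Map a Cheroot version string to what this ticket observed for it."""
    if version in REPRODUCED:
        return "reproduced"
    if version in NOT_REPRODUCED:
        return "not reproduced"
    return "unknown"


def installed_cheroot_version():
    """Return the installed Cheroot version string, or None if it is not installed."""
    try:
        return metadata.version("cheroot")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_cheroot_version()
    if version is None:
        print("cheroot is not installed for this interpreter")
    else:
        print(f"cheroot {version}: {classify_cheroot(version)}")
```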

Updated by Ernesto Puerta about 3 years ago

Project changed from mgr to Dashboard
Category changed from 132 to General
