Feature #24662

mgr/dashboard: SSL-enabled dashboard does not play nicely with a frontend HAproxy

Added by Florian Haas about 1 year ago. Updated 3 days ago.

Status: Need Review
Priority: Normal
Assignee:
Category: dashboard/general
Target version:
Start date: 06/26/2018
Due date:
% Done: 0%


Description

http://docs.ceph.com/docs/master/mgr/dashboard/#reverse-proxies talks about running the ceph-mgr dashboard behind a reverse proxy, so I am assuming that that is a deployment scenario that is at least meant to be supported. Unfortunately, I'm not quite sure how that would work, the way the dashboard is currently wired (in Mimic).

Consider the following scenario:

  • I have three mgr instances, `daisy`, `eric`, and `frank`.
  • The dashboard module is enabled and is configured to listen on port 8443.

I now have the following HAproxy configuration, which exposes the dashboard on the frontend's port 443.

frontend dashboard_front
  bind 0.0.0.0:80
  redirect scheme https code 301 if !{ ssl_fc }

frontend dashboard_front_ssl
  mode tcp
  bind 0.0.0.0:443
  default_backend dashboard_back_ssl

backend dashboard_back_ssl
  mode tcp
  balance source
  stick-table type ip size 200k expire 30m
  stick on src
  server daisy 192.168.122.114:8443 check
  server eric 192.168.122.115:8443 check
  server frank 192.168.122.116:8443 check

From the HAproxy side of things this is working perfectly fine. However, consider what happens if I issue a `curl` request against the HTTP frontend:

curl -k -IL http://[frontend HAproxy IP] 
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://[frontend HAproxy IP]/
Connection: close

HTTP/1.1 303 See Other
Date: Tue, 26 Jun 2018 12:40:19 GMT
Content-Length: 108
Content-Type: text/html;charset=utf-8
Location: https://daisy.example.com:8443/
Server: CherryPy/3.5.0

curl: (6) Could not resolve host: daisy.example.com

Since the redirect from CherryPy includes not only the name, but also the port of the backend server, it would seem to me that this means that a remote client can't possibly connect unless the internal hostname is resolvable via the DNS, and the frontend proxy is configured to listen on the same port as the backend host.
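To make the failure concrete, here is a minimal Python sketch that checks whether a redirect's Location points back at the proxy frontend. The function name and the proxy hostname `dashboard.example.org` are hypothetical placeholders; the Location value is the one from the curl output above.

```python
from urllib.parse import urlsplit


def redirect_points_at_proxy(location: str, proxy_host: str, proxy_port: int) -> bool:
    """Return True if a redirect Location targets the proxy frontend."""
    parts = urlsplit(location)
    # Fall back to the scheme's default port when the URL carries none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname == proxy_host and port == proxy_port


# Location header taken from the CherryPy 303 response shown above.
location = "https://daisy.example.com:8443/"

# "dashboard.example.org" is an assumed public name for the HAProxy frontend.
print(redirect_points_at_proxy(location, "dashboard.example.org", 443))  # prints: False
```

A client following that redirect leaves the proxy entirely, which is why both the backend hostname and the backend port would have to be reachable from the outside.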

The url_prefix configuration option is not of much help here, because it is merely appended to the backend host's IP address (or hostname) and port. Would it perhaps make sense to introduce a url_alias option, allowing users to override what CherryPy sets for the redirect's Location attribute?

Or is there a better way to do this?

History

#1 Updated by John Spray about 1 year ago

  • Project changed from Ceph to mgr
  • Subject changed from ceph-dashboard does not play nicely with a frontend HAproxy to dashboard does not play nicely with a frontend HAproxy
  • Category set to dashboard/general

#2 Updated by Ricardo Dias about 1 year ago

According to the CherryPy docs at https://docs.cherrypy.org/en/latest/deploy.html#reverse-proxying, we will need to provide a way to configure the ceph-dashboard in proxy mode and to specify the frontend base URL, so that CherryPy can do its work.

#3 Updated by Lenz Grimmer 4 months ago

  • Tracker changed from Bug to Feature

#4 Updated by Florian Haas 4 months ago

  • Affected Versions v13.2.2, v13.2.3, v13.2.4, v13.2.5, v13.2.6, v14.2.0, v14.2.1 added

For anyone looking into this, it appears that the same issue is still present in Nautilus. I thought that `ceph config set mgr mgr/dashboard/server_addr <proxy-ip>` might do the trick, but apparently it does not.

#5 Updated by Florian Haas 4 months ago

I'd like to add a bit of additional information to this.

One way of working around this issue is described here: https://blog.widodh.nl/2019/01/haproxy-in-front-of-ceph-manager-dashboard/

However as far as I can tell, this requires that

  • HAProxy operates with mode http (which, in turn, requires SSL to be disabled on the backend),
  • HAProxy is itself capable of resolving the Location received in the initial 303 response.

I can't think of a way this could be made to work if SSL terminates on the backend nodes, and HAProxy is configured with mode tcp.

#6 Updated by Volker Theile 4 months ago

Can you please rephrase the requirements that the Dashboard needs to meet in order to work correctly with HAProxy? Please don't assume that every developer knows about this piece of software and how it works.

#7 Updated by Lenz Grimmer 4 months ago

  • Subject changed from dashboard does not play nicely with a frontend HAproxy to mgr/dashboard: dashboard does not play nicely with a frontend HAproxy
  • Target version set to v15.0.0
  • Tags set to security, ssl
  • Backport set to nautilus

Florian Haas wrote:

I can't think of a way this could be made to work if SSL terminates on the backend nodes, and HAProxy is configured with mode tcp.

Not being able to use SSL down to the Mgr in HA/Proxy environments can be considered a security concern that ought to be addressed.

So as far as I understand it, what's needed is a setting that allows overriding the redirect URL that is returned by standby dashboard modules with a custom one that would point to the Proxy's public IP address or hostname instead of the active Manager's address. Would this address this issue?

#8 Updated by Florian Haas 4 months ago

  • Subject changed from mgr/dashboard: dashboard does not play nicely with a frontend HAproxy to mgr/dashboard: SSL-enabled dashboard does not play nicely with a frontend HAproxy

So here is the run-down of the issue as I see it. I'll add a little bit of information that may well be common knowledge to anyone reading this issue, but I'll reiterate anyway — so bear with me please. :) I'll also use HAProxy specific terms just to be unambiguous, but the issue described here would apply just the same to nginx or any other reverse proxy / HTTP load balancer.

First up, the dashboard creates a listening socket on all mgr hosts. However, the dashboard actually runs on only the active mgr. Any requests to a backup mgr get an HTTP 303, redirecting to the active mgr. Any requests going to the active mgr in the first place get a 200.

Now, when the dashboard runs behind HAproxy with mode http (as described in Wido's post), HAproxy checks for the HTTP status returned by the backend, and considers any backend "down" that does not return HTTP 200. That means that HAproxy will by itself make sure that a client request only ever hits the active mgr, and the client never sees the 303. This requires that SSL is disabled on the dashboard backend.

In order to support HTTPS in this configuration, a user needs to configure the HAproxy frontend to expose HTTPS. That, of course, means that the last hop (from HAproxy to the active mgr) is unencrypted, and this may not be acceptable based on the applicable security policy.

Now, of course one can use HAproxy as a load balancer in a setup where SSL goes all the way to the backend. In that case, however, HAproxy needs to operate with mode tcp, because it obviously has no way to inspect the encrypted HTTPS stream, and thus can't make its up/down decision based on the HTTP status code returned by the back end. Instead, all it can do is rely on whether it can complete a TCP handshake with the backend.

An SSL-enabled Ceph Dashboard does allow the successful establishment of a connection on its listening port, on all mgr instances. Thus, in such a configuration all mgrs (not only the active one) qualify as "up" backends in the HAProxy sense, so clients will be sent to backends where they get the HTTP 303. In that event, they will either get a Location that they cannot resolve, or, if it is resolvable, get caught in a redirect loop (because mode tcp redirection is usually done with a distribution algorithm based on source IP).
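The difference between the two health-check styles can be sketched in a few lines of Python. This is only a model of the up/down decision, not HAProxy code; the probe results are hypothetical and assume that `daisy` is currently the active mgr.

```python
from typing import Dict, List, Optional

# Hypothetical probe results: the HTTP status each backend would return,
# or None if the TCP connection itself was refused.
PROBES: Dict[str, Optional[int]] = {"daisy": 200, "eric": 303, "frank": 303}


def up_by_tcp_check(probes: Dict[str, Optional[int]]) -> List[str]:
    """Backends considered up when only the TCP handshake is checked."""
    return sorted(name for name, status in probes.items() if status is not None)


def up_by_http_check(probes: Dict[str, Optional[int]]) -> List[str]:
    """Backends considered up when the check expects HTTP 200."""
    return sorted(name for name, status in probes.items() if status == 200)


print(up_by_tcp_check(PROBES))   # ['daisy', 'eric', 'frank']
print(up_by_http_check(PROBES))  # ['daisy']
```

With a plain TCP check, the two standby mgrs stay in rotation, and that is where clients run into the 303.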

The summary of this is that currently, if one wants load-balancing through a service like HAproxy or similar, in combination with HTTPS, the only way to do that is to terminate HTTPS on the proxy and do the last hop unencrypted.

I'm not sure what the "right" way to resolve this would be, to be honest. Perhaps include a configuration option to toggle the dashboard behavior on non-active mgrs? Use an HTTP 303 by default, but also have the option to instead use a TCP RST, tearing down the connection altogether?

#9 Updated by Florian Haas 4 months ago

I should add one more thing here: even with SSL enabled on the backend, it would be possible to configure HAproxy with mode tcp and a custom health check. That way, HAproxy would still use simple TCP forwarding (which it needs to, in order to not break end-to-end encryption), but not only rely on the successful establishment of the connection for the up/down decision. Instead it would base that decision on whatever the custom health check returns.

Edit: this approach is hardly practical because HAProxy typically runs in a chroot. For such a custom check (a combination of option external-check and external-check command) to work, it would have to be a statically linked binary in the HAProxy chroot.

But that's a fairly convoluted solution for something that most unsuspecting users would expect to be a trivial problem. Also, that solution would always be specific to HAproxy, and would have to be replicated for nginx, F5, or whatever the popular loadbalancer du jour might be.

Usual caveat, by the way: I may be missing something blindingly obvious that solves this issue much more easily. In that case, I'll be more than happy to stand corrected.

#10 Updated by Florian Haas 4 months ago

One more option I'm currently looking into: combining mode tcp with option httpchk. That might be reasonably simple to do. If I can come up with a reference configuration, I'll hack up a doc patch.

Edit: this approach doesn't work. option httpchk apparently always uses unencrypted HTTP, so you can't use that against something that speaks HTTPS. All that HAProxy gets is HTTP 400s (Bad Request), and it considers all backends down.

Edit edit: option httpchk can be made to use HTTPS, if a backend server is configured with the ssl option. (see note 13, below)

#11 Updated by Ricardo Dias 4 months ago

Florian Haas wrote:

I'm not sure what the "right" way to resolve this would be, to be honest. Perhaps include a configuration option to toggle the dashboard behavior on non-active mgrs?

Maybe this is the right solution. I don't see a very strong reason to have the non-active mgrs listening for HTTP connections, since we have an easy way to find the current working dashboard URL by doing:

# ceph mgr services | jq .dashboard

Just to be clear: if we don't accept TCP connections on the dashboard webserver port on non-active mgrs, HAProxy's checks will correctly tag which instances are up or down, and SSL can be enabled in the dashboard. Is that correct, Florian?
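For scripting, the lookup from the `ceph mgr services` command above can be mirrored in Python. This is a minimal sketch: the sample JSON is an assumed example of the command's output, and the helper name is hypothetical.

```python
import json
from typing import Optional


def dashboard_url(services_json: str) -> Optional[str]:
    """Extract the dashboard URL from `ceph mgr services` JSON output."""
    return json.loads(services_json).get("dashboard")


# Assumed sample output; a real cluster may list further services as well.
sample = '{"dashboard": "https://daisy.example.com:8443/"}'
print(dashboard_url(sample))  # https://daisy.example.com:8443/
```

In practice the JSON string would come from running the command, for example through `subprocess.run`, rather than from a literal.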

#12 Updated by Florian Haas 4 months ago

Just to be clear: if we don't accept TCP connections on the dashboard webserver port on non-active mgrs, HAProxy's checks will correctly tag which instances are up or down, and SSL can be enabled in the dashboard. Is that correct, Florian?

By my understanding, yes. If the dashboard just doesn't accept connections on non-active mgrs, then that should work on reverse-proxy configurations without SSL, with SSL terminated on the backend, and with SSL terminated on the proxy.

#13 Updated by Florian Haas 4 months ago

With much kludging, it does seem that this can be done after all.

Here's a backend configuration that does seem to do the trick, i.e. not intercept SSL traffic but select the correct (SSL-only) backend based on its HTTP status:

backend dashboard_back
  mode tcp
  option httpchk GET /
  http-check expect status 200
  server daisy 192.168.122.114:8443 ssl check
  server eric 192.168.122.115:8443 ssl check
  server frank 192.168.122.116:8443 ssl check

HAProxy also needs to know about what CA signs your certs. If you're using a self-signed cert on the backends, that'll only work with ssl verify none check.

Still doesn't work without a frontend that has a cert of its own. So this still counts as an intercept and isn't true e2e.

The run-down in note 8 stands: if you want to proxy, you have to terminate SSL on the proxy. If the dashboard is meant to support proper SSL that is terminated on the backend, I can't think of a way to make that work without ditching the redirect and doing as Ricardo says in note 11.

Sorry for all the noise, been down a half-dozen dead ends here.

#14 Updated by Lenz Grimmer 3 months ago

Florian Haas wrote:

Just to be clear: if we don't accept TCP connections on the dashboard webserver port on non-active mgrs, HAProxy's checks will correctly tag which instances are up or down, and SSL can be enabled in the dashboard. Is that correct, Florian?

By my understanding, yes. If the dashboard just doesn't accept connections on non-active mgrs, then that should work on reverse-proxy configurations without SSL, with SSL terminated on the backend, and with SSL terminated on the proxy.

So it seems we have three options on how to resolve this:

  1. Make it possible to enable/disable the HTTP 303 redirection on standby ceph-mgr instances, by disabling the dashboard's standby mode, so that an HTTP connection to the dashboard's TCP port on a standby manager instance results in a "Connection refused" error.
  2. Allow customizing the URL that is returned by the HTTP 303 redirect message on standby instances, so that it points to the proxy's public host name or IP address instead of the URL of the active ceph-mgr instance that currently serves the dashboard.
  3. Write a custom health check for HAproxy as outlined in https://www.loadbalancer.org/blog/how-to-write-an-external-custom-healthcheck-for-haproxy/ that ?

Does that summarize the possible options correctly?
Any suggestions/preferences on how to address this shortcoming?
To me, options 1 and 2 sound like the most feasible ones, but I wonder if we need to implement both or if one of them would be sufficient? Option 3 would be very specific to HAProxy.
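Options 1 and 2 can be modelled as a small decision function. This is a sketch only: the parameter names `redirect_enabled` and `url_override` are hypothetical and do not refer to existing dashboard settings.

```python
from typing import Optional, Tuple


def standby_response(
    active_url: str,
    redirect_enabled: bool = True,
    url_override: Optional[str] = None,
) -> Optional[Tuple[int, str]]:
    """Model what a standby mgr would answer.

    Returns (status, Location), or None when the standby does not listen at
    all, in which case clients and proxies see "Connection refused".
    """
    if not redirect_enabled:
        return None  # option 1: standby mode disabled, no listening socket
    # Option 2: redirect to the override (e.g. the proxy) if one is set,
    # otherwise fall back to the current behaviour.
    return (303, url_override or active_url)


active = "https://daisy.example.com:8443/"
print(standby_response(active))                          # current behaviour
print(standby_response(active, redirect_enabled=False))  # option 1
print(standby_response(active, url_override="https://dashboard.example.org/"))  # option 2
```

Under option 1 a plain TCP health check is enough to find the active mgr. Under option 2 the client is sent back to the proxy, which still has to pick the right backend.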

#15 Updated by Lenz Grimmer 3 months ago

  • Tags set to security, configuration
  • Tags deleted (security, ssl)

#16 Updated by Florian Haas 3 months ago

Lenz Grimmer wrote:

Does that summarize the possible options correctly?

From my point of view, yes it does.

Any suggestions/preferences on how to address this shortcoming?
To me, options 1 and 2 sound like the most feasible ones, but I wonder if we need to implement both or if one of them would be sufficient? Option 3 would be very specific to HAProxy.

Agree with that assessment as well. I'd say that option 1 is preferable. That gives users the option of terminating HTTPS either on the load balancer or on the mgr node, as applicable policy (or preference) dictates.

What do others think?

#18 Updated by Sebastian Wagner 3 months ago

Does this also affect Prometheus or mgr/restful? If yes, we should update the ticket to reflect that this is a general mgr issue.

#19 Updated by Torben Hørup 20 days ago

Dashboard through proxy doesn't play nicely with grafana

When browser asks https://<proxyserver>/api/grafana/url the response is

{"instance": "http://grafana-host:3000/"}

which the client, in my case, doesn't know how to resolve or reach

#20 Updated by Torben Hørup 20 days ago

One more thing, in relation to my previous comment:

When the proxy is using TLS, the browser doesn't like that the dashboard tries to include non-TLS HTTP elements from Grafana.

#21 Updated by Lenz Grimmer 20 days ago

Torben Hørup wrote:

When the proxy is using TLS, the browser doesn't like that the dashboard tries to include non-TLS HTTP elements from Grafana.

Indeed, that's why we added the following note to the dashboard documentation

Ceph Dashboard embeds the Grafana dashboards via iframe HTML elements. If Grafana is configured without SSL/TLS support, most browsers will block the embedding of insecure content into a secured web page, if the SSL support in the dashboard has been enabled (which is the default configuration). If you can’t see the embedded Grafana dashboards after enabling them as outlined above, check your browser’s documentation on how to unblock mixed content. Alternatively, consider enabling SSL/TLS support in Grafana.

#22 Updated by Volker Theile 12 days ago

  • Status changed from New to In Progress
  • Assignee set to Volker Theile

#23 Updated by Volker Theile 5 days ago

I've added a HAProxy setup for our development environment, see https://github.com/ricardoasmarques/ceph-dev-docker/pull/75/files.

I was able to get it running, but clients still get redirected if a failover occurs between two HAProxy health checks. In this case HAProxy has not yet realized that the node it marked as active is now down. That node, which is no longer active, sends the redirect to the frontend client. Because HAProxy operates in SSL pass-through mode, it cannot modify the headers. So the only workaround for this will be to add an option in Ceph Dashboard to prevent the redirection to the active node.
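The size of that window can be estimated with a bit of arithmetic. The sketch below assumes health checks run at a fixed interval starting at t=0 and that a server is only marked down after `fall` consecutive failed checks, which is my understanding of HAProxy's `inter` and `fall` server parameters. The numbers are hypothetical, and real check scheduling may differ.

```python
import math


def stale_window(failover_at: float, interval: float, fall: int) -> float:
    """Seconds during which the proxy still routes to the former active mgr."""
    # First health check that runs at or after the failover.
    first_failed_check = math.ceil(failover_at / interval) * interval
    # The server is marked down only after `fall` consecutive failures.
    marked_down_at = first_failed_check + (fall - 1) * interval
    return marked_down_at - failover_at


# Failover 3.5 s after start, checks every 2 s, down after 3 failed checks.
print(stale_window(3.5, 2.0, 3))  # 4.5
```

During that window a client can still be handed the redirect, no matter how the health check is written.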

#24 Updated by Volker Theile 5 days ago

Torben Hørup wrote:

Dashboard through proxy doesn't play nicely with grafana

When browser asks https://<proxyserver>/api/grafana/url the response is

[...]

which the client, in my case, doesn't know how to resolve or reach

The Grafana host must be reachable from outside (HAProxy frontend) for this to work.

#25 Updated by Volker Theile 3 days ago

  • Status changed from In Progress to Need Review
  • Pull request ID set to 29088
