Feature #24662: mgr/dashboard: SSL-enabled dashboard does not play nicely with a frontend HAproxy - Dashboard - Ceph


Feature #24662

closed

Feature #47765: mgr/dashboard: security improvements

mgr/dashboard: SSL-enabled dashboard does not play nicely with a frontend HAproxy

Added by Florian Haas almost 6 years ago. Updated about 3 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Volker Theile

Category:

General

Target version:

Ceph - v15.0.0

% Done:

Source:

Tags:

Backport:

nautilus

Reviewed:

Affected Versions:

Ceph - v13.0.0, Ceph - v13.2.0, Ceph - v13.2.1, Ceph - v13.2.2, Ceph - v13.2.3, Ceph - v13.2.4, Ceph - v13.2.5, Ceph - v13.2.6, Ceph - v14.0.0, Ceph - v14.2.0, Ceph - v14.2.1, Ceph - v14.2.2, Ceph - v14.2.3, Ceph - v14.2.4

Pull request ID:

29088

Description

http://docs.ceph.com/docs/master/mgr/dashboard/#reverse-proxies talks about running the ceph-mgr dashboard behind a reverse proxy, so I am assuming that this is a deployment scenario that is at least meant to be supported. Unfortunately, I'm not quite sure how that would work, given the way the dashboard is currently wired (in Mimic).

Consider the following scenario:

I have three mgr instances, `daisy`, `eric`, and `frank`.
The dashboard module is enabled and is configured to listen on port 8443.

I now have the following HAproxy configuration, which exposes the dashboard on the frontend's port 443.

frontend dashboard_front
  bind 0.0.0.0:80
  redirect scheme https code 301 if !{ ssl_fc }

frontend dashboard_front_ssl
  mode tcp
  bind 0.0.0.0:443
  default_backend dashboard_back_ssl

backend dashboard_back_ssl
  mode tcp
  balance source
  stick-table type ip size 200k expire 30m
  stick on src
  server daisy 192.168.122.114:8443 check
  server eric 192.168.122.115:8443 check
  server frank 192.168.122.116:8443 check

From the HAproxy side of things this is working perfectly fine. However, consider what happens if I issue a `curl` request against the HTTP frontend:

curl -k -IL http://[frontend HAproxy IP] 
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://[frontend HAproxy IP]/
Connection: close

HTTP/1.1 303 See Other
Date: Tue, 26 Jun 2018 12:40:19 GMT
Content-Length: 108
Content-Type: text/html;charset=utf-8
Location: https://daisy.example.com:8443/
Server: CherryPy/3.5.0

curl: (6) Could not resolve host: daisy.example.com

Since the redirect from CherryPy includes not only the name, but also the port of the backend server, it would seem to me that this means that a remote client can't possibly connect unless the internal hostname is resolvable via the DNS, and the frontend proxy is configured to listen on the same port as the backend host.
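
To make the mismatch concrete, here is a minimal Python sketch that compares the Location header of the 303 response with the URL the client originally used. The function names and the frontend address https://203.0.113.10/ are assumptions for illustration; only the backend URL is taken from the curl output above.

```python
from urllib.parse import urlsplit

def effective_port(parts):
    """Return the explicit port, or the scheme's default port if none is given."""
    if parts.port is not None:
        return parts.port
    return {"http": 80, "https": 443}.get(parts.scheme)

def redirect_leaves_frontend(frontend_url, location):
    """True if a redirect Location points at a different host or port than the frontend."""
    front = urlsplit(frontend_url)
    target = urlsplit(location)
    return (front.hostname, effective_port(front)) != (
        target.hostname, effective_port(target))

# Hypothetical frontend address; the Location is the one CherryPy returned above.
print(redirect_leaves_frontend("https://203.0.113.10/",
                               "https://daisy.example.com:8443/"))
```

Whenever this returns True, the client has to resolve and reach the backend directly, which is exactly what a proxy setup is meant to avoid.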

The url_prefix configuration option is not of much help here, because it is merely appended to the backend host's IP address (or hostname) and port. Would it perhaps make sense to introduce a url_alias option, allowing users to override what CherryPy sets for the redirect's Location attribute?

Or is there a better way to do this?

Related issues 2 (0 open — 2 closed)


Updated by John Spray almost 6 years ago

Project changed from Ceph to mgr
Subject changed from ceph-dashboard does not play nicely with a frontend HAproxy to dashboard does not play nicely with a frontend HAproxy
Category set to 132


Updated by Ricardo Dias almost 6 years ago

According to the CherryPy docs at https://docs.cherrypy.org/en/latest/deploy.html#reverse-proxying, we will need to provide a way to configure the ceph-dashboard in proxy mode and specify the frontend base URL, so that CherryPy can do its work.
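
As a rough sketch of what that could look like on the CherryPy side: the proxy tool is switched on through the config entries tools.proxy.on and tools.proxy.base. The frontend URL below is an assumption, and nothing here claims that the dashboard (in particular its standby redirect) would honour these settings as-is.

```python
# Assumed public URL of the reverse proxy; replace with the address clients use.
FRONTEND_BASE = "https://dashboard.example.com"

# CherryPy config entries that enable the built-in proxy tool and pin the base URL
# used when CherryPy builds absolute URLs (for example in redirects).
proxy_config = {
    "tools.proxy.on": True,
    "tools.proxy.base": FRONTEND_BASE,
}

# With CherryPy installed, this would be applied with:
#   cherrypy.config.update(proxy_config)
print(proxy_config)
```

Note that the proxy tool prefers an X-Forwarded-Host request header over the configured base when the proxy sends one, which only happens with an HTTP-aware proxy, not with plain TCP forwarding.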


Updated by Lenz Grimmer about 5 years ago

Tracker changed from Bug to Feature


Updated by Florian Haas about 5 years ago

Affected Versions v13.2.2, v13.2.3, v13.2.4, v13.2.5, v13.2.6, v14.2.0, v14.2.1 added

For anyone looking into this, it appears that the same issue is still present in Nautilus. I thought that ceph config set mgr mgr/dashboard/server_addr <proxy-ip> might do the trick, but apparently it does not.


Updated by Florian Haas about 5 years ago

I'd like to add a bit of additional information to this.

One way of working around this issue is described here: https://blog.widodh.nl/2019/01/haproxy-in-front-of-ceph-manager-dashboard/

However, as far as I can tell, this requires that

- HAProxy operates with mode http (which, in turn, requires SSL to be disabled on the backend), and
- HAProxy is itself capable of resolving the Location received in the initial 303 response.

I can't think of a way this could be made to work if SSL terminates on the backend nodes, and HAProxy is configured with mode tcp.


Updated by Volker Theile about 5 years ago

Can you please rephrase the requirements that the Dashboard needs to meet in order to work correctly with HAProxy? Please don't assume that every developer knows about this piece of software and how it works.


Updated by Lenz Grimmer about 5 years ago

Subject changed from dashboard does not play nicely with a frontend HAproxy to mgr/dashboard: dashboard does not play nicely with a frontend HAproxy
Target version set to v15.0.0
Tags set to security, ssl
Backport set to nautilus

Florian Haas wrote:

I can't think of a way this could be made to work if SSL terminates on the backend nodes, and HAProxy is configured with mode tcp.

Not being able to use SSL down to the Mgr in HA/Proxy environments can be considered a security concern that ought to be addressed.

So as far as I understand it, what's needed is a setting that allows overriding the redirect URL that is returned by standby dashboard modules with a custom one that would point to the Proxy's public IP address or hostname instead of the active Manager's address. Would this address this issue?


Updated by Florian Haas about 5 years ago

Subject changed from mgr/dashboard: dashboard does not play nicely with a frontend HAproxy to mgr/dashboard: SSL-enabled dashboard does not play nicely with a frontend HAproxy

So here is the run-down of the issue as I see it. I'll add a little bit of information that may well be common knowledge to anyone reading this issue, but I'll reiterate anyway — so bear with me please. :) I'll also use HAProxy specific terms just to be unambiguous, but the issue described here would apply just the same to nginx or any other reverse proxy / HTTP load balancer.

First up, the dashboard creates a listening socket on all mgr hosts. However, the dashboard actually runs on only the active mgr. Any requests to a backup mgr get an HTTP 303, redirecting to the active mgr. Any requests going to the active mgr in the first place get a 200.

Now, when the dashboard runs behind HAproxy with mode http (as described in Wido's post), HAproxy checks for the HTTP status returned by the backend, and considers any backend "down" that does not return HTTP 200. That means that HAproxy will by itself make sure that a client request only ever hits the active mgr, and the client never sees the 303. This requires that SSL is disabled on the dashboard backend.
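
The up/down decision described here can be sketched in a few lines of Python. This is only an illustration of the rule "a backend is up if and only if it answers 200", not HAProxy's implementation; the function name is made up, and the status codes assume that daisy is the active mgr, as in the curl output earlier in this issue.

```python
def backends_considered_up(status_by_backend, expected_status=200):
    """Return the backends whose health check returned the expected HTTP status."""
    return sorted(name for name, status in status_by_backend.items()
                  if status == expected_status)

# Assumed check results: the active mgr answers 200, the standbys answer 303.
observed = {"daisy": 200, "eric": 303, "frank": 303}
print(backends_considered_up(observed))
```

Only the active mgr remains in the pool, so clients never see the 303.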

In order to support HTTPS in this configuration, a user needs to configure the HAproxy frontend to expose HTTPS. That, of course, means that the last hop (from HAproxy to the active mgr) is unencrypted, and this may not be acceptable based on the applicable security policy.

Now, of course one can use HAproxy as a load balancer in a setup where SSL goes all the way to the backend. In that case, however, HAproxy needs to operate with mode tcp, because it obviously has no way to inspect the encrypted HTTPS stream, and thus can't make its up/down decision based on the HTTP status code returned by the back end. Instead, all it can do is rely on whether it can complete a TCP handshake with the backend.

An SSL-enabled Ceph Dashboard does allow the successful establishment of a connection on its listening port, on all mgr instances. Thus, in such a configuration all mgrs (not only the active one) qualify as "up" backends in the HAProxy sense, so clients will be sent to backends where they get the HTTP 303. In that event, they will either get a Location that they cannot resolve, or, if it is resolvable, get caught in a redirect loop (because mode tcp redirection is usually done with a distribution algorithm based on source IP).
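
The redirect-loop case can be illustrated with a small simulation. It is a sketch under stated assumptions: every mgr passes the TCP check, the Location in the 303 leads the client back to the proxy, daisy is the active mgr, and a CRC32 of the client IP stands in for whatever source-based algorithm the proxy really uses.

```python
import zlib

BACKENDS = ["daisy", "eric", "frank"]  # all "up" from a TCP-check point of view
ACTIVE = "daisy"                       # assumed active mgr

def pick_backend(client_ip, backends):
    """Deterministic source-IP mapping, standing in for source-based balancing."""
    return backends[zlib.crc32(client_ip.encode()) % len(backends)]

def follow_redirects(client_ip, max_hops=5):
    """Return the HTTP statuses a client sees while it keeps retrying via the proxy."""
    statuses = []
    for _ in range(max_hops):
        backend = pick_backend(client_ip, BACKENDS)
        status = 200 if backend == ACTIVE else 303
        statuses.append(status)
        if status == 200:
            break
    return statuses

for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
    print(ip, pick_backend(ip, BACKENDS), follow_redirects(ip))
```

Because the mapping depends only on the source IP, a client that lands on a standby mgr lands there again on every retry and never gets past the 303.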

The summary of this is that currently, if one wants load-balancing through a service like HAproxy or similar, in combination with HTTPS, the only way to do that is to terminate HTTPS on the proxy and do the last hop unencrypted.

I'm not sure what the "right" way to resolve this would be, to be honest. Perhaps include a configuration option to toggle the dashboard behavior on non-active mgrs? Use an HTTP 303 by default, but also have the option to instead use a TCP RST, tearing down the connection altogether?


Updated by Florian Haas about 5 years ago

I should add one more thing here: even with SSL enabled on the backend, it would be possible to configure HAproxy with mode tcp and a custom health check. That way, HAproxy would still use simple TCP forwarding (which it needs to, in order to not break end-to-end encryption), but not only rely on the successful establishment of the connection for the up/down decision. Instead it would base that decision on whatever the custom health check returns.

Edit: this approach is hardly practical because HAProxy typically runs in a chroot. For such a custom check (a combination of option external-check and external-check command) to work, it would have to be a statically linked binary in the HAProxy chroot.

But that's a fairly convoluted solution for something that most unsuspecting users would expect to be a trivial problem. Also, that solution would always be specific to HAproxy, and would have to be replicated for nginx, F5, or whatever the popular loadbalancer du jour might be.

Usual caveat, by the way: I may be missing something blindingly obvious that solves this issue much more easily. In that case, I'll be more than happy to stand corrected.


#10

Updated by Florian Haas about 5 years ago

One more option I'm currently looking into: combining mode tcp with option httpchk. That might be reasonably simple to do. If I can come up with a reference configuration, I'll hack up a doc patch.

Edit: this approach doesn't work. option httpchk apparently always uses unencrypted HTTP, so you can't use that against something that speaks HTTPS. All that HAProxy gets is HTTP 400s (Bad Request), and it considers all backends down.

Edit edit: option httpchk can be made to use HTTPS, if a backend server is configured with the ssl option. (see note 13, below)


#11

Updated by Ricardo Dias about 5 years ago

Florian Haas wrote:

I'm not sure what the "right" way to resolve this would be, to be honest. Perhaps include a configuration option to toggle the dashboard behavior on non-active mgrs?

Maybe this is the right solution. I don't see a very strong reason to have the non-active mgrs listening for HTTP connections, since we have an easy way to find the current working dashboard URL:

# ceph mgr services | jq .dashboard
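
For scripted use, the same lookup can be done in Python on the JSON that ceph mgr services prints. A minimal sketch: the helper name is made up, and the sample JSON only assumes the usual shape (module name mapped to URL) with the URL taken from the curl output in the description.

```python
import json

def dashboard_url(mgr_services_json):
    """Extract the dashboard URL from the JSON printed by `ceph mgr services`."""
    services = json.loads(mgr_services_json)
    return services.get("dashboard")  # None if the dashboard module is not listed

# Assumed sample output; on a real cluster this text would come from the CLI.
sample = '{"dashboard": "https://daisy.example.com:8443/"}'
print(dashboard_url(sample))
```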

Just to be clear, if we don't accept TCP connections on the dashboard webserver port on non-active mgrs, HAProxy's checks will correctly tag which instances are up or down, and SSL can be enabled in the dashboard. Is that correct, Florian?


#12

Updated by Florian Haas about 5 years ago

Just to be clear, if we don't accept TCP connections on the dashboard webserver port on non-active mgrs, HAProxy's checks will correctly tag which instances are up or down, and SSL can be enabled in the dashboard. Is that correct, Florian?

By my understanding, yes. If the dashboard just doesn't accept connections on non-active mgrs, then that should work on reverse-proxy configurations without SSL, with SSL terminated on the backend, and with SSL terminated on the proxy.
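
The reason this works for every variant is that a plain TCP connect check is then enough to find the active mgr. A minimal Python sketch of such a check, not tied to any particular proxy; the function names are made up for illustration.

```python
import socket

def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False

def up_backends(hosts, port):
    """Return the hosts that accept connections on the given port."""
    return [host for host in hosts if tcp_check(host, port)]

# Example call with the mgr addresses from the description:
#   up_backends(["192.168.122.114", "192.168.122.115", "192.168.122.116"], 8443)
```

If standby mgrs refuse connections, up_backends returns only the active mgr, regardless of whether the payload on that port is HTTP or HTTPS.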


#13

Updated by Florian Haas about 5 years ago

~~With much kludging, it does seem that this can be done after all.~~

~~Here's a backend configuration that does seem to do the trick, i.e. not intercept SSL traffic but select the correct (SSL-only) backend based on its HTTP status:~~

backend dashboard_back
  mode tcp
  option httpchk GET /
  http-check expect status 200
  server daisy 192.168.122.114:8443 ssl check
  server eric 192.168.122.115:8443 ssl check
  server frank 192.168.122.116:8443 ssl check

~~HAProxy also needs to know about what CA signs your certs. If you're using a self-signed cert on the backends, that'll only work with ssl verify none check.~~

Still doesn't work without a frontend that has a cert of its own. So this still counts as an intercept and isn't true e2e.

Run-down from comment in note 8 stands. If you want to proxy, you have to terminate SSL on the proxy. If the dashboard is meant to support proper SSL that is terminated on the backend, I can't think of a way to make that work without ditching the redirect and doing as Ricardo says in note 12.

Sorry for all the noise, been down a half-dozen dead ends here.


#14

Updated by Lenz Grimmer about 5 years ago

Florian Haas wrote:

Just to be clear, if we don't accept TCP connections on the dashboard webserver port on non-active mgrs, HAProxy's checks will correctly tag which instances are up or down, and SSL can be enabled in the dashboard. Is that correct, Florian?

By my understanding, yes. If the dashboard just doesn't accept connections on non-active mgrs, then that should work on reverse-proxy configurations without SSL, with SSL terminated on the backend, and with SSL terminated on the proxy.

So it seems we have three options on how to resolve this:

1. Make it possible to enable/disable the HTTP 303 redirection on standby ceph-mgr instances by disabling the dashboard's standby mode, so that an HTTP connection to the dashboard's TCP port on a standby manager instance results in a "Connection refused" error.
2. Allow customizing the URL that is returned by the HTTP 303 redirect message on standby instances, so that it points to the proxy's public host name or IP address instead of the URL of the active ceph-mgr instance that currently serves the dashboard.
3. Write a custom health check for HAproxy as outlined in https://www.loadbalancer.org/blog/how-to-write-an-external-custom-healthcheck-for-haproxy/ that ?

Does that summarize the possible options correctly?
Any suggestions/preferences on how to address this shortcoming?
To me, 1 or 2 sound like the most feasible options, but I wonder if we need to implement both or if one of them would be sufficient? Option 3 would be very specific to HAProxy.


#15

Updated by Lenz Grimmer about 5 years ago

Tags set to security, configuration
Tags deleted (~~security, ssl~~)


#16

Updated by Florian Haas about 5 years ago

Lenz Grimmer wrote:

Does that summarize the possible options correctly?

From my point of view, yes it does.

Any suggestions/preferences on how to address this shortcoming?
To me, 1 or 2 sound like the most feasible options, but I wonder if we need to implement both or if one of them would be sufficient? Option 3 would be very specific to HAProxy.

I agree with that assessment as well. I'd say that option 1 is preferable. It gives users the choice of terminating HTTPS either on the load balancer or on the mgr node, as applicable policy (or preference) dictates.

What do others think?


#17

Updated by Sebastian Wagner almost 5 years ago

Relates to https://github.com/rook/rook/pull/3076#issuecomment-488012764


#18

Updated by Sebastian Wagner almost 5 years ago

Does this also affect Prometheus or mgr/restful? If yes, we should update the ticket to reflect that this is a general mgr issue.


#19

Updated by Torben Hørup almost 5 years ago

Dashboard through proxy doesn't play nicely with grafana

When browser asks https://<proxyserver>/api/grafana/url the response is

{"instance": "http://grafana-host:3000/"}

which the client, in my case, doesn't know how to resolve or reach


#20

Updated by Torben Hørup almost 5 years ago

One more thing, in relation to my previous comment:

when proxy is using TLS, the browser doesn't like that the dashboard tries to include non-tls http elements from grafana.


#21

Updated by Lenz Grimmer almost 5 years ago

Torben Hørup wrote:

when proxy is using TLS, the browser doesn't like that the dashboard tries to include non-tls http elements from grafana.

Indeed, that's why we added the following note to the dashboard documentation

Ceph Dashboard embeds the Grafana dashboards via iframe HTML elements. If Grafana is configured without SSL/TLS support, most browsers will block the embedding of insecure content into a secured web page, if the SSL support in the dashboard has been enabled (which is the default configuration). If you can’t see the embedded Grafana dashboards after enabling them as outlined above, check your browser’s documentation on how to unblock mixed content. Alternatively, consider enabling SSL/TLS support in Grafana.
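
The condition that triggers the blocking can be written down as a small check. This is a simplified sketch of the mixed-content rule (secure page, insecure embedded resource), not a browser's full policy; the function name and the page URL https://proxy.example.com/ are assumptions, while the Grafana URL is the one reported above.

```python
from urllib.parse import urlsplit

def is_mixed_content(page_url, embedded_url):
    """True if an https page tries to embed a resource served over plain http."""
    return (urlsplit(page_url).scheme == "https"
            and urlsplit(embedded_url).scheme == "http")

# Hypothetical dashboard URL behind the proxy, and the Grafana URL from the API response.
print(is_mixed_content("https://proxy.example.com/", "http://grafana-host:3000/"))
```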


#22

Updated by Volker Theile almost 5 years ago

Status changed from New to In Progress
Assignee set to Volker Theile


#23

Updated by Volker Theile almost 5 years ago

I've added a HAProxy setup for our development environment, see https://github.com/ricardoasmarques/ceph-dev-docker/pull/75/files.

I was able to get it running, but still get redirected if a failover occurs between two HAProxy health checks. In this case HAProxy has not realized that the node marked as active is now down. The node which is now down sends the redirection to the frontend client. Because HAProxy acts in SSL pass through mode it can not modify the headers. So the only workaround for this will be to add an option in Ceph Dashboard to prevent the redirection to the active node.


#24

Updated by Volker Theile almost 5 years ago

Torben Hørup wrote:

Dashboard through proxy doesn't play nicely with grafana

When browser asks https://<proxyserver>/api/grafana/url the response is

[...]

which the client, in my case, doesn't know how to resolve or reach

The Grafana host must be reachable from outside (i.e. from clients of the HAProxy frontend) for this to work.


#25