Feature #59422

open

Support for proxy protocol v2 in RGW (beast) to preserve source IP and other details about original connection

Added by Christian Rohmann about 1 year ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

RGW is usually deployed as multiple instances behind a load balancer such as HAProxy (https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw).
While it is certainly possible to have HAProxy (https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#option%20forwardfor)
or other software running as an HTTP reverse proxy send headers which Ceph RGW (using beast) then logs (via `rgw_log_http_headers`), this does not work for load balancers that only do TCP forwarding or TLS termination.

I'd like to ask you to look into supporting proxy protocol v2, which allows proxy servers / TCP load balancers to forward information about a client to the backend.

HAProxy supports, and in fact originated, this binary protocol, which specifies a range of fields that are sent ahead of the payload at the start of the TCP connection to the backend.
This replaces HTTP headers such as X-Forwarded-For and X-Forwarded-Proto, which obviously only work when speaking HTTP and have other drawbacks.
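
To illustrate what a backend has to do with that binary header, here is a minimal Python sketch that reads the fixed 16-byte preamble and the IPv4/IPv6 address block from the start of a connection. The names `parse_pp2_header` and `ProxyInfo` are made up for this example and are not part of RGW or beast; the byte layout follows my reading of the HAProxy PROXY protocol specification, and UNIX sockets and TLVs are deliberately left out.

```python
import ipaddress
import struct
from typing import NamedTuple, Optional, Tuple

# Fixed 12-byte signature that opens every proxy protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


class ProxyInfo(NamedTuple):
    """Original connection endpoints as reported by the load balancer."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def parse_pp2_header(data: bytes) -> Tuple[Optional[ProxyInfo], int]:
    """Parse a v2 header at the start of `data`.

    Returns (info, consumed). `info` is None for LOCAL commands and for
    address families this sketch does not handle; `consumed` is the number
    of bytes to skip before the real payload (e.g. the HTTP request).
    """
    if len(data) < 16 or data[:12] != PP2_SIGNATURE:
        raise ValueError("missing proxy protocol v2 signature")

    # Byte 13: version (high nibble) + command (low nibble).
    # Byte 14: address family (high nibble) + transport (low nibble).
    # Bytes 15-16: length of everything after the 16-byte preamble.
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported proxy protocol version")

    end = 16 + length
    if len(data) < end:
        raise ValueError("truncated proxy protocol header")

    command = ver_cmd & 0x0F
    if command == 0x0:  # LOCAL: e.g. health checks, keep the real endpoints
        return None, end
    if command != 0x1:  # only PROXY is defined besides LOCAL
        raise ValueError("unknown proxy protocol command")

    # 0x1 = AF_INET (4-byte addresses), 0x2 = AF_INET6 (16-byte addresses).
    addr_len = {0x1: 4, 0x2: 16}.get(fam_proto >> 4)
    if addr_len is None:
        return None, end  # AF_UNSPEC / AF_UNIX: not handled in this sketch

    block_len = 2 * addr_len + 4  # src addr, dst addr, src port, dst port
    if length < block_len:
        raise ValueError("address block too short")

    block = data[16:16 + block_len]
    src = ipaddress.ip_address(block[:addr_len])
    dst = ipaddress.ip_address(block[addr_len:2 * addr_len])
    src_port, dst_port = struct.unpack("!HH", block[2 * addr_len:])
    return ProxyInfo(str(src), src_port, str(dst), dst_port), end
```

A server would call this on the first bytes read from an accepted socket, use the returned `ProxyInfo` in place of the peer address, and hand the remaining bytes (from offset `consumed`) to the HTTP parser.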

Apart from HAProxy, many other load balancers provide source info via proxy protocol v2 on outgoing connections to their backends.
Besides services from AWS, Azure and GCP, which support it for TCP connections via their load balancers or other networking services that terminate a TCP connection and would therefore "NAT" away the source info, there are also quite a few (commercial) appliances.

The proxy protocol v2 can also carry info like client certificates in case of TLS offloading and, last but not least, is extensible via custom TLVs (type, length, value). Implementations of the proxy protocol v2 sometimes internally replace data like source IP and port with the values received via proxy protocol v2 and treat the connection as if it had been established directly by that client. Logging, rate limits, ACLs and other uses of this data then transparently use the actual client info and are not reduced to only seeing the load balancer as the client.
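
As a rough illustration of the TLV extension mechanism, the following Python sketch walks the bytes that follow the address block in a v2 header and yields each (type, value) pair. The helper name `iter_tlvs` is hypothetical; the layout (1-byte type, 2-byte big-endian length, then the value) and the three type constants are taken from my reading of the PROXY protocol specification. Nested sub-TLVs, such as those inside the SSL TLV, are not decoded here.

```python
import struct
from typing import Iterator, Tuple

# A few TLV types from the PROXY protocol specification.
PP2_TYPE_ALPN = 0x01
PP2_TYPE_AUTHORITY = 0x02
PP2_TYPE_SSL = 0x20


def iter_tlvs(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) for each TLV in `data`.

    `data` is assumed to be the part of the v2 header that comes after
    the address block. Raises ValueError if a TLV is cut short.
    """
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated TLV header")
        # 1-byte type followed by a 2-byte length in network byte order.
        tlv_type, tlv_len = struct.unpack_from("!BH", data, offset)
        offset += 3
        if offset + tlv_len > len(data):
            raise ValueError("truncated TLV value")
        yield tlv_type, data[offset:offset + tlv_len]
        offset += tlv_len
```

A backend that does not understand a given type can simply skip it, which is what makes custom TLVs safe to add on the load balancer side.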

Since RGW uses the beast library, support for proxy protocol v2 likely needs to happen there first.
Thus I opened an issue there: https://github.com/boostorg/beast/issues/2484.

It would be nice if RGW were just that bit more versatile, so it could be integrated with load balancers providing proxy protocol v2.
