Documentation #52825
haproxy causes high number of connection resets (Closed)
Description
I have discovered an interesting haproxy behaviour that causes a high number of connection resets on rgw hosts when HAProxy is used for ingress. I posted about it to the mailing list earlier (and will reference it again here) and have since found out that this behaviour is normal: HAProxy generates a lot of TCP connection resets, which in turn produce RGW/beast log messages (at least on debug level). After some research this turned out to be static noise that confused me while I was trying to debug problems with monitors going in and out of leader elections.
I want to leave this here to collect my findings and make it easier for future users to find this information.
summary
If you deploy RGW behind haproxy/keepalived, as cephadm does for the ingress service, you will see
- a high number of TCP connection resets, and
- RGW/beast logging ERROR: client_io->complete_request() returned Connection reset by peer.
Don't panic, it turns out that this behaviour is expected.
My suggestion is to switch off ingress temporarily until you have finished debugging.
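With cephadm this can be done non-destructively: the service spec stays in place and only the daemons are stopped. A sketch, assuming the service name from my setup below (ingress.rgw.default):

# stop the haproxy/keepalived daemons of the ingress service while debugging
ceph orch stop ingress.rgw.default

# ... debug against the RGW daemons on :8000 directly, without the health-check noise ...

# bring the ingress daemons back afterwards
ceph orch start ingress.rgw.default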
my setting
I have a v16.2.6 cluster with 6 nodes (osd-1..osd-6) running with cephadm. The cluster was originally installed as Nautilus with ceph-ansible, then migrated to v15 (Octopus) and finally to v16.2.6 with cephadm.
For this issue, the following daemons are relevant.
# ceph orch ls
NAME                 PORTS                  RUNNING  REFRESHED  AGE  PLACEMENT
ingress.rgw.default  172.16.62.26:443,1967  12/12    96s ago    2d   count:6
rgw.default          ?:8000                 6/6      96s ago    2d   count-per-host:1;label:rgw
This generates haproxy/keepalived configuration shown at the bottom.
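For reference, a spec along these lines (applied with ceph orch apply -i) should produce that layout. This is a sketch reconstructed from the generated configuration shown at the bottom, not my original spec; the virtual IP, ports and placement are taken from those configs:

service_type: ingress
service_id: rgw.default
placement:
  count: 6
spec:
  backend_service: rgw.default
  virtual_ip: 172.16.62.26/19
  frontend_port: 443
  monitor_port: 1967
  ssl_cert: |
    <certificate and key, REDACTED>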
Now to the part that confused me. I see a high number of connection resets, apparent in netstat output.
# netstat -s | grep -A 10 ^Tcp:
Tcp:
    2726076 active connections openings
    1713993 passive connection openings
    124711 failed connection attempts
    1648262 connection resets received    # <-- this is too high
    14086 connections established
    4238192699 segments received
    24207750548 segments send out
    32408995 segments retransmited
    0 bad segments received.
    1015230 resets sent
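To see how fast that counter grows, a plain watch on the same netstat output is enough (nothing ceph-specific here):

# print the TCP reset counters every 2 seconds
watch -n 2 "netstat -s | grep -E 'connection resets received|resets sent'"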
I see the following in the rgw logs (this repeats over and over every second or so).
# journalctl -f -u ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@rgw.default.osd-1.xqrjwp.service
-- Logs begin at Sun 2021-10-03 10:49:24 CEST. --
Oct 06 07:54:43 osd-1 bash[90888]: debug 2021-10-06T05:54:43.875+0000 7fd701142700 1 ====== starting new request req=0x7fd8306b6620 =====
Oct 06 07:54:43 osd-1 bash[90888]: debug 2021-10-06T05:54:43.876+0000 7fd7e4308700 1 ====== req done req=0x7fd8306b6620 op status=0 http_status=200 latency=0.001000019s ======
Oct 06 07:54:43 osd-1 bash[90888]: debug 2021-10-06T05:54:43.876+0000 7fd7e4308700 1 beast: 0x7fd8306b6620: 172.16.62.12 - anonymous [06/Oct/2021:05:54:43.875 +0000] "HEAD / HTTP/1.0" 200 5 - - - latency=0.001000019s
Oct 06 07:54:44 osd-1 bash[90888]: debug 2021-10-06T05:54:44.427+0000 7fd703146700 1 ====== starting new request req=0x7fd8306b6620 =====
Oct 06 07:54:44 osd-1 bash[90888]: debug 2021-10-06T05:54:44.427+0000 7fd760a01700 1 ====== req done req=0x7fd8306b6620 op status=0 http_status=200 latency=0.000000000s ======
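If these per-request lines are too noisy while debugging, the rgw debug level can be lowered per daemon. A sketch, assuming the daemon name from above; this only quietens the log, it does not change the resets themselves:

# lower (and later restore) the rgw log level for one daemon
ceph config set client.rgw.default.osd-1.xqrjwp debug_rgw 0/0
ceph config rm client.rgw.default.osd-1.xqrjwp debug_rgw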
It looks to me like the following fragment of the haproxy configuration is responsible for this behaviour.
backend backend
    option httpchk HEAD / HTTP/1.0    # <-- here
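To confirm that the resets really come from these health checks, watching for RST segments on the RGW port works well; a sketch (interface and port taken from my setup), where the source addresses should be the haproxy hosts:

# show TCP RST segments on the RGW frontend port
tcpdump -nn -i bond0 'tcp port 8000 and (tcp[tcpflags] & tcp-rst != 0)'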
I did a lot of digging and finally ended up at this thread:
these sequences of packets including a RST are completely normal. This is the way that haproxy uses to do health checks efficiently. As soon as haproxy has discovered that the endpoint is up, there is no point in wasting any further resources at either end. It turns out that using TCP RST is the most efficient way for kernels at both ends of the connection to finish their conversation and free up those resources.
[...]
Again, the RST are completely normal. Don’t panic :wink:
- https://discourse.haproxy.org/t/connection-reset-seen-every-2-sec-haproxy/2156/5
This is how I came up with my summary of "keep calm and keep on ignoring".
full configuration
==> /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/haproxy.rgw.default.osd-1.urpnuu/haproxy/haproxy.cfg <==
# This file is generated by cephadm.
global
    log          127.0.0.1 local2
    chroot       /var/lib/haproxy
    pidfile      /var/lib/haproxy/haproxy.pid
    maxconn      8000
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout queue           20s
    timeout connect         5s
    timeout http-request    1s
    timeout http-keep-alive 5s
    timeout client          1s
    timeout server          1s
    timeout check           5s
    maxconn                 8000

frontend stats
    mode http
    bind *:1967
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:<REDACTED>
    http-request use-service prometheus-exporter if { path /metrics }
    monitor-uri /health

frontend frontend
    bind *:443 ssl crt /var/lib/haproxy/haproxy.pem
    default_backend backend

backend backend
    option forwardfor
    balance static-rr
    option httpchk HEAD / HTTP/1.0
    server rgw.default.osd-1.xqrjwp 172.16.62.10:8000 check weight 100
    server rgw.default.osd-2.lopjij 172.16.62.11:8000 check weight 100
    server rgw.default.osd-3.plbqka 172.16.62.12:8000 check weight 100
    server rgw.default.osd-4.jvkhen 172.16.62.13:8000 check weight 100
    server rgw.default.osd-5.hjxnrb 172.16.62.30:8000 check weight 100
    server rgw.default.osd-6.bdrxdd 172.16.62.31:8000 check weight 100

==> /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/keepalived.rgw.default.osd-1.vrjiew/keepalived.conf <==
# This file is generated by cephadm.
vrrp_script check_backend {
    script "/usr/bin/curl http://localhost:1967/health"
    weight -20
    interval 2
    rise 2
    fall 2
}

vrrp_instance VI_0 {
  state MASTER
  priority 100
  interface bond0
  virtual_router_id 51
  advert_int 1
  authentication {
      auth_type PASS
      auth_pass <REDACTED>
  }
  unicast_src_ip 172.16.62.10
  unicast_peer {
    172.16.62.11
    172.16.62.12
    172.16.62.13
    172.16.62.30
    172.16.62.31
  }
  virtual_ipaddress {
    172.16.62.26/19 dev bond0
  }
  track_script {
      check_backend
  }
}
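As a side note: the check frequency, and with it the rate of RSTs, is governed by haproxy's per-server inter parameter (default 2s). Slowing the checks down would look roughly like the line below, but cephadm regenerates haproxy.cfg, so treat this purely as a sketch of the haproxy syntax, not as a cephadm option:

    server rgw.default.osd-1.xqrjwp 172.16.62.10:8000 check inter 10s weight 100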
Updated by Sebastian Wagner about 2 years ago
- Project changed from ceph-deploy to Orchestrator
Updated by Sebastian Wagner about 2 years ago
You were pretty successful in hiding this issue from the cephadm developers by putting it into the ceph-deploy project :-)
Updated by Redouane Kachach Elhichou almost 2 years ago
Thanks for the detailed doc. It seems like this is normal haproxy behavior and no change is needed at the cephadm level. Closing.
Updated by Redouane Kachach Elhichou almost 2 years ago
- Status changed from New to Closed