Bug #52826

open

Ingress haproxy/keepalived configuration issues with two rgw and two ingress daemons

Added by Manuel Holtgrewe over 2 years ago. Updated about 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

the setting

I have a v16.2.6 cluster running with six nodes osd-1..osd-6. I have two RGW realms "default" and "ext". I want to serve each realm through separate RGW daemons listening on separate ports, 8000 and 8100.

I deployed the rgws with ceph orch as follows:

# ceph orch apply rgw default default default-default-primary --port=8000 --placement="count-per-host:1;label:rgw" 
# ceph orch apply rgw ext ext ext-default-primary --port=8100 --placement="count-per-host:1;label:rgw" 

I then used the following ingress YAML files to deploy ingress for the default realm, to be served by the virtual IP 172.16.62.26, and similarly 172.16.62.27 for the ext realm (as per the manual, launched via ceph orch apply -i ingress.rgw.default.yaml and likewise for ingress.rgw.ext.yaml).

# cat ingress.rgw.default.yaml
service_type: ingress
service_id: rgw.default
placement:
  count: 6
spec:
  backend_service: rgw.default
  virtual_ip: 172.16.62.26/19
  frontend_port: 443
  monitor_port: 1967
  ssl_cert: |
    -----BEGIN PRIVATE KEY-----
    # ...

# cat ingress.rgw.ext.yaml
service_type: ingress
service_id: rgw.ext
placement:
  count: 6
spec:
  backend_service: rgw.ext
  virtual_ip: 172.16.62.27/19
  frontend_port: 443
  monitor_port: 1968
  ssl_cert: |
    -----BEGIN PRIVATE KEY-----
    # ...

The ceph orch ls output for rgw/ingress is now as follows.

# ceph orch ls
NAME                 PORTS                  RUNNING  REFRESHED  AGE  PLACEMENT
ingress.rgw.default  172.16.62.26:443,1967    12/12  4m ago     2d   count:6
ingress.rgw.ext      172.16.62.27:443,1968    12/12  4m ago     6m   count:6
rgw.default          ?:8000                     6/6  4m ago     2d   count-per-host:1;label:rgw
rgw.ext              ?:8100                     6/6  4m ago     19h  count-per-host:1;label:rgw

The full configuration of haproxy/keepalived on one host for the default service is given at the bottom.

the symptoms

1. When connecting to https://172.16.62.26 I randomly get connected to the rgw.default and rgw.ext services.
2. I see a lot of the following in the keepalived logs.

# journalctl -f -u ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@keepalived.rgw.default.osd-1.vrjiew.service
-- Logs begin at Sun 2021-10-03 10:49:24 CEST. --
Oct 06 08:22:44 osd-1 bash[1335217]: Wed Oct  6 06:22:44 2021: (VI_0) received an invalid passwd!
Oct 06 08:22:45 osd-1 bash[1335217]: Wed Oct  6 06:22:45 2021: (VI_0) received an invalid passwd!
# ... the same line every second
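The repeated "invalid passwd" messages suggest another VRRP speaker on the same segment is advertising with the same virtual_router_id but a different auth_pass. Since both ingress services run keepalived on all six hosts, one thing worth checking (my assumption, not confirmed above) is whether cephadm generated the same virtual_router_id 51 for both services. A self-contained sketch of that check, using illustrative sample files under /tmp in place of the real per-fsid keepalived.conf paths:

```shell
# Illustrative sample configs standing in for the cephadm-generated files;
# on a live host, grep the real keepalived.conf files under /var/lib/ceph instead.
mkdir -p /tmp/vrrp-check
cat > /tmp/vrrp-check/keepalived.rgw.default.conf <<'EOF'
vrrp_instance VI_0 {
  virtual_router_id 51
}
EOF
cat > /tmp/vrrp-check/keepalived.rgw.ext.conf <<'EOF'
vrrp_instance VI_0 {
  virtual_router_id 51
}
EOF

# Count distinct router IDs; fewer distinct IDs than ingress services
# means two VRRP instances are colliding on the same network segment.
grep -h virtual_router_id /tmp/vrrp-check/keepalived.*.conf | sort -u | wc -l
```

With both sample files carrying router ID 51, the pipeline reports a single distinct ID for two services, i.e. a collision.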

workaround

For now, I will deploy the ingress for default on osd-1..osd-3 and the one for ext on osd-4..osd-6, which circumvents the problem.
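Expressed as a spec file, that split replaces count: 6 with an explicit host list (a sketch using the host names from this cluster; only the default realm is shown, and the ext spec would mirror it on osd-4..osd-6 with its own VIP and monitor port):

```yaml
# ingress.rgw.default.yaml -- pin the default ingress to three hosts
service_type: ingress
service_id: rgw.default
placement:
  hosts:
    - osd-1
    - osd-2
    - osd-3
spec:
  backend_service: rgw.default
  virtual_ip: 172.16.62.26/19
  frontend_port: 443
  monitor_port: 1967
```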

full configuration

==> /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/haproxy.rgw.default.osd-1.urpnuu/haproxy/haproxy.cfg <==
# This file is generated by cephadm.
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/lib/haproxy/haproxy.pid
    maxconn     8000
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout queue           20s
    timeout connect         5s
    timeout http-request    1s
    timeout http-keep-alive 5s
    timeout client          1s
    timeout server          1s
    timeout check           5s
    maxconn                 8000

frontend stats
    mode http
    bind *:1967
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:REDACTED
    http-request use-service prometheus-exporter if { path /metrics }
    monitor-uri /health

frontend frontend
    bind *:443 ssl crt /var/lib/haproxy/haproxy.pem
    default_backend backend

backend backend
    option forwardfor
    balance static-rr
    option httpchk HEAD / HTTP/1.0
    server rgw.default.osd-1.xqrjwp 172.16.62.10:8000 check weight 100
    server rgw.default.osd-2.lopjij 172.16.62.11:8000 check weight 100
    server rgw.default.osd-3.plbqka 172.16.62.12:8000 check weight 100
    server rgw.default.osd-4.jvkhen 172.16.62.13:8000 check weight 100
    server rgw.default.osd-5.hjxnrb 172.16.62.30:8000 check weight 100
    server rgw.default.osd-6.bdrxdd 172.16.62.31:8000 check weight 100

==> /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/keepalived.rgw.default.osd-1.vrjiew/keepalived.conf <==
# This file is generated by cephadm.
vrrp_script check_backend {
    script "/usr/bin/curl http://localhost:1967/health" 
    weight -20
    interval 2
    rise 2
    fall 2
}

vrrp_instance VI_0 {
  state MASTER
  priority 100
  interface bond0
  virtual_router_id 51
  advert_int 1
  authentication {
      auth_type PASS
      auth_pass REDACTED
  }
  unicast_src_ip 172.16.62.10
  unicast_peer {
    172.16.62.11
    172.16.62.12
    172.16.62.13
    172.16.62.30
    172.16.62.31
  }
  virtual_ipaddress {
    172.16.62.26/19 dev bond0
  }
  track_script {
      check_backend
  }
}
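For reference: if the ext service's generated file mirrors the one above — same VI_0 instance name and virtual_router_id 51, but a different auth_pass — the two keepalived sets would reject each other's advertisements, matching the log messages shown earlier. Distinct router IDs per service would avoid the conflict (illustrative values only; cephadm generates these files, so hand edits are overwritten on redeploy):

```
# keepalived.rgw.default.*/keepalived.conf
vrrp_instance VI_0 {
  virtual_router_id 51   # ingress.rgw.default
  ...
}

# keepalived.rgw.ext.*/keepalived.conf
vrrp_instance VI_0 {
  virtual_router_id 52   # ingress.rgw.ext: a distinct ID per VRRP group
  ...
}
```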
#1

Updated by Sebastian Wagner about 2 years ago

  • Project changed from 18 to Orchestrator