Fix #58183: cephadm/ingress: s3cmd multi-part upload failing - Orchestrator - Ceph

Fix #58183

closed

cephadm/ingress: s3cmd multi-part upload failing

Added by Frank Ederveen over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee: Frank Ederveen
Category: orchestrator
Tags: backport_processed
Backport: quincy, pacific
Pull request ID: 49556

Description

When using the standard haproxy.cfg.j2 template, multi-part uploads with s3cmd fail.

# small-ish file, single upload
[frank.ederveen ~]$ dd if=/dev/urandom of=foo bs=1M count=15
15+0 records in
15+0 records out
15728640 bytes (16 MB, 15 MiB) copied, 0.0913662 s, 172 MB/s

[frank.ederveen ~]$ s3cmd put foo s3:///frank1/
upload: 'foo' -> 's3://frank1/foo' (15728640 bytes in 0.3 seconds, 49.07 MB/s) [1 of 1]

# larger upload, multi-part
[frank.ederveen ~]$ dd if=/dev/urandom of=foo bs=1M count=17
17+0 records in
17+0 records out
17825792 bytes (18 MB, 17 MiB) copied, 0.101309 s, 176 MB/s

[frank.ederveen ~]$ s3cmd put foo s3:///frank1/
ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
WARNING: Upload failed: /foo?partNumber=1&uploadId=2~__a4bpkYzJK7UKPRpUomoVXqUMmnWyd ([Errno 32] Broken pipe)
WARNING: Waiting 3 sec...
WARNING: Could not refresh role
ERROR: Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception
WARNING: Upload failed: /foo?partNumber=2&uploadId=2~__a4bpkYzJK7UKPRpUomoVXqUMmnWyd ([Errno 32] Broken pipe)
WARNING: Waiting 3 sec...
^CSee ya!
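
The difference between the two runs is consistent with s3cmd switching to multi-part upload once the file is larger than its chunk size. A minimal sketch of that arithmetic, assuming s3cmd's default chunk size of 15 MiB (`--multipart-chunk-size-mb`) is in effect; `part_count` is a hypothetical helper, not part of s3cmd:

```shell
# Assumption: default s3cmd multipart chunk size, in MiB.
chunk_mib=15

# part_count SIZE_MIB: number of requests needed to send SIZE_MIB
# in chunks of at most chunk_mib (ceiling division).
part_count() {
    size_mib=$1
    echo $(( (size_mib + chunk_mib - 1) / chunk_mib ))
}

part_count 15   # prints 1: fits in one chunk, plain PUT
part_count 17   # prints 2: matches partNumber=1 and partNumber=2 in the log
```

If the chunk size is set differently in `.s3cfg`, the threshold moves accordingly.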

This seems to be caused by the very short `timeout client` and `timeout server` settings. When they are changed from `1s` to `5s`, the problem goes away.

On a related note, why was 'http-server-close' chosen over the default 'http-keep-alive'? I will do some testing with it and may file a separate ticket.

# /src/pybind/mgr/cephadm/templates/services/ingress/haproxy.cfg.j2
defaults
    mode                    {{ mode }}
    log                     global
{% if mode == 'http' %}
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout queue           20s
    timeout connect         5s
    timeout http-request    1s
    timeout http-keep-alive 5s
    timeout client          1s
    timeout server          1s
    timeout check           5s
{% endif %}
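
For reference, a sketch of the workaround described above, showing only the two lines that were changed in local testing. The `5s` value is simply what was tried here; it is not necessarily the value chosen in the merged fix, and the other settings of the template are assumed to stay as they are:

```
defaults
    mode                    {{ mode }}
{% if mode == 'http' %}
    # Raised from 1s. Client and server timeouts are kept equal, and
    # slightly above a multiple of 3 seconds, as the HAProxy docs suggest.
    timeout client          5s
    timeout server          5s
{% endif %}
```

A rendered config can be checked for syntax errors with `haproxy -c -f <file>` before it is deployed.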

https://www.haproxy.org/download/2.8/doc/configuration.txt:

timeout client <timeout>
  Set the maximum inactivity time on the client side.
  May be used in sections :   defaults | frontend | listen | backend
                                 yes   |    yes   |   yes  |   no
  Arguments :
    <timeout> is the timeout value specified in milliseconds by default, but
              can be in any other unit if the number is suffixed by the unit,
              as explained at the top of this document.

  The inactivity timeout applies when the client is expected to acknowledge or
  send data. In HTTP mode, this timeout is particularly important to consider
  during the first phase, when the client sends the request, and during the
  response while it is reading data sent by the server. That said, for the
  first phase, it is preferable to set the "timeout http-request" to better
  protect HAProxy from Slowloris like attacks. The value is specified in
  milliseconds by default, but can be in any other unit if the number is
  suffixed by the unit, as specified at the top of this document. In TCP mode
  (and to a lesser extent, in HTTP mode), it is highly recommended that the
  client timeout remains equal to the server timeout in order to avoid complex
  situations to debug. It is a good practice to cover one or several TCP packet
  losses by specifying timeouts that are slightly above multiples of 3 seconds
  (e.g. 4 or 5 seconds).
[zap!]
timeout server <timeout>
  Set the maximum inactivity time on the server side.
  May be used in sections :   defaults | frontend | listen | backend
                                 yes   |    no    |   yes  |   yes
  Arguments :
    <timeout> is the timeout value specified in milliseconds by default, but
              can be in any other unit if the number is suffixed by the unit,
              as explained at the top of this document.

  The inactivity timeout applies when the server is expected to acknowledge or
  send data. In HTTP mode, this timeout is particularly important to consider
  during the first phase of the server's response, when it has to send the
  headers, as it directly represents the server's processing time for the
  request. To find out what value to put there, it's often good to start with
  what would be considered as unacceptable response times, then check the logs
  to observe the response time distribution, and adjust the value accordingly.

  The value is specified in milliseconds by default, but can be in any other
  unit if the number is suffixed by the unit, as specified at the top of this
  document. In TCP mode (and to a lesser extent, in HTTP mode), it is highly
  recommended that the client timeout remains equal to the server timeout in
  order to avoid complex situations to debug. Whatever the expected server
  response times, it is a good practice to cover at least one or several TCP
  packet losses by specifying timeouts that are slightly above multiples of 3
  seconds (e.g. 4 or 5 seconds minimum).

Related issues 2 (0 open — 2 closed)