Feature #10551

test RGW with mod_proxy_fcgi instead of mod_fastcgi

Added by Ken Dreyer almost 6 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:

Description

The switch from mod_fastcgi to mod_proxy_fcgi is described on ceph-devel, at http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/22552

As discussed in a QA meeting yesterday, this Redmine ticket is to track the Teuthology changes necessary to support the fastcgi -> proxy-fcgi change.

There are two types of occurrences of the string "fastcgi" in ceph-qa-suite.git:
1. rgw tasks (tasks/rgw.py)
2. An apache configuration template (tasks/apache.conf.template)

These will need to be updated for mod_proxy_fcgi instead.

History

#1 Updated by Ken Dreyer almost 6 years ago

Until mod_proxy_fcgi is available in EPEL-6 (https://bugzilla.redhat.com/1182770), there are EL6 packages here: http://copr.fedoraproject.org/coprs/ktdreyer/mod_proxy_fcgi/

These packages are going to match what we put into the Red Hat Ceph product.

#2 Updated by Zack Cerza almost 6 years ago

  • Target version set to sprint22

#3 Updated by Andrew Schoen over 5 years ago

  • Assignee set to Andrew Schoen

#4 Updated by Zack Cerza over 5 years ago

  • Target version changed from sprint22 to sprint23

#5 Updated by Andrew Schoen over 5 years ago

Another aspect of this is getting rid of our fork of apache and using the distro-provided version. We've decided it'd be best to start with the apache changes and then move on to the mod_fastcgi stuff.

This is the PR for the apache changes (https://github.com/ceph/ceph-qa-chef/pull/8). Once it's merged and proven not to cause issues, we'll follow with the other needed changes.

#6 Updated by Andrew Schoen over 5 years ago

Will we need issue #10808 for this to be considered completely resolved? I imagine so.

#7 Updated by Andrew Schoen over 5 years ago

We'll need to add this config to the apache template in ceph-qa-suite to enable mod_proxy and mod_proxy_fcgi:

LoadModule proxy_module {mod_path}/mod_proxy.so
LoadModule proxy_fcgi_module {mod_path}/mod_proxy_fcgi.so

Also, remove anything related to mod_fastcgi.

#8 Updated by Andrew Schoen over 5 years ago

We're also setting a few things in the apache.conf for mod_fastcgi that we'll need mod_proxy_fcgi equivalents for.

FastCgiIPCDir {testdir}/apache/tmp.{client}/fastcgi_sock
FastCgiExternalServer {testdir}/apache/htdocs.{client}/rgw.fcgi -socket rgw_sock -idle-timeout {idle_timeout}
RewriteEngine On

RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /rgw.fcgi?page=$1&params=$2&%{{QUERY_STRING}} [E=HTTP_AUTHORIZATION:%{{HTTP:Authorization}},L]

# Set fastcgi environment variables.
# Note that this is separate from Unix environment variables!
SetEnv RGW_LOG_LEVEL 20
SetEnv RGW_SHOULD_LOG yes
SetEnv RGW_PRINT_CONTINUE {print_continue}

Note that we'll probably need to use ProxyPassMatch now instead of RewriteEngine and RewriteRule. See http://httpd.apache.org/docs/trunk/mod/mod_proxy_fcgi.html

#9 Updated by Ken Dreyer over 5 years ago

Here are the sample configs that Yehuda gave to John and me a while back.

Note that there's a difference here depending on whether Apache supports Unix Domain Sockets (UDS).

This version uses UDS. Since UDS support was merged in Apache 2.4.9, it needs to get backported to the enterprise distros. It's been backported to RHEL 7.0, but UDS support is not yet in Ubuntu 14.04.

<VirtualHost *:80>
    ServerName localhost
    DocumentRoot /var/www/html

    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined

    LogLevel debug

    RewriteEngine On

    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

    ProxyPass / unix:///tmp/.radosgw.sock|fcgi://localhost:9000/ disablereuse=On
</VirtualHost>

The following Apache configuration is for Apache 2.2 and early versions of Apache 2.4 that don't use Unix Domain Sockets, and therefore use localhost TCP.

Note that RHEL 6, Ubuntu 12.04 ("Precise"), SLES11, and Debian stable ("Wheezy", 7.7) all ship Apache 2.2, so they will never have Unix Domain Sockets support.

<VirtualHost *:80>
    ServerName localhost
    DocumentRoot /var/www/html

    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined

    LogLevel debug

    RewriteEngine On

    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

    SetEnv proxy-nokeepalive 1

    ProxyPass / fcgi://127.0.0.1:9000/
</VirtualHost>
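The difference between the two sample configs comes down to the proxy directives. A minimal Python sketch of how a config generator might emit them is shown here; the function name `proxy_directives` and its parameters are hypothetical, and the defaults simply mirror the sample values above.

```python
def proxy_directives(use_uds, socket_path="/tmp/.radosgw.sock",
                     host="127.0.0.1", port=9000):
    """Return the Apache proxy lines for the UDS or the TCP sample config.

    Hypothetical helper: names and defaults are illustrative and only
    mirror the two sample VirtualHost blocks in this comment.
    """
    if use_uds:
        # UDS variant (needs an Apache build with Unix Domain Socket support).
        return [
            "ProxyPass / unix://{0}|fcgi://localhost:{1}/ disablereuse=On".format(
                socket_path, port),
        ]
    # TCP variant for Apache builds without UDS support.
    return [
        "SetEnv proxy-nokeepalive 1",
        "ProxyPass / fcgi://{0}:{1}/".format(host, port),
    ]


if __name__ == "__main__":
    for line in proxy_directives(use_uds=False):
        print(line)
```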

#10 Updated by Zack Cerza over 5 years ago

Yehuda is telling me that we will in fact need to support both modules. Here's the breakdown as I understand it now:

  • All distros with apache 2.4 or greater: mod_proxy_fcgi
  • RHEL distros and variants: mod_proxy_fcgi
  • Fedora distros with apache 2.2 or earlier: TBD
  • All other non-RHEL distros with apache 2.2 or earlier: mod_fastcgi
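As a rough illustration, the breakdown above can be written as a small Python decision function. This is only a sketch of the rules as listed in this comment; the function name, the `distro_family` strings and the version tuple are assumptions made for the example.

```python
def choose_module(distro_family, apache_version):
    """Pick the Apache FastCGI module following the breakdown above.

    distro_family: hypothetical label such as "rhel", "fedora", "ubuntu".
    apache_version: (major, minor) tuple, e.g. (2, 4).
    Returns None where the breakdown says "TBD".
    """
    if apache_version >= (2, 4):
        return "mod_proxy_fcgi"
    if distro_family == "rhel":
        return "mod_proxy_fcgi"
    if distro_family == "fedora":
        return None  # still to be decided
    return "mod_fastcgi"


if __name__ == "__main__":
    print(choose_module("ubuntu", (2, 2)))
```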

#11 Updated by Andrew Schoen over 5 years ago

Ken, thanks. Those configs are helpful. FastCgiIPCDir and FastCgiExternalServer are specific to mod_fastcgi so I assume those will be removed. What about the fastcgi environment variables? The fact that they're prefixed with RGW makes me think they probably still need to be there, but I'm wondering if we can still set them the same way.

#12 Updated by Andrew Schoen over 5 years ago

  • Status changed from New to Need More Info

#13 Updated by Zack Cerza over 5 years ago

  • Assignee changed from Andrew Schoen to Ken Dreyer

We really need to know if we're going to continue supporting both modules. If we're going to continue to support mod_fastcgi in any way, it's my opinion that we should continue testing it in supported configurations. If we're not, I just want it in writing so that we're not in a bad place if we remove support for it in our tools.

#14 Updated by Ken Dreyer over 5 years ago

Here's the latest that I've heard:

Upstream we are switching to mod_proxy_fcgi entirely for all distros (RHEL and non-RHEL) and all Apaches (2.2 and 2.4).

Downstream in RHCS 1.2.3, we're switching to mod_proxy_fcgi entirely for all distros (RHEL and non-RHEL) and all Apaches (2.2 and 2.4).

Downstream in ICE 1.2.2, we're keeping mod_fastcgi. I am not clear on what that entails (let alone for Teuthology).

#15 Updated by Andrew Schoen over 5 years ago

It sounds like teuthology needs to support installing and configuring both mod_fastcgi and mod_proxy_fcgi. Is that right, Ken?

The suite being run will most likely need to tell teuthology which of the two it wants depending on what it's testing (RHCS 1.2.3 or ICE 1.2.2).

Probably defaulting to mod_proxy_fcgi unless told otherwise.

#16 Updated by Andrew Schoen over 5 years ago

We discussed this in standup today and it was decided that we'll have ceph-qa-chef install both mod_proxy_fcgi and mod_fastcgi. We'll then add a new argument to the rgw task, 'use_fastcgi'. If that argument is used in the config for a test run we'll configure and use mod_fastcgi instead of mod_proxy_fcgi. The usage of mod_proxy_fcgi will be the default.
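A minimal sketch of the selection logic described in this comment, assuming the rgw task receives its options as a plain dict. The function name is hypothetical; only the 'use_fastcgi' key comes from the discussion.

```python
def select_fcgi_module(rgw_config):
    """Return which Apache module the rgw task should configure.

    rgw_config: dict of rgw task options (may be None or empty).
    mod_proxy_fcgi is the default; 'use_fastcgi' opts back in to mod_fastcgi.
    """
    rgw_config = rgw_config or {}
    if rgw_config.get("use_fastcgi", False):
        return "mod_fastcgi"
    return "mod_proxy_fcgi"


if __name__ == "__main__":
    print(select_fcgi_module({"use_fastcgi": True}))
```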

#17 Updated by Andrew Schoen over 5 years ago

  • Status changed from Need More Info to In Progress
  • Assignee changed from Ken Dreyer to Andrew Schoen

#18 Updated by Andrew Schoen over 5 years ago

A nice thread on ceph-devel about using mod_proxy_fcgi with rgw.

http://article.gmane.org/gmane.comp.file-systems.ceph.devel/23280/match=using+radosgw+mod_proxy_fcgi

This mentions that there are different ceph configs for tcp vs uds when using mod_proxy_fcgi. I'm unsure what this means for teuthology quite yet.

#19 Updated by Zack Cerza over 5 years ago

  • Target version changed from sprint23 to sprint24

#20 Updated by Andrew Schoen over 5 years ago

  • Status changed from In Progress to Need More Info
  • Assignee deleted (Andrew Schoen)

I've got the machinery in place that will now use mod_proxy_fcgi by default, but allow the usage of mod_fastcgi by adding 'use_fastcgi' to the rgw section of overrides in a job's config. This is in the wip-10551 branch of ceph-qa-suite.

We're waiting on the mod_proxy_fcgi packages to be built (issue #10808) and for finalized configuration for mod_proxy_fcgi.

#21 Updated by Andrew Schoen over 5 years ago

  • Assignee set to Yehuda Sadeh

I've been testing this with rhel 7.0 nodes in octo. When I attempt to use mod_proxy_fcgi with TCP and run s3 tests against the cluster the s3 tests fail with a 503 returned from apache.

Full log from the run: http://magna002.ceph.redhat.com/andrewschoen/rgw-test/teuthology.log

Here's my yaml config.

interactive-on-error: true
roles:
- [mon.a, mon.c, osd.0, osd.1, osd.2, client.0]
- [mon.b, osd.3, osd.4, osd.5, client.1]

tasks:
- chef:
- install:
    branch: firefly
- ceph:
- rgw: [client.0]
- s3tests:
    client.0:
      rgw_server: client.0
      force-branch: firefly-original
- interactive:
overrides:
  ceph:
    fs: ext4
    conf:
      client:
        debug rgw: 20
  rgw:
    ec-data-pool: false
    frontend: apache

This is the apache config I'm using.

<IfModule !version_module>
  LoadModule version_module /usr/lib64/httpd/modules/mod_version.so
</IfModule>
<IfModule !env_module>
  LoadModule env_module /usr/lib64/httpd/modules/mod_env.so
</IfModule>
<IfModule !rewrite_module>
  LoadModule rewrite_module /usr/lib64/httpd/modules/mod_rewrite.so
</IfModule>
<IfModule !log_config_module>
  LoadModule log_config_module /usr/lib64/httpd/modules/mod_log_config.so
</IfModule>

Listen 7280
ServerName magna065.ceph.redhat.com

<IfVersion >= 2.4>
  <IfModule !unixd_module>
    LoadModule unixd_module /usr/lib64/httpd/modules/mod_unixd.so
  </IfModule>
  <IfModule !authz_core_module>
    LoadModule authz_core_module /usr/lib64/httpd/modules/mod_authz_core.so
  </IfModule>
  <IfModule !mpm_worker_module>
    LoadModule mpm_worker_module /usr/lib64/httpd/modules/mod_mpm_worker.so
  </IfModule>
  User apache
  Group apache
</IfVersion>

ServerRoot /home/ubuntu/cephtest/apache
ErrorLog /home/ubuntu/cephtest/archive/apache.client.0/error.log
LogFormat "%h l %u %t \"%r\" %>s %b \"{Referer}i\" \"%{User-agent}i\"" combined
CustomLog /home/ubuntu/cephtest/archive/apache.client.0/access.log combined
PidFile /home/ubuntu/cephtest/apache/tmp.client.0/apache.pid
DocumentRoot /home/ubuntu/cephtest/apache/htdocs.client.0

# Set fastcgi environment variables.
# Note that this is separate from Unix environment variables!
SetEnv RGW_LOG_LEVEL 20
SetEnv RGW_SHOULD_LOG yes
SetEnv RGW_PRINT_CONTINUE off

<Directory /home/ubuntu/cephtest/apache/htdocs.client.0>
  Options +ExecCGI
  AllowOverride All
  SetHandler fastcgi-script
</Directory>

AllowEncodedSlashes On
ServerSignature Off
MaxRequestsPerChild 0

# mod_proxy_fcgi config, using TCP

<IfModule !proxy_module>
  LoadModule proxy_module /usr/lib64/httpd/modules/mod_proxy.so
</IfModule>
<IfModule !proxy_fcgi_module>
  LoadModule proxy_fcgi_module /usr/lib64/httpd/modules/mod_proxy_fcgi.so
</IfModule>

RewriteEngine On

RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

SetEnv proxy-nokeepalive 1

ProxyPass / fcgi://127.0.0.1:9000/

Here is the apache error log.

[Wed Feb 18 09:58:38.352078 2015] [proxy:error] [pid 17066:tid 139940637701888] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed
[Wed Feb 18 09:58:38.352171 2015] [proxy:error] [pid 17066:tid 139940637701888] AH00959: ap_proxy_connect_backend disabling worker for (127.0.0.1) for 60s
[Wed Feb 18 09:58:38.352180 2015] [proxy_fcgi:error] [pid 17066:tid 139940637701888] [client 10.8.128.65:53193] AH01079: failed to make connection to backend: 127.0.0.1
[Wed Feb 18 09:58:38.684963 2015] [proxy:error] [pid 17066:tid 139940629309184] AH00940: FCGI: disabled connection for (127.0.0.1)
[Wed Feb 18 09:58:40.560886 2015] [proxy:error] [pid 17066:tid 139940620916480] AH00940: FCGI: disabled connection for (127.0.0.1)
[Wed Feb 18 09:58:45.507838 2015] [proxy:error] [pid 17066:tid 139940612523776] AH00940: FCGI: disabled connection for (127.0.0.1)
[Wed Feb 18 09:58:45.698571 2015] [proxy:error] [pid 17066:tid 139940604131072] AH00940: FCGI: disabled connection for (127.0.0.1)
[Wed Feb 18 09:58:52.771122 2015] [proxy:error] [pid 17066:tid 139940595738368] AH00940: FCGI: disabled connection for (127.0.0.1)
[Wed Feb 18 09:59:08.452079 2015] [proxy:error] [pid 17066:tid 139940587345664] AH00940: FCGI: disabled connection for (127.0.0.1)
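The "(111)Connection refused" entries indicate that nothing was accepting TCP connections on 127.0.0.1:9000 when Apache tried to reach the backend. A quick way to check that from the test node is a plain socket probe. This is a generic sketch using only the Python standard library; the function name is hypothetical.

```python
import socket


def backend_is_listening(host="127.0.0.1", port=9000, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print(backend_is_listening())
```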

#22 Updated by Yehuda Sadeh over 5 years ago

radosgw is not using tcp for fastcgi here. Need to update ceph.conf with the following:

        rgw socket path = "" 
        rgw frontends = fastcgi socket_port=9000 socket_host=0.0.0.0

Also, when using mod-proxy-fcgi we need to disable 100-continue in the gateway:

        rgw print continue = false
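For illustration, the three settings above could be generated programmatically along these lines. This is a sketch only: the helper names and the section name `client.radosgw.gateway` are assumptions made for the example, not something taken from this ticket.

```python
def rgw_tcp_options(port=9000, host="0.0.0.0"):
    """Return the ceph.conf options for radosgw FastCGI over TCP."""
    return {
        "rgw socket path": "",
        "rgw frontends": "fastcgi socket_port={0} socket_host={1}".format(port, host),
        "rgw print continue": "false",
    }


def render_section(name, options):
    """Render one ini-style section; empty values are written as ""."""
    lines = ["[{0}]".format(name)]
    for key, value in options.items():
        lines.append("{0} = {1}".format(key, value if value != "" else '""'))
    return "\n".join(lines)


if __name__ == "__main__":
    # The section name is a placeholder; use the gateway's real client name.
    print(render_section("client.radosgw.gateway", rgw_tcp_options()))
```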

#23 Updated by Andrew Schoen over 5 years ago

I just tried adding those settings in two different ways in teuthology and got a 500 back from apache when running the s3 tests. (The yaml config I used is in a previous comment.)

I tried in the yaml config.

- ceph:
    conf:
      global:
        rgw print continue: false
        rgw socket path: ""
        rgw frontends: fastcgi socket_port=9000 socket_host=0.0.0.0

I also tried adding it to the apache configuration.

SetEnv RGW_SOCKET_PATH ""
SetEnv RGW_FRONTENDS "fastcgi socket_port=9000 socket_host=0.0.0.0"
SetEnv RGW_PRINT_CONTINUE "false"

Is SetEnv a supported way to do this? I noticed we did something similar for the mod_fastcgi config. If we can do this in the apache config it'd be cleaner for teuthology anyway.

#24 Updated by Andrew Schoen over 5 years ago

I was able to get mod_proxy_fcgi to work with tcp on rhel 7. The apache configs in the above comments were correct; the issue was that radosgw was running with the --socket-path option, which was overriding my ceph.conf changes. I also changed 127.0.0.1 to 0.0.0.0 in the apache config.

Still need to get mod_proxy_fcgi to work with uds and then figure out a way to set the ceph.conf changes I want from the command line instead of using the overrides in the config files.

#25 Updated by Andrew Schoen over 5 years ago

The version of apache we're using on the magna rhel machines is not the version that includes the UDS patch. That patch was added in 2.4.6-19. Currently, there isn't a way for the magna machines to consume that patch. We're looking into options to fix that now.

#26 Updated by Andrew Schoen over 5 years ago

  • Status changed from Need More Info to In Progress

#27 Updated by Zack Cerza over 5 years ago

  • Target version changed from sprint24 to sprint25

#28 Updated by Andrew Schoen over 5 years ago

This run schedules a job that runs s3tests using mod_fastcgi and a job that runs s3tests using mod_proxy_fcgi (tcp only) for a variety of distros.

Almost all of the fastcgi jobs passed. The fcgi jobs appear to have run all the tests but had numerous test failures, except on centos and ubuntu 12.04, which failed for other reasons.

http://pulpito.front.sepia.ceph.com/ubuntu-2015-03-03_10:25:40-teuthology-dumpling---basic-multi/

#29 Updated by Yehuda Sadeh over 5 years ago

The tests that fail are failing due to slight incompatibilities in a few edge cases, and due to the partial 100-continue support of mod-proxy-fcgi. We probably want to turn these tests off for mod-proxy-fcgi.

#30 Updated by Andrew Schoen over 5 years ago

Yehuda, that seems reasonable. We'd want to add another attribute to these tests then, maybe 'fails_on_fcgi'?

Then when we run tests against fcgi we just change -a '!fails_on_rgw' to -a '!fails_on_rgw,!fails_on_fcgi'.

Are you thinking that every test that failed in the jobs I scheduled should get this new attribute?

#31 Updated by Andrew Schoen over 5 years ago

The tests that failed:

s3tests.functional.test_s3.test_100_continue
s3tests.functional.test_headers.test_bucket_create_bad_contentlength_unreadable
s3tests.functional.test_headers.test_bucket_create_bad_contentlength_negative
s3tests.functional.test_headers.test_object_create_bad_contentlength_unreadable
s3tests.functional.test_headers.test_object_create_bad_contentlength_negative

#32 Updated by Yehuda Sadeh over 5 years ago

Yes, these are the tests that are expected to fail. I'd name it 'fails_on_mod_proxy_fcgi', as fcgi might be confusing.
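A small sketch of how the test runner's attribute filter could be built once the new attribute exists. The function name is hypothetical; the attribute names are the ones proposed in this thread, and the comma-joined form matches the `-a` expression quoted above.

```python
def nose_attr_filter(frontend_module):
    """Build the value for the test runner's -a option.

    Always excludes tests tagged fails_on_rgw; additionally excludes
    tests tagged fails_on_mod_proxy_fcgi when that module is in use.
    """
    excluded = ["fails_on_rgw"]
    if frontend_module == "mod_proxy_fcgi":
        excluded.append("fails_on_mod_proxy_fcgi")
    return ",".join("!" + name for name in excluded)


if __name__ == "__main__":
    print(nose_attr_filter("mod_proxy_fcgi"))
```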

#33 Updated by Andrew Schoen over 5 years ago

The version of apache we'll need on rhel boxes is 2.4.6-19 if we want to use mod_proxy_fcgi with unix domain sockets, otherwise we'll need to use tcp.

Look for this in the changelog 'mod_proxy: support Unix Domain Sockets (#1170286)'

rpm -q --changelog httpd | grep mod_proxy
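The same check can be scripted. The sketch below runs the rpm command from this comment and looks for the changelog entry; the function names are hypothetical, and `httpd_supports_uds` is only meaningful on an RPM-based host with httpd installed.

```python
import subprocess

# Changelog text quoted in this comment.
UDS_MARKER = "mod_proxy: support Unix Domain Sockets"


def changelog_has_uds(changelog_text):
    """Return True if the changelog mentions the UDS backport."""
    return any(UDS_MARKER in line for line in changelog_text.splitlines())


def httpd_supports_uds():
    """Query the installed httpd package changelog via rpm."""
    output = subprocess.check_output(["rpm", "-q", "--changelog", "httpd"])
    return changelog_has_uds(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    sample = "- mod_proxy: support Unix Domain Sockets (#1170286)"
    print(changelog_has_uds(sample))
```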

#34 Updated by Andrew Schoen over 5 years ago

  • Assignee changed from Yehuda Sadeh to Andrew Schoen

Assigning this back to myself. At this point I have passing runs using mod_proxy_fcgi with TCP on rhel 7 and ubuntu 14.04. We still need to resolve the lab issue that keeps us from installing the version of apache we need for UDS support on rhel 7 (is there a ticket for this?), and we still need packages built for the distros that are listed in issue #10808 for apache 2.2.

#35 Updated by Andrew Schoen over 5 years ago

PR for the s3tests portion of this, https://github.com/ceph/s3-tests/pull/44

#36 Updated by Ken Dreyer over 5 years ago

The ticket for the lab issue of easily consuming RHEL updates is #11022. (Note that only Red Hat staff can see that #11022 ticket; the basic gist of the ticket is to implement the feature of automatically entitling our RHEL servers to Red Hat's CDN with subscription-manager.)

#37 Updated by Zack Cerza over 5 years ago

  • Target version changed from sprint25 to sprint26

#38 Updated by Zack Cerza over 5 years ago

  • Target version changed from sprint26 to sprint27

#40 Updated by Andrew Schoen over 5 years ago

As discussed in standup today we'd like to reverse the default behavior here and continue to use mod_fastcgi by default instead of mod_proxy_fcgi.

After that we can safely merge this.

#41 Updated by Zack Cerza over 5 years ago

  • Target version changed from sprint27 to sprint28

#42 Updated by Zack Cerza over 5 years ago

  • Target version changed from sprint28 to sprint29

#44 Updated by Jian Niu about 5 years ago

With ceph (0.94.2-1trusty) on Ubuntu 14.04.2 LTS, I followed the instructions, but I still get an error from the gateway when I run an S3 test (503 in the traceback below).
Please help.

$ python s3test.py
Traceback (most recent call last):
  File "s3test.py", line 12, in <module>
    bucket = conn.create_bucket('my-new-bucket')
  File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 495, in create_bucket
    data=data)
  File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 547, in make_request
    retry_handler=retry_handler
  File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 1017, in make_request
    retry_handler=retry_handler)
  File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 973, in _mexe
    raise BotoServerError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 503 Service Unavailable

apache log:

[Mon Aug 03 15:21:01.821520 2015] [authz_core:debug] [pid 5644:tid 140484104718080] mod_authz_core.c(828): [client 10.195.98.170:36970] AH01628: authorization result: granted (no directives)
[Mon Aug 03 15:21:01.821556 2015] [proxy_fcgi:debug] [pid 5644:tid 140484104718080] mod_proxy_fcgi.c(73): [client 10.195.98.170:36970] AH01060: set r->filename to proxy:fcgi://127.0.0.1:9000/my-new-bucket/
[Mon Aug 03 15:21:01.821572 2015] [proxy:debug] [pid 5644:tid 140484104718080] mod_proxy.c(1104): [client 10.195.98.170:36970] AH01143: Running scheme fcgi handler (attempt 0)
[Mon Aug 03 15:21:01.821576 2015] [proxy_fcgi:debug] [pid 5644:tid 140484104718080] mod_proxy_fcgi.c(764): [client 10.195.98.170:36970] AH01076: url: fcgi://127.0.0.1:9000/my-new-bucket/ proxyname: (null) proxyport: 0
[Mon Aug 03 15:21:01.821580 2015] [proxy_fcgi:debug] [pid 5644:tid 140484104718080] mod_proxy_fcgi.c(774): [client 10.195.98.170:36970] AH01078: serving URL //127.0.0.1:9000/my-new-bucket/
[Mon Aug 03 15:21:01.821591 2015] [proxy:debug] [pid 5644:tid 140484104718080] proxy_util.c(2020): AH00942: FCGI: has acquired connection for (127.0.0.1)
[Mon Aug 03 15:21:01.821597 2015] [proxy:debug] [pid 5644:tid 140484104718080] proxy_util.c(2072): [client 10.195.98.170:36970] AH00944: connecting //127.0.0.1:9000/my-new-bucket/ to 127.0.0.1:9000
[Mon Aug 03 15:21:01.821651 2015] [proxy:debug] [pid 5644:tid 140484104718080] proxy_util.c(2206): [client 10.195.98.170:36970] AH00947: connected /my-new-bucket/ to 127.0.0.1:9000
[Mon Aug 03 15:21:01.821693 2015] [proxy:error] [pid 5644:tid 140484104718080] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (127.0.0.1) failed
[Mon Aug 03 15:21:01.821703 2015] [proxy:error] [pid 5644:tid 140484104718080] AH00959: ap_proxy_connect_backend disabling worker for (127.0.0.1) for 60s

rgw log:

2015-08-03 15:58:16.551676 7fe67b91e840 0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process radosgw, pid 6208
2015-08-03 15:58:16.630600 7fe6437fe700 0 ERROR: can't get key: ret=-2
2015-08-03 15:58:16.630608 7fe6437fe700 0 ERROR: sync_all_users() returned ret=-2
2015-08-03 15:58:16.630778 7fe67b91e840 0 framework: fastcgi
2015-08-03 15:58:16.630808 7fe67b91e840 0 framework: civetweb
2015-08-03 15:58:16.630829 7fe67b91e840 0 framework conf key: port, val: 7480
2015-08-03 15:58:16.630858 7fe67b91e840 0 starting handler: civetweb
2015-08-03 15:58:16.637008 7fe67b91e840 0 starting handler: fastcgi
2015-08-03 15:58:16.637107 7fe533fcf700 0 ERROR: no socket server point defined, cannot start fcgi frontend
