Bug #5245

Frequent 500s from radosgw

Added by Jiri Brunclik almost 11 years ago. Updated over 10 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

I have roughly 30 clients talking to radosgw simultaneously over a 1 Gbps link. I use the boto library on the client side.

I frequently get a 500 error when I try to fetch files from radosgw. For now I have implemented retry logic in my code, but obviously that's not the right solution :).

The servers are running Apache, but I also tried nginx and it showed the same behavior.

This is what I see on the client:

IncompleteRead: IncompleteRead(100655630 bytes read, 4201971 more expected)
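A client-side retry of the kind mentioned above might look roughly like the following. This is only a sketch of the workaround, not a fix for the underlying problem. It assumes Python 3, where the exception is `http.client.IncompleteRead` (boto 2.2.2 on Python 2 would raise `httplib.IncompleteRead` instead). The `fetch` callable, the attempt count, and the delays are hypothetical and not taken from the reporter's code.

```python
import time
from http.client import IncompleteRead


def fetch_with_retry(fetch, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it returns, retrying on truncated responses.

    fetch      -- hypothetical zero-argument callable that downloads one
                  object and returns its bytes (for example a wrapper
                  around a boto key read); not part of the original report
    attempts   -- maximum number of calls (illustrative default)
    base_delay -- first delay in seconds, doubled after each failure
    sleep      -- injectable so the delay can be skipped in tests
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except IncompleteRead:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2  # exponential backoff between attempts
```

Retrying the whole download is wasteful for objects of this size (about 100 MB), which is one more reason to treat it as a stopgap.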

This is what shows up in Apache logs:

1.2.3.4 - - [04/Jun/2013:11:26:52 +0200] "GET /foo/bar HTTP/1.1" 500 100655630 "-" "Boto/2.2.2 (linux2)" 
[Tue Jun 04 11:26:52 2013] [error] [client 1.2.3.4] (4)Interrupted system call: FastCGI: comm with server "/var/www/radosgw" aborted: select() failed
[Tue Jun 04 11:26:52 2013] [error] [client 1.2.3.4] Handler for fastcgi-script returned invalid result code 1
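The byte counts in the client error and the Apache access log can be cross-checked. The sketch below parses both lines as quoted above and assumes the access log uses the combined format, where the number after the status code is the response size in bytes. The helper names are made up for illustration.

```python
import re

# Lines copied from the report above.
client_error = "IncompleteRead(100655630 bytes read, 4201971 more expected)"
access_line = (
    '1.2.3.4 - - [04/Jun/2013:11:26:52 +0200] "GET /foo/bar HTTP/1.1" '
    '500 100655630 "-" "Boto/2.2.2 (linux2)"'
)


def parse_incomplete_read(msg):
    """Return (bytes_read, bytes_missing) from an IncompleteRead message."""
    m = re.search(r"IncompleteRead\((\d+) bytes read, (\d+) more expected\)", msg)
    if m is None:
        raise ValueError("not an IncompleteRead message")
    return int(m.group(1)), int(m.group(2))


def parse_access_log(line):
    """Return (status, bytes_sent), assuming the combined log format."""
    m = re.search(r'" (\d{3}) (\d+) "', line)
    if m is None:
        raise ValueError("unrecognized access log line")
    return int(m.group(1)), int(m.group(2))


bytes_read, bytes_missing = parse_incomplete_read(client_error)
status, bytes_sent = parse_access_log(access_line)

expected_total = bytes_read + bytes_missing  # size the client was promised
print(status, bytes_sent, bytes_read, expected_total)
```

Here the bytes Apache reports as sent equal the bytes the client read (100655630), and the client expected 104857601 bytes in total, so the response appears to have been cut off on the server side rather than lost in transit.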

And finally this is an excerpt from radosgw debug log:

7f517fff7700  0 NOTICE: failed to send response to client
7f517fff7700  0 ERROR: s->cio->print() returned err=-1
7f517fff7700  0 ERROR: s->cio->print() returned err=-1
7f517fff7700  0 ERROR: s->cio->print() returned err=-1
7f517fff7700  0 ERROR: s->cio->print() returned err=-1
7f517fff7700  2 req 6:9.736544:s3:GET /foo/bar:get_obj:http status=403
7f5182ffd700 20 rados->read r=0 bl.length=524288
7f517fff7700  1 ====== req done req=0x1998af0 http_status=403 ======

Installed packages:

ii  librados2                           0.56.6-1~bpo60+1             RADOS distributed object store client library
ii  radosgw                             0.56.6-1~bpo60+1             REST gateway for RADOS distributed object store
ii  ceph                                0.56.6-1~bpo60+1             distributed storage and file system

Please advise how to handle this.

Thanks,

Jiri

History

#1 Updated by Yehuda Sadeh almost 11 years ago

Could it be that you let Apache spawn the gateways by itself? Or that you are running multiple gateways over the same socket? What's your Apache FastCGI config?

#2 Updated by Jiri Brunclik almost 11 years ago

This is my Apache config:

LoadModule fastcgi_module /usr/lib/apache2/modules/mod_fastcgi.so
FastCgiExternalServer /var/www/radosgw -socket /tmp/radosgw.a.sock
RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /var/www/radosgw?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

And this is the relevant part from ceph.conf:

[client.radosgw.a]
        host = foo
        keyring = /etc/ceph/radosgw.keyring
        rgw print continue = false
        rgw socket path = /tmp/radosgw.a.sock
        log file = /var/log/ceph/radosgw.a.log

I am using the worker MPM with the following settings:

StartServers         2
MinSpareThreads      8
MaxSpareThreads     16 
ThreadLimit         96
ThreadsPerChild      8
MaxClients          96
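For reference, the capacity these settings imply can be worked out as below. This is a rough sketch that assumes the usual worker MPM relationships: MaxClients is the total number of request threads, and the number of child processes is MaxClients divided by ThreadsPerChild. The function name is invented for illustration.

```python
# Values copied from the worker MPM settings above.
mpm = {
    "StartServers": 2,
    "MinSpareThreads": 8,
    "MaxSpareThreads": 16,
    "ThreadLimit": 96,
    "ThreadsPerChild": 8,
    "MaxClients": 96,
}


def worker_capacity(cfg):
    """Derive process and thread counts from worker MPM directives."""
    per_child = cfg["ThreadsPerChild"]
    if per_child > cfg["ThreadLimit"]:
        raise ValueError("ThreadsPerChild must not exceed ThreadLimit")
    return {
        # Threads available right after startup.
        "initial_threads": cfg["StartServers"] * per_child,
        # Upper bound on concurrently served requests.
        "max_threads": cfg["MaxClients"],
        # Child processes needed to reach that bound.
        "max_children": cfg["MaxClients"] // per_child,
    }


capacity = worker_capacity(mpm)
print(capacity)
```

With these values Apache starts with 16 threads and can grow to 12 child processes and 96 threads. If each of the roughly 30 clients holds a single connection, the thread limit alone is unlikely to be the bottleneck.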

#3 Updated by Yehuda Sadeh almost 11 years ago

Can you verify that you have only a single gateway running on that socket, and that its process ID does not change while it is running?
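One way to run this check over time is sketched below. It assumes a Linux host with `pgrep` available and a gateway process named exactly `radosgw`. The function names, sample count, and interval are hypothetical. It only detects gateways visible under that process name, so it cannot rule out a differently named process bound to the same socket.

```python
import subprocess
import time


def sample_pids(name="radosgw"):
    """Return the set of PIDs whose process name is exactly `name`."""
    result = subprocess.run(
        ["pgrep", "-x", name], capture_output=True, text=True
    )
    # pgrep prints one PID per line; no output means no match.
    return {int(tok) for tok in result.stdout.split()}


def single_stable_pid(samples):
    """Return the PID if every sample holds the same single PID, else None."""
    first = samples[0] if samples else set()
    if len(first) != 1:
        return None  # no gateway, or more than one
    if any(sample != first for sample in samples):
        return None  # PID changed, so the gateway was restarted or respawned
    return next(iter(first))


def watch(name="radosgw", count=5, interval=60.0):
    """Sample the PID set `count` times, `interval` seconds apart."""
    samples = []
    for i in range(count):
        samples.append(sample_pids(name))
        if i < count - 1:
            time.sleep(interval)
    return single_stable_pid(samples)
```

Calling `watch()` while the clients are generating load returns the gateway's PID if it stayed the same throughout, and `None` if the process count or PID changed.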

#4 Updated by Ian Colle almost 11 years ago

  • Assignee set to Yehuda Sadeh

#5 Updated by Jiri Brunclik almost 11 years ago

Yes, there is a single radosgw process:

# pgrep radosgw
29132

And it has been running for quite some time now:

root     29132  6.5  0.2 1603036 88576 ?       SNsl May14 2009:04 /usr/bin/radosgw -n client.radosgw.a

I am running Apache 2.2.16 with mod_fastcgi 2.4.6.

#6 Updated by Sage Weil over 10 years ago

  • Status changed from New to Can't reproduce
