Bug #5245
Frequent 500s from radosgw
Description
Hi,
I have roughly 30 clients talking simultaneously to radosgw over a 1 Gbps link. I use the boto library on the client side.
Frequently I get error 500 when I try to fetch files from radosgw. For now I implemented retry logic in my code, but obviously that's not the right solution :).
The servers are running Apache, but I also tried nginx and it was showing the same behavior.
This is what I see on the client:
IncompleteRead: IncompleteRead(100655630 bytes read, 4201971 more expected)
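For reference, the kind of client-side retry workaround mentioned above might look roughly like the sketch below. It is not the actual code from this report: `fetch_with_retry` and the `fetch` callable are hypothetical names, and it assumes the truncated download surfaces as `http.client.IncompleteRead` (under Python 2 with boto 2.x the same exception lives in `httplib`).

```python
import time
from http.client import IncompleteRead


def fetch_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a complete body or attempts run out.

    fetch is a hypothetical zero-argument callable, for example a small
    wrapper around a boto key download. It is an assumption of this
    sketch, not part of the original report.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except IncompleteRead:
            # The gateway closed the connection before the full body arrived.
            if attempt == attempts:
                raise  # give up and surface the original error
            sleep(delay * attempt)  # simple linear backoff between attempts
```

This only masks the symptom on the client; it does not address whatever causes the gateway to abort the response.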
This is what shows up in Apache logs:
1.2.3.4 - - [04/Jun/2013:11:26:52 +0200] "GET /foo/bar HTTP/1.1" 500 100655630 "-" "Boto/2.2.2 (linux2)"
[Tue Jun 04 11:26:52 2013] [error] [client 1.2.3.4] (4)Interrupted system call: FastCGI: comm with server "/var/www/radosgw" aborted: select() failed
[Tue Jun 04 11:26:52 2013] [error] [client 1.2.3.4] Handler for fastcgi-script returned invalid result code 1
And finally this is an excerpt from radosgw debug log:
7f517fff7700 0 NOTICE: failed to send response to client
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 2 req 6:9.736544:s3:GET /foo/bar:get_obj:http status=403
7f5182ffd700 20 rados->read r=0 bl.length=524288
7f517fff7700 1 ====== req done req=0x1998af0 http_status=403 ======
Installed packages:
ii librados2 0.56.6-1~bpo60+1 RADOS distributed object store client library
ii radosgw 0.56.6-1~bpo60+1 REST gateway for RADOS distributed object store
ii ceph 0.56.6-1~bpo60+1 distributed storage and file system
Please advise how to handle this.
Thanks,
Jiri
History
#1 Updated by Yehuda Sadeh almost 11 years ago
Could it be that you let apache spawn the gateways by itself? Or maybe running multiple gateways over the same socket? What's your apache fastcgi config?
#2 Updated by Jiri Brunclik almost 11 years ago
This is my Apache config:
LoadModule fastcgi_module /usr/lib/apache2/modules/mod_fastcgi.so
FastCgiExternalServer /var/www/radosgw -socket /tmp/radosgw.a.sock
RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /var/www/radosgw?page=$1&params=$2&%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
And this is the relevant part from ceph.conf:
[client.radosgw.a]
host = foo
keyring = /etc/ceph/radosgw.keyring
rgw print continue = false
rgw socket path = /tmp/radosgw.a.sock
log file = /var/log/ceph/radosgw.a.log
I am using worker MPM with the following settings:
StartServers 2
MinSpareThreads 8
MaxSpareThreads 16
ThreadLimit 96
ThreadsPerChild 8
MaxClients 96
#3 Updated by Yehuda Sadeh almost 11 years ago
Can you verify that you only have a single gateway running on that socket, and that the process id does not change when running?
#4 Updated by Ian Colle almost 11 years ago
- Assignee set to Yehuda Sadeh
#5 Updated by Jiri Brunclik almost 11 years ago
Yes, there is a single radosgw process:
# pgrep radosgw
29132
And it has been running for quite some time now:
root 29132 6.5 0.2 1603036 88576 ? SNsl May14 2009:04 /usr/bin/radosgw -n client.radosgw.a
I am running Apache 2.2.16 with mod_fastcgi 2.4.6.
#6 Updated by Sage Weil over 10 years ago
- Status changed from New to Can't reproduce