Bug #5245
Frequent 500s from radosgw
Status: closed
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi,
I have roughly 30 clients talking to radosgw simultaneously over a 1 Gbps link. I use the boto library on the client side.
I frequently get HTTP 500 errors when I try to fetch files from radosgw. For now I have implemented retry logic in my code, but obviously that's not the right solution :).
The servers are running Apache. I also tried nginx, and it showed the same behavior.
This is what I see on the client:
IncompleteRead: IncompleteRead(100655630 bytes read, 4201971 more expected)
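The client-side retry workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: `fetch_with_retry` and its parameters are hypothetical, `fetch` is assumed to be any zero-argument callable that downloads the object body and raises `IncompleteRead` on a truncated response, and the sketch uses Python 3's `http.client` (boto 2.2.2 on Python 2 would raise `httplib.IncompleteRead` instead).

```python
import time
from http.client import IncompleteRead


def fetch_with_retry(fetch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it returns a complete body or attempts run out.

    Hypothetical helper: fetch is any zero-argument callable that returns
    the object body as bytes and raises http.client.IncompleteRead when
    the server closes the connection before sending the full body.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except IncompleteRead:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last truncated read
            # Exponential backoff between attempts: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    # Simulated download that is truncated twice, then succeeds.
    calls = {"n": 0}

    def flaky_fetch():
        calls["n"] += 1
        if calls["n"] < 3:
            raise IncompleteRead(b"partial", 10)
        return b"full body!"

    print(fetch_with_retry(flaky_fetch, sleep=lambda s: None))
```

As the report itself notes, retrying only masks the truncated responses on the client; the server-side failure still occurs.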
This is what shows up in the Apache logs:
1.2.3.4 - - [04/Jun/2013:11:26:52 +0200] "GET /foo/bar HTTP/1.1" 500 100655630 "-" "Boto/2.2.2 (linux2)"
[Tue Jun 04 11:26:52 2013] [error] [client 1.2.3.4] (4)Interrupted system call: FastCGI: comm with server "/var/www/radosgw" aborted: select() failed
[Tue Jun 04 11:26:52 2013] [error] [client 1.2.3.4] Handler for fastcgi-script returned invalid result code 1
Finally, here is an excerpt from the radosgw debug log:
7f517fff7700 0 NOTICE: failed to send response to client
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 0 ERROR: s->cio->print() returned err=-1
7f517fff7700 2 req 6:9.736544:s3:GET /foo/bar:get_obj:http status=403
7f5182ffd700 20 rados->read r=0 bl.length=524288
7f517fff7700 1 ====== req done req=0x1998af0 http_status=403 ======
Installed packages:
ii librados2 0.56.6-1~bpo60+1 RADOS distributed object store client library
ii radosgw 0.56.6-1~bpo60+1 REST gateway for RADOS distributed object store
ii ceph 0.56.6-1~bpo60+1 distributed storage and file system
Please advise how to handle this.
Thanks,
Jiri