Bug #10445

closed

Radosgw hang, corrupt manifest?

Added by Aaron Bassett over 9 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm load testing my radosgw setup to get an idea of what kind of throughput I can get. I have it set up writing to a cache tier of size 2 in front of an EC storage pool at 10+2 (this data is non-critical). I'm testing it by attempting to upload 2000 files, alternating in size between several kB and 100 GB. I'm uploading 25 files at once, using 10 threads per file with a part size of 512 MB.
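For a rough sense of scale, the numbers above can be turned into a back-of-the-envelope sketch. It assumes "GB" and "MB" mean binary units (GiB/MiB) and treats the worst case where all 25 in-flight files are the large ones; the variable names are only illustrative.

```python
import math

# Figures from the test description; binary units are an assumption.
PART_SIZE = 512 * 1024**2     # 512 MB multipart part size
LARGE_FILE = 100 * 1024**3    # size of the large test files (100 GB)
FILES_IN_FLIGHT = 25          # files uploaded concurrently
THREADS_PER_FILE = 10         # upload threads per file

# Number of multipart parts needed for one large file.
parts_per_large_file = math.ceil(LARGE_FILE / PART_SIZE)

# Upper bound on parts being uploaded at the same moment
# (only reached if every in-flight file is a large one).
max_parts_in_flight = FILES_IN_FLIGHT * THREADS_PER_FILE

# Upper bound on data in flight, in GiB.
max_gib_in_flight = max_parts_in_flight * PART_SIZE / 1024**3

print(parts_per_large_file, max_parts_in_flight, max_gib_in_flight)
```

Since the files alternate between tiny and large, the real concurrency is lower than this bound.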

I can get this to run and upload at nearly line speed for an hour or two, but eventually the gateway always melts. It keeps running, but stops processing any requests, and I'm not sure what it's doing. The cluster stays HEALTH_OK for the duration. Once I restart the gateway, it will sit and garbage collect happily and accept new uploads, but attempting to delete or re-upload any of the existing uploads results in a 500 error from the gateway. The only way I've been able to clear them out is to delete all of the gateway's pools and start from scratch on the gateway.

While I've been unable to find the point in the logs where the gateway dies, I have captured a log of an attempt to cancel an existing upload with debug ms = 1 and debug rgw = 20. It's rather verbose, and I'm not sure what to look for in it, but when I was running without the verbosity turned up, I kept seeing a message like

"2015-01-02 14:16:55.409181 7f4d07fb7700 0 RGWObjManifest::operator++(): result: ofs=134217728 stripe_ofs=134217728 part_ofs=0 rule->part_size=0"

when I try to cancel an upload. I think it may be in a loop. As soon as I try to cancel, it just starts spitting those out as fast as it can, and the cancel request eventually 500s. At this point I have to kill -9 the gw process.
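One way to check the loop suspicion against a captured log is to count how often each iterator state repeats. The following is a minimal sketch, not an official Ceph tool: the regular expression is built only from the message quoted above, and the helper names `summarize` and `repeated_states` are hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the log message quoted in this report; the field
# names come from that message, not from the RGW source code.
LINE_RE = re.compile(
    r"RGWObjManifest::operator\+\+\(\): result: "
    r"ofs=(\d+) stripe_ofs=(\d+) part_ofs=(\d+) rule->part_size=(\d+)"
)

def summarize(lines):
    """Count occurrences of each (ofs, stripe_ofs, part_ofs, part_size) state."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[tuple(int(g) for g in m.groups())] += 1
    return counts

def repeated_states(counts, threshold=2):
    """Return states seen at least `threshold` times, most frequent first."""
    return [(state, n) for state, n in counts.most_common() if n >= threshold]

# Example usage against a log file (the path is a placeholder):
#   with open("radosgw.log", errors="replace") as f:
#       for state, n in repeated_states(summarize(f)):
#           print(n, state)
```

If the same state, such as an unchanged ofs with part_size=0, shows up many times within a single request, that would be consistent with the iterator not advancing, though it would not by itself prove where the bad manifest came from.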


Files

radosgw.log (2.37 MB) Aaron Bassett, 01/02/2015 06:23 AM
civet_cancel.log (34.8 MB) Aaron Bassett, 01/08/2015 07:13 AM