Tasks #1823

Bug #1822: radosgw can be slow to respond to requests

radosgw should have internal timeouts

Added by Greg Farnum over 12 years ago. Updated about 6 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

Letting Apache time out the rados gateway makes admins sad, since there's no visibility into what is actually timing out. Radosgw should have internal timeouts, and then gather some metrics to return when it hits them.


Related issues

Duplicates Ceph - Feature #1879: osd: track list of in-progress requests, log slow ones Resolved 01/04/2012

History

#1 Updated by Yehuda Sadeh over 12 years ago

We can have timeouts for the init process, but for other operations I'm not sure it makes sense to do it in the rgw layer. Apache already has its own timeouts, and there isn't much interesting information that rgw can show. Admins want to see which osd is misbehaving, and rgw can't give them that. It would also make the I/O path much more complicated than it is.

What we can do is have librados (or the messaging layer underneath) keep track of which operations take too long and dump those somewhere. That would give admins the information they're looking for.

#2 Updated by Greg Farnum over 12 years ago

RGW ought to be able to grab information about any IO that is taking too long and, if it hits a timeout, figure out which OSD that IO resides on. Doing this cleanly will be a little annoying but shouldn't be too bad; RGWRados (and other storage systems) can implement a callback "timeout_error_diagnostic" or something similar that returns a string to use.
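A minimal Python sketch of the callback idea, assuming the gateway runs each backend operation under its own internal deadline. Only the callback name `timeout_error_diagnostic` comes from the comment; `StorageBackend`, `run_with_timeout`, `FakeRadosBackend`, and `slow_read` are hypothetical names for illustration, and none of this reflects RGW's actual C++ code. The fake backend's object-to-OSD table stands in for whatever placement lookup a real backend would perform.

```python
import concurrent.futures
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict


class StorageBackend(ABC):
    """Hypothetical hook: a storage backend explains why an I/O timed out."""

    @abstractmethod
    def timeout_error_diagnostic(self, op_name: str) -> str:
        """Return a human-readable hint, e.g. which OSD holds the object."""


def run_with_timeout(backend: StorageBackend, op_name: str,
                     op: Callable[[], str], timeout_s: float) -> str:
    """Run op with an internal deadline; on expiry, return the backend's
    diagnostic instead of leaving the web server to time out blindly."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(op)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timeout: " + backend.timeout_error_diagnostic(op_name)
    finally:
        # Don't wait for a stuck op; the worker thread is abandoned, not killed.
        executor.shutdown(wait=False)


class FakeRadosBackend(StorageBackend):
    """Toy backend with a made-up object -> OSD placement table."""

    def __init__(self, placement: Dict[str, int]):
        self.placement = placement

    def timeout_error_diagnostic(self, op_name: str) -> str:
        osd = self.placement.get(op_name)
        if osd is None:
            return f"{op_name} still pending (OSD unknown)"
        return f"{op_name} still pending on osd.{osd}"


def slow_read() -> str:
    time.sleep(0.3)  # simulates an I/O stuck behind a misbehaving OSD
    return "data"


if __name__ == "__main__":
    backend = FakeRadosBackend({"bucket/obj1": 3})
    print(run_with_timeout(backend, "bucket/obj1", slow_read, timeout_s=0.05))
    # timeout: bucket/obj1 still pending on osd.3
    print(run_with_timeout(backend, "bucket/obj2", lambda: "data", timeout_s=1.0))
    # data
```

The sketch also shows the cost Yehuda points to: after the deadline, the stuck operation keeps running in an abandoned thread, so a real gateway would additionally need a way to cancel or reap it.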

#3 Updated by Yehuda Sadeh over 12 years ago

I think I wasn't clear enough. RGW doesn't need to do that in the I/O path. Anyway, we need to think of the functional requirements before we come up with any solution.

#4 Updated by Anonymous about 12 years ago

Sage suggests that this can more properly be detected in the OSD:
- add request to tail list when started
- remove when complete
- periodically scan start of list and log slow requests
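Sage's three steps can be sketched roughly as follows. This is a minimal Python illustration, not Ceph's OSD code; `InFlightTracker`, its methods, and the 30-second threshold in the usage example are hypothetical. The sketch relies on one assumption: requests are appended in start order using a monotonic clock, so the head of the list is always the oldest request and the periodic scan can stop at the first request that is still young.

```python
import time
from collections import OrderedDict
from typing import Callable, List, Tuple


class InFlightTracker:
    """Append requests at the tail when they start, drop them when they
    complete, and periodically scan from the head to report slow ones."""

    def __init__(self, slow_after_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.slow_after_s = slow_after_s
        self.clock = clock
        # Insertion order == start order, so the head is always the oldest.
        self._inflight: "OrderedDict[str, float]" = OrderedDict()

    def start(self, req_id: str) -> None:
        self._inflight[req_id] = self.clock()
        self._inflight.move_to_end(req_id)  # keep tail == most recent start

    def finish(self, req_id: str) -> None:
        self._inflight.pop(req_id, None)    # O(1) removal from anywhere

    def scan(self) -> List[Tuple[str, float]]:
        """Return (req_id, age) for every request older than the threshold."""
        now = self.clock()
        slow = []
        for req_id, started in self._inflight.items():
            age = now - started
            if age < self.slow_after_s:
                break  # everything after this started later, so it is younger
            slow.append((req_id, age))
        return slow


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo
    tracker = InFlightTracker(slow_after_s=30.0, clock=lambda: fake_now[0])
    tracker.start("op-a")        # starts at t=0
    fake_now[0] = 10.0
    tracker.start("op-b")        # starts at t=10
    fake_now[0] = 20.0
    tracker.start("op-c")        # starts at t=20
    tracker.finish("op-b")
    fake_now[0] = 45.0
    for req_id, age in tracker.scan():
        print(f"slow request {req_id}: {age:.0f}s")
    # slow request op-a: 45s
```

Because the scan stops at the first young request, its cost is proportional to the number of slow requests rather than to everything in flight.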

#5 Updated by Sage Weil about 12 years ago

- Status changed from New to Rejected

#6 Updated by John Spray about 6 years ago

Bulk reassign of radosgw category to RGW project.
