Tasks #1823
Bug #1822: radosgw can be slow to respond to requests
radosgw should have internal timeouts
0%
Description
Letting Apache time out the rados gateway makes admins sad, since there's no visibility into what is actually timing out. Radosgw should have internal timeouts, and then gather some metrics to return when it hits them.
History
#1 Updated by Yehuda Sadeh almost 12 years ago
We can have timeouts for the init process. For other operations, I'm not sure it makes sense to do it in the rgw layer. Apache already has its own timeouts, and there's not much interesting information that rgw can show. Admins want to see which OSD is misbehaving, and rgw can't give them that. It would also make the I/O path much more complicated than it is.
What we can do is have librados (or the messaging layer underneath) keep track of which operations take too long and dump those somewhere. That would give admins the relevant info they're looking for.
#2 Updated by Greg Farnum almost 12 years ago
RGW ought to be able to grab information about IOs that are taking too long and, if it hits a timeout, figure out which OSD each IO resides on. Doing this cleanly will be a little annoying but shouldn't be too bad; RGWRados (and other storage systems) can implement a callback "timeout_error_diagnostic" or something which gives back a string to use.
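A rough sketch of what such a callback could look like, in Python for brevity rather than in RGW's C++. Only the name timeout_error_diagnostic comes from the comment above. The StorageBackend and FakeRadosBackend classes, the start_op/finish_op helpers, and the explicit now argument are assumptions made up for illustration, not existing RGW or librados APIs.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical backend interface; only the callback name is from the ticket."""

    @abstractmethod
    def timeout_error_diagnostic(self, op_id: str, now: float) -> str:
        """Return a human-readable description of a timed-out operation."""


class FakeRadosBackend(StorageBackend):
    """Stand-in for RGWRados that remembers which OSD each in-flight op targets."""

    def __init__(self) -> None:
        self._inflight = {}  # op_id -> (osd_id, start_time)

    def start_op(self, op_id: str, osd_id: int, now: float) -> None:
        # Record the target OSD and start time when the op is issued.
        self._inflight[op_id] = (osd_id, now)

    def finish_op(self, op_id: str) -> None:
        # Forget the op once it completes; unknown ids are ignored.
        self._inflight.pop(op_id, None)

    def timeout_error_diagnostic(self, op_id: str, now: float) -> str:
        entry = self._inflight.get(op_id)
        if entry is None:
            return f"op {op_id}: not in flight"
        osd_id, started = entry
        return f"op {op_id}: waiting on osd.{osd_id} for {now - started:.1f}s"


backend = FakeRadosBackend()
backend.start_op("put-obj-1", osd_id=7, now=100.0)
print(backend.timeout_error_diagnostic("put-obj-1", now=130.5))
```

The caller that hits the timeout only needs the returned string, so the bookkeeping about which OSD an op maps to stays inside the backend.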
#3 Updated by Yehuda Sadeh almost 12 years ago
I think I wasn't clear enough. RGW doesn't need to do that in the I/O path. Anyway, we need to think of the functional requirements before we come up with any solution.
#4 Updated by Anonymous almost 12 years ago
Sage suggests that this can more properly be detected in the OSD:
- add request to tail of list when started
- remove when complete
- periodically scan start of list and log slow requests
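The three steps above can be sketched roughly as follows, in Python for brevity rather than the OSD's C++. The SlowRequestTracker class, its method names, and the threshold value are hypothetical and not taken from the Ceph source. The sketch assumes request ids are unique and start times never decrease, so the oldest request is always at the head and the scan can stop at the first request that is still young.

```python
from collections import OrderedDict


class SlowRequestTracker:
    """Illustrative tracker: insertion-ordered map of in-flight requests."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold      # age (seconds) at which a request is "slow"
        self._requests = OrderedDict()  # request_id -> start_time, oldest first

    def start(self, request_id: str, now: float) -> None:
        # New requests go to the tail of the list.
        self._requests[request_id] = now

    def complete(self, request_id: str) -> None:
        # Completed requests are removed wherever they sit in the list.
        self._requests.pop(request_id, None)

    def scan(self, now: float) -> list:
        # Walk from the head (oldest) and stop at the first request that is
        # younger than the threshold; everything after it is younger still.
        slow = []
        for request_id, started in self._requests.items():
            age = now - started
            if age < self.threshold:
                break
            slow.append((request_id, age))
        return slow


tracker = SlowRequestTracker(threshold=30.0)
tracker.start("a", now=0.0)
tracker.start("b", now=10.0)
tracker.start("c", now=25.0)
tracker.complete("b")
for request_id, age in tracker.scan(now=45.0):
    print(f"slow request {request_id}: {age:.0f}s old")
```

A periodic timer would call scan() and log the result. Because only the head of the list is inspected, the cost of each scan is proportional to the number of slow requests, not to the number in flight.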
#5 Updated by Sage Weil almost 12 years ago
- Status changed from New to Rejected
#6 Updated by John Spray almost 6 years ago
Bulk reassign of radosgw category to RGW project.