Bug #1822: radosgw can be slow to respond to requests
radosgw should have internal timeouts
Letting Apache time out the rados gateway makes admins sad, since there's no visibility into what is actually timing out. Radosgw should have internal timeouts, and then gather some metrics to return when it hits them.
#1 Updated by Yehuda Sadeh almost 9 years ago
We can have timeouts for the init process. For other operations, I'm not sure it makes sense to do it in the rgw layer. Apache already has its own timeouts, and there's not much interesting information that rgw can show. Admins want to see which osd is misbehaving, and rgw can't give them that. It would also make the I/O path much more complicated than it is.
What we can do is have librados (or the messaging layer underneath) keep track of which operations take too long and dump those somewhere. That would give admins the relevant info they're looking for.
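A rough sketch of the tracking idea, not actual librados code: every name here (SlowOpTracker, InFlightOp, start_op, finish_op, dump_slow_ops) is hypothetical, and the sketch assumes the caller already knows which OSD each op was sent to. It records in-flight ops with a start time and reports the ones older than a threshold.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sketch: none of these names exist in librados.
struct InFlightOp {
  std::string object;                            // object the op targets
  int osd;                                       // OSD the op was sent to
  std::chrono::steady_clock::time_point start;   // when the op was sent
};

class SlowOpTracker {
public:
  using Clock = std::chrono::steady_clock;

  explicit SlowOpTracker(std::chrono::milliseconds threshold)
    : threshold_(threshold) {
    assert(threshold.count() > 0);
  }

  // Register an op when it is sent; returns an id to pass to finish_op().
  uint64_t start_op(const std::string& object, int osd,
                    Clock::time_point now = Clock::now()) {
    std::lock_guard<std::mutex> l(lock_);
    uint64_t tid = next_tid_++;
    ops_[tid] = InFlightOp{object, osd, now};
    return tid;
  }

  // Forget an op once its reply has arrived.
  void finish_op(uint64_t tid) {
    std::lock_guard<std::mutex> l(lock_);
    ops_.erase(tid);
  }

  // One line per op that has been in flight for at least the threshold.
  std::vector<std::string> dump_slow_ops(
      Clock::time_point now = Clock::now()) const {
    std::lock_guard<std::mutex> l(lock_);
    std::vector<std::string> out;
    for (const auto& p : ops_) {
      auto age = std::chrono::duration_cast<std::chrono::milliseconds>(
          now - p.second.start);
      if (age >= threshold_) {
        out.push_back("slow op tid=" + std::to_string(p.first) +
                      " object=" + p.second.object +
                      " osd=" + std::to_string(p.second.osd) +
                      " age_ms=" + std::to_string(age.count()));
      }
    }
    return out;
  }

private:
  const std::chrono::milliseconds threshold_;
  mutable std::mutex lock_;
  uint64_t next_tid_ = 1;
  std::map<uint64_t, InFlightOp> ops_;
};
```

Where the dump goes (log, admin socket, perf counters) is left open here, as it is in the comment above.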
#2 Updated by Greg Farnum almost 9 years ago
RGW ought to be able to grab information about IOs which are taking too long and figure out what OSD that IO resides on, if it hits a timeout. Doing this cleanly will be a little annoying but shouldn't be too bad; RGWRados (and other storage systems) can implement a callback "timeout_error_diagnostic" or something which gives back a string to use.
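One possible shape for that callback, as a sketch only: the name timeout_error_diagnostic comes from the comment above, but StorageBackend, RadosBackendSketch, and note_io are hypothetical, and a real RGWRados would get the object-to-OSD mapping from the cluster rather than from a local map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface; only the callback name is taken from the discussion.
class StorageBackend {
public:
  virtual ~StorageBackend() = default;

  // Called by the gateway when a request hits its internal timeout.
  // Backends with nothing useful to say keep this default.
  virtual std::string timeout_error_diagnostic(const std::string& object) const {
    return "no diagnostic available for " + object;
  }
};

// Stand-in for RGWRados: remembers which OSD each outstanding IO went to.
class RadosBackendSketch : public StorageBackend {
public:
  // Record the OSD an IO was sent to (assumed to be known at submit time).
  void note_io(const std::string& object, int osd) {
    assert(osd >= 0);
    osd_for_object_[object] = osd;
  }

  std::string timeout_error_diagnostic(const std::string& object) const override {
    auto it = osd_for_object_.find(object);
    if (it == osd_for_object_.end()) {
      return StorageBackend::timeout_error_diagnostic(object);
    }
    return "IO on " + object + " timed out waiting for osd." +
           std::to_string(it->second);
  }

private:
  std::map<std::string, int> osd_for_object_;
};
```

The gateway would only call this on the timeout path, so the normal I/O path pays just for the bookkeeping in note_io.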