Bug #111
closed
Added by Sage Weil almost 14 years ago.
Updated over 13 years ago.
Description
Currently we just return EAGAIN to the caller, when we should retry.
Not that the OSD returns this very often (ever?).
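A minimal sketch of the retry behaviour the description asks for, in C++. This is not the client code: `submit_with_retry`, `send_op`, and `max_attempts` are hypothetical names, and the bound on attempts is an assumption added so the loop cannot spin forever.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical helper, not part of the real client.
// `send_op` stands in for whatever submits the request and returns
// 0 on success or a negative errno. The request is resubmitted while
// the reply is -EAGAIN, up to `max_attempts` tries in total.
int submit_with_retry(const std::function<int()>& send_op, int max_attempts) {
    int r = -EAGAIN;
    for (int attempt = 0; attempt < max_attempts && r == -EAGAIN; ++attempt) {
        r = send_op();
    }
    // Either a final result, or -EAGAIN if every attempt was rejected.
    return r;
}
```

A real client would presumably also want a delay or backoff between attempts; that is left out here to keep the sketch short.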
- Target version changed from v2.6.35 to v2.6.36
We should make the client handle it, but we should also try to make sure that the osd doesn't ever return it (at least shouldn't propagate EAGAIN returned from a syscall).
Yehuda Sadeh wrote:
We should make the client handle it, but we should also try to make sure that the osd doesn't ever return it (at least shouldn't propagate EAGAIN returned from a syscall).
Currently it looks like the only place that happens is
ReplicatedPG.cc:2406: return -EAGAIN;
which could be changed to put the request on the waiting_for_object queue instead.
I wonder, though, if that general approach will be problematic in the long run when we try to fix the unbounded memory usage. It might be better to queue up a "reply with -EAGAIN when we're ready to handle this" type of thing to avoid keeping lots of pending writes sitting around. (Of course, if the client has the osd timeout enabled, that will just mean a periodic reconnect and resend, too.)
I agree. Though we should differentiate between two cases. One is that we initiate the EAGAIN (e.g., when we have reached a limit), the other is that we inherited it from a syscall we made -- although I'm not sure that's actually possible, as we don't really do async i/o in that context. In the former case we should just make sure that the EAGAIN is worth the RTT to the client. We need to avoid the latter.
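The waiting_for_object idea might look roughly like the toy model below. It is a sketch under assumptions, not the ReplicatedPG code: `QueueingPG`, `Op`, `object_ready`, and the other members are invented for illustration, and only the queue name is taken from the comment above.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request: an id plus the object it targets.
struct Op {
    int id;
    std::string oid;
};

// Toy placement group that parks ops instead of replying -EAGAIN.
class QueueingPG {
public:
    // Returns true if the op ran now, false if it was parked.
    bool do_op(const Op& op) {
        if (!available_.count(op.oid)) {
            waiting_for_object_[op.oid].push_back(op);  // park, no -EAGAIN
            return false;
        }
        completed_.push_back(op.id);
        return true;
    }

    // Called once the object is usable; replays parked ops in arrival order.
    void object_ready(const std::string& oid) {
        available_.insert(oid);
        auto it = waiting_for_object_.find(oid);
        if (it == waiting_for_object_.end()) return;
        std::deque<Op> parked = std::move(it->second);
        waiting_for_object_.erase(it);
        for (const Op& op : parked) do_op(op);
    }

    const std::vector<int>& completed() const { return completed_; }

    // Total number of ops currently parked across all objects.
    std::size_t parked_count() const {
        std::size_t n = 0;
        for (const auto& kv : waiting_for_object_) n += kv.second.size();
        return n;
    }

private:
    std::set<std::string> available_;
    std::map<std::string, std::deque<Op>> waiting_for_object_;
    std::vector<int> completed_;
};
```

Nothing here bounds the queue, which is exactly the memory concern raised above: every parked op stays in memory until its object becomes ready.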
- Status changed from New to Resolved
Looks to me like this can't actually happen. The function ReplicatedPG::find_object_context can return EAGAIN, and any find_object_context return codes are immediately sent to the client in ReplicatedPG::do_op, but as best I can see all the paths that lead to do_op go through OSD::handle_op, which makes sure the needed objects exist (so that find_object_context does not return EAGAIN).
I threw an assert(r != -EAGAIN) in there to catch it if it does happen, though, in which case we can revisit this if needed.
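The invariant described here can be modelled in a few lines. This is a toy reconstruction, not the actual code: the function names mirror the ones mentioned above, but `ToyPG`, `ToyOSD`, their members, and the signatures are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <set>
#include <string>

// Toy stand-in for the PG; bodies are invented, only the names come
// from the discussion above.
struct ToyPG {
    std::set<std::string> present;  // objects that are usable right now

    // Returns 0 if the object is usable, -EAGAIN otherwise.
    int find_object_context(const std::string& oid) const {
        return present.count(oid) ? 0 : -EAGAIN;
    }

    // Forwards the lookup result to the client. The assert documents
    // that callers are expected to have ruled out -EAGAIN already.
    int do_op(const std::string& oid) const {
        int r = find_object_context(oid);
        assert(r != -EAGAIN);
        return r;
    }
};

// Toy stand-in for the OSD entry point: it only dispatches to do_op
// once the needed object exists, so the assert above never fires.
struct ToyOSD {
    ToyPG pg;

    // Returns true and stores the reply code in *result if dispatched,
    // false if the op was held back because the object is missing.
    bool handle_op(const std::string& oid, int* result) const {
        if (!pg.present.count(oid)) return false;
        *result = pg.do_op(oid);
        return true;
    }
};
```

Note that `assert` compiles away when `NDEBUG` is defined, so in such a build the check would not catch anything.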