Bug #111: handle EAGAIN from osd - Linux kernel client - Ceph

Bug #111

closed

handle EAGAIN from osd

Added by Sage Weil almost 14 years ago. Updated over 13 years ago.

Status: Resolved

Priority: Normal

Target version: v2.6.36

Description

Currently we just return EAGAIN to the caller, when we should retry.

not that the osd returns this very often (ever?)
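A rough sketch of the "retry instead of returning it" behaviour, written in C++ for illustration only (the kernel client itself is C). `submit_with_retry`, `submit_once`, and the attempt cap are hypothetical names and choices, not anything taken from the client code:

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical helper (not the kernel client's actual code): resubmit a
// request while the OSD answers -EAGAIN, giving up after max_attempts tries.
// submit_once stands in for "send the op and wait for the reply code".
int submit_with_retry(const std::function<int()>& submit_once, int max_attempts) {
    assert(max_attempts > 0);  // at least one attempt must be allowed
    int r = -EAGAIN;
    for (int attempt = 0; attempt < max_attempts && r == -EAGAIN; ++attempt) {
        r = submit_once();
    }
    return r;  // 0, some other error, or -EAGAIN if every attempt was bounced
}
```

Any result other than -EAGAIN ends the loop immediately, so only the "try again" case is hidden from the caller; a real client would presumably also want a delay or a wait for a new osdmap between attempts.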

Updated by Sage Weil almost 14 years ago

Target version changed from v2.6.35 to v2.6.36

Updated by Yehuda Sadeh almost 14 years ago

We should make the client handle it, but we should also try to make sure that the osd doesn't ever return it (at least shouldn't propagate EAGAIN returned from a syscall).

Updated by Sage Weil almost 14 years ago

Yehuda Sadeh wrote:

We should make the client handle it, but we should also try to make sure that the osd doesn't ever return it (at least shouldn't propagate EAGAIN returned from a syscall).

Currently it looks like the only place that happens is

ReplicatedPG.cc:2406:    return -EAGAIN;

which could be changed to put the request on the waiting_for_object queue instead.

I wonder, though, if that general approach will be problematic in the long run when we try to fix the unbounded memory usage. It might be better to queue up a "reply with -EAGAIN when we're ready to handle this" type of thing to avoid keeping lots of pending writes sitting around. (Of course, if the client has the osd timeout enabled, that will just mean a periodic reconnect and resend, too.)
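To make the trade-off concrete, here is a toy model of the two options, not the OSD's actual data structures. Only the name waiting_for_object comes from the comment above; `ToyPG`, `Op`, `Reply`, `park_op`, `defer_eagain`, and `object_ready` are invented for this sketch:

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <vector>

struct Op { int id; std::string oid; std::string payload; };
struct Reply { int op_id; int result; };

class ToyPG {
public:
    // Option A: keep the whole op (payload included) until the object is ready.
    void park_op(const Op& op) { waiting_for_object_[op.oid].push_back(op); }

    // Option B: remember only the op id; the payload is dropped and the
    // client is expected to resend after it receives -EAGAIN.
    void defer_eagain(const Op& op) { deferred_eagain_[op.oid].push_back(op.id); }

    // Called once the object can be operated on.
    void object_ready(const std::string& oid) {
        auto parked = waiting_for_object_.find(oid);
        if (parked != waiting_for_object_.end()) {
            // Parked ops run now; 0 stands in for "executed successfully".
            for (const Op& op : parked->second) replies_.push_back({op.id, 0});
            waiting_for_object_.erase(parked);
        }
        auto deferred = deferred_eagain_.find(oid);
        if (deferred != deferred_eagain_.end()) {
            // Deferred ops only get a "please resend" reply.
            for (int id : deferred->second) replies_.push_back({id, -EAGAIN});
            deferred_eagain_.erase(deferred);
        }
        assert(waiting_for_object_.count(oid) == 0 && deferred_eagain_.count(oid) == 0);
    }

    const std::vector<Reply>& replies() const { return replies_; }

private:
    std::map<std::string, std::vector<Op>> waiting_for_object_;  // holds payloads
    std::map<std::string, std::vector<int>> deferred_eagain_;    // holds ids only
    std::vector<Reply> replies_;
};
```

In this model option B bounds the memory held per pending request to an integer, at the cost of an extra round trip and a resend from the client.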

Updated by Yehuda Sadeh almost 14 years ago

I agree, though we should differentiate between two cases. One is that we initiate the EAGAIN ourselves (e.g., when we reach a limit); the other is that we inherit it from a syscall we made, although I'm not sure that's actually possible, as we don't really do async I/O in that context. In the former case we should just make sure that the EAGAIN is worth the RTT to the client. We need to avoid the latter.

Updated by Greg Farnum almost 14 years ago

Status changed from New to Resolved

Looks to me like this can't actually happen. The function ReplicatedPG::find_object_context can return EAGAIN, and any find_object_context return codes are immediately sent to the client in ReplicatedPG::do_op, but as best I can see all the paths that lead to do_op go through OSD::handle_op, which makes sure the needed objects exist (for find_object_context to not return EAGAIN).
I threw an assert(r != -EAGAIN) in there to catch it if it does happen, though, in which case we can revisit this if needed.
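The reasoning above can be illustrated with a simplified stand-in. `ToyOSD` and its members are hypothetical, and treating "object still missing locally" as the -EAGAIN condition is an assumption made for this sketch, not a statement about the real find_object_context:

```cpp
#include <cassert>
#include <cerrno>
#include <set>
#include <string>
#include <vector>

struct ToyOSD {
    std::set<std::string> missing;     // objects not yet usable (assumed trigger)
    std::vector<std::string> waiting;  // ops parked until the object shows up

    // Stand-in for find_object_context: the only source of -EAGAIN here.
    int find_object_context(const std::string& oid) const {
        return missing.count(oid) ? -EAGAIN : 0;
    }

    // Stand-in for do_op: whatever find_object_context returns would be
    // sent to the client, so guard against -EAGAIN leaking out.
    int do_op(const std::string& oid) const {
        int r = find_object_context(oid);
        assert(r != -EAGAIN);  // should be unreachable via handle_op
        return r;
    }

    // Stand-in for handle_op: ops on unavailable objects are parked and
    // never reach do_op. Returns true if the op was executed.
    bool handle_op(const std::string& oid) {
        if (missing.count(oid)) {
            waiting.push_back(oid);
            return false;
        }
        do_op(oid);
        return true;
    }
};
```

As long as every caller goes through `handle_op`, the assert never fires; it only trips if some new path calls `do_op` directly on an unavailable object.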
