Bug #7888
closed
msgr: keepalive is insufficient
Added by Sage Weil about 10 years ago.
Updated about 5 years ago.
Description
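The current keepalive behavior relies on writes triggering a TCP timeout or error, which does not actually happen in many cases (for example, ifdown eth0). A write that "succeeds" only means the bytes reached the local socket buffer, not that the peer is alive.
Instead, we need something like a request/reply exchange, where the peer must actively answer, to guarantee liveness.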
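The request/reply idea can be sketched as below. This is a minimal illustration, written in Python for brevity rather than Ceph's C++. The class and method names (KeepaliveTracker, poll, on_ack, peer_alive) and the interval/timeout parameters are hypothetical and are not Ceph's messenger API. Liveness is judged only from the time of the last reply received, never from whether writes succeed.

```python
from typing import Optional


class KeepaliveTracker:
    """Request/reply liveness check for one connection (hypothetical sketch)."""

    def __init__(self, interval: float, timeout: float, now: float) -> None:
        self.interval = interval   # seconds between probes
        self.timeout = timeout     # max silence before the peer is presumed dead
        self.last_ack = now        # connection setup counts as the first proof of life
        self.last_probe: Optional[float] = None  # send time of the latest probe
        self.next_seq = 0          # sequence number for the next probe

    def poll(self, now: float) -> Optional[int]:
        """Return the seq of a probe to send if one is due, else None."""
        if self.last_probe is not None and now - self.last_probe < self.interval:
            return None
        self.last_probe = now
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, seq: int, now: float) -> None:
        """Record the peer's reply; only replies to probes we sent count."""
        if 0 <= seq < self.next_seq:
            self.last_ack = now

    def peer_alive(self, now: float) -> bool:
        """Alive only if the peer has answered recently, whatever our writes did."""
        return now - self.last_ack <= self.timeout


tracker = KeepaliveTracker(interval=1.0, timeout=3.0, now=0.0)
seq = tracker.poll(0.0)          # probe 0 is due immediately
tracker.on_ack(seq, 0.1)         # peer answers
# Peer vanishes (e.g. ifdown): later probes are still written locally, but no acks arrive.
for t in (1.0, 2.0, 3.0):
    tracker.poll(t)
print(tracker.peer_alive(3.0))   # True: last ack was 2.9 s ago
print(tracker.peer_alive(3.2))   # False: silence exceeds the 3 s timeout
```

A client would call poll() from a periodic tick and, once peer_alive() returns False, tear down the session and reconnect (for a monitor client, possibly to a different monitor). That reaction is an assumption about usage, not a description of the wip-7888 change.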
- Assignee set to Sage Weil
- Status changed from 12 to In Progress
- Status changed from In Progress to Fix Under Review
wip-7888 handles this for MonClient. We could do the same for the Objecter, but it is less critical there because the osdmap will tell us whether the OSDs are really down.
There is one concern with this whole approach, though: if the server has stopped reading from the socket because it has hit its memory throttle, the client may time out and reconnect, resending many of the same messages and making the memory pressure even worse. It would really be better to handle this a bit lower down, in the protocol layer.
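The throttle problem can be seen in a toy model (hypothetical, not Ceph code). Frames on the socket are modeled as ("data", size) or ("probe", seq) tuples; the server reads them in FIFO order and stops reading once admitting the next data frame would exceed a fixed memory budget. The function name server_read and the budget parameter are illustrative assumptions. A keepalive probe queued behind the blocked data is never read, so the client sees no reply even though the server is alive.

```python
from collections import deque


def server_read(socket: deque, budget: int) -> list:
    """Drain frames in FIFO order until the memory throttle blocks (toy model).

    Data frames hold `size` bytes of budget; the server is assumed stalled,
    so that budget is never released. Returns the seqs of answered probes.
    """
    in_use = 0
    acks = []
    while socket:
        kind, value = socket[0]
        if kind == "data":
            if in_use + value > budget:
                break              # throttle full: stop reading the socket
            in_use += value
        else:                      # "probe": costs no throttle budget
            acks.append(value)
        socket.popleft()
    return acks


blocked = deque([("data", 60), ("data", 60), ("probe", 0)])
print(server_read(blocked, budget=100))   # []: probe stuck behind the second data frame
roomy = deque([("data", 60), ("data", 60), ("probe", 0)])
print(server_read(roomy, budget=200))     # [0]: same traffic, enough budget
```

In this model, each reconnect-and-resend puts more data ahead of the next probe, which is the feedback loop described above.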
Alternatively, the Messenger throttle could be redone so that it is cooperative and never stops reading data off the socket. But then we end up having to reimplement all the same flow control that TCP is already giving us... meh!
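A cooperative throttle might look like the following toy sketch (hypothetical names, not Ceph's Throttle or Messenger classes). Frames are modeled as ("data", size) or ("probe", seq) tuples. The reader always drains the socket, so probes are answered immediately, but data that does not fit the budget must be parked somewhere, and those parked bytes still occupy memory. Nothing bounds them unless the receiver can tell the sender to pause, which is the window-based flow control TCP already provides.

```python
from collections import deque


class CooperativeReader:
    """Always drain the socket so keepalives are never blocked (toy sketch)."""

    def __init__(self, budget: int) -> None:
        self.budget = budget     # bytes that may be admitted for processing
        self.in_use = 0          # bytes currently admitted
        self.pending = deque()   # data read off the socket but not yet admitted

    def read_all(self, socket: deque) -> list:
        """Consume every frame; return the seqs of probes answered."""
        acks = []
        while socket:
            kind, value = socket.popleft()
            if kind == "probe":
                acks.append(value)          # answered at once, regardless of throttle
            elif not self.pending and self.in_use + value <= self.budget:
                self.in_use += value        # admitted (FIFO kept: nothing parked ahead)
            else:
                self.pending.append(value)  # parked: memory use still grows
        return acks

    def pending_bytes(self) -> int:
        return sum(self.pending)


reader = CooperativeReader(budget=100)
print(reader.read_all(deque([("data", 60), ("data", 60), ("probe", 0)])))  # [0]
reader.read_all(deque([("data", 60), ("data", 60)]))
print(reader.pending_bytes())  # 180: grows with every resend unless the sender is told to stop
```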
- Status changed from Fix Under Review to Pending Backport
- Status changed from Pending Backport to Resolved
- Project changed from Ceph to Messengers
- Category deleted (msgr)