Bug #7888

closed

msgr: keepalive is insufficient

Added by Sage Weil about 10 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Support
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

the current keepalive behavior relies on writes triggering a TCP timeout/error, which does not actually happen in many cases (like ifdown eth0).

instead, we need something like a request/reply exchange to guarantee liveness.
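A minimal sketch of what a request/reply keepalive could look like, written in Python for brevity. This is not the Ceph implementation: the class `KeepaliveTracker`, its fields, and the choice to echo the send timestamp back in the ack are all assumptions made for illustration. The point it shows is that liveness is derived only from acks the peer actually returned, never from a local write succeeding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepaliveTracker:
    """Hypothetical helper: liveness comes from acked keepalives, not from writes."""
    interval: float                    # seconds between keepalive requests
    timeout: float                     # seconds without an ack before the peer counts as dead
    last_ack: float = 0.0              # newest timestamp the peer has echoed back
    last_sent: Optional[float] = None  # when the last keepalive request went out

    def start(self, now: float) -> None:
        # Treat a completed connection setup as the first proof of liveness.
        self.last_ack = now
        self.last_sent = None

    def should_send(self, now: float) -> bool:
        return self.last_sent is None or now - self.last_sent >= self.interval

    def on_send(self, now: float) -> float:
        # The request carries the send time; the peer is expected to echo it.
        self.last_sent = now
        return now

    def on_ack(self, echoed: float) -> None:
        # Only a reply from the peer advances the liveness clock.
        self.last_ack = max(self.last_ack, echoed)

    def is_dead(self, now: float) -> bool:
        return now - self.last_ack > self.timeout


# Usage: the interface goes down right after the first exchange.
tracker = KeepaliveTracker(interval=10.0, timeout=30.0)
tracker.start(0.0)
tracker.on_ack(tracker.on_send(0.0))   # one successful round trip
tracker.on_send(10.0)                  # the write "succeeds" locally, no ack follows
tracker.on_send(20.0)
peer_lost = tracker.is_dead(31.0)      # True: no ack has been seen for more than 30s
```

With write-only keepalives the sends at 10s and 20s would look healthy, because the kernel accepts the bytes into the socket buffer; here they do not count until the peer answers.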

Actions #1

Updated by Sage Weil about 10 years ago

  • Assignee set to Sage Weil
Actions #2

Updated by Sage Weil about 10 years ago

  • Status changed from 12 to In Progress
Actions #3

Updated by Sage Weil about 10 years ago

  • Status changed from In Progress to Fix Under Review
Actions #4

Updated by Sage Weil about 10 years ago

wip-7888 handles this for MonClient. We can do the same with Objecter, but this is less critical because we will find out via the osdmap if they are really down.

There is a bit of a concern about this whole approach, though: if the server isn't reading data because it has hit its memory throttle, the client may time out and reconnect, resending a bunch of the same messages, making the memory pressure even worse. It really is better if this can be handled a bit lower down in the protocol layer.
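The resend concern can be illustrated with a toy model. Everything here is hypothetical (the function `simulate_stall`, its parameters, and the assumption that every reconnect resends all unacknowledged messages); it only counts how many messages reach the wire while a throttled server is not reading.

```python
from typing import Optional, Tuple


def simulate_stall(pending: int, stall_ticks: int,
                   timeout_ticks: Optional[int]) -> Tuple[int, int]:
    """Return (messages_sent, reconnects) for a server that reads nothing
    for `stall_ticks` ticks. `timeout_ticks=None` means the client never
    times out and simply waits on TCP flow control."""
    sent = pending        # the initial transmission of the queued messages
    reconnects = 0
    idle = 0
    for _ in range(stall_ticks):
        idle += 1
        if timeout_ticks is not None and idle >= timeout_ticks:
            # Client gives up, reconnects, and resends everything unacked.
            reconnects += 1
            sent += pending
            idle = 0
    return sent, reconnects


patient = simulate_stall(pending=100, stall_ticks=90, timeout_ticks=None)
impatient = simulate_stall(pending=100, stall_ticks=90, timeout_ticks=30)
# patient   == (100, 0)
# impatient == (400, 3)
```

In this model a 90-tick stall with a 30-tick timeout quadruples the traffic aimed at a server that is already short on memory, which is the feedback loop described above.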

Alternatively, the Messenger throttle could be redone so that it is cooperative and never stops reading data off the socket. But then we end up having to implement all the same flow control that TCP is already giving us... meh!
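To make "all the same flow control" concrete, here is a rough sketch of a credit-based window, one common way to do application-level flow control. It is an illustration only, not a proposal from this ticket; `CreditWindow` and its methods are hypothetical names. The receiver keeps reading the socket (so keepalives still get through) and instead limits the sender by granting byte credits.

```python
class CreditWindow:
    """Hypothetical sender-side view of receiver-granted byte credits."""

    def __init__(self, credits: int) -> None:
        self.credits = credits

    def try_send(self, size: int) -> bool:
        # The sender may only transmit what the receiver has promised to buffer.
        if size > self.credits:
            return False
        self.credits -= size
        return True

    def grant(self, size: int) -> None:
        # Called when the receiver reports that it has freed `size` bytes.
        self.credits += size


window = CreditWindow(credits=100)
first = window.try_send(60)    # True, 40 credits left
second = window.try_send(60)   # False, the sender must wait for a grant
window.grant(60)               # receiver finished processing the first message
third = window.try_send(60)    # True
```

This is roughly the bookkeeping the TCP receive window already performs, which is why duplicating it in the Messenger is unappealing.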

Actions #5

Updated by Sage Weil about 10 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Sage Weil about 10 years ago

  • Status changed from Pending Backport to Resolved
Actions #7

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)