Fix #8905

open

msgr: encode osd epoch in nonce to avoid misc OSD reconnect races

Added by Sage Weil almost 10 years ago. Updated over 4 years ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We currently cannot tell whether an incoming connection is from an older or a newer instance of an OSD; we can only tell whether the peer is in our current OSDMap. Connections from older instances must be rejected, and connections from newer ones must be accepted.

Fix this by putting an epoch number into the high bits of the nonce (the low bits can remain the PID, unless we have a better idea). The receiver can then tell how to behave.

- need to pass the nonce value down through rebind()
- need to make sure it is also set properly during the initial boot phase
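A minimal sketch of packing an epoch and a PID into one nonce and splitting them apart again. The 64-bit width and the even 32/32 split are assumptions for illustration only; the issue does not fix the field widths. If the real nonce field is narrower, the epoch would have to be truncated and comparisons would need to handle wraparound. The function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical layout (assumption): 64-bit nonce, epoch in the high
// 32 bits, PID in the low 32 bits.
constexpr unsigned kPidBits = 32;
constexpr std::uint64_t kPidMask = (std::uint64_t{1} << kPidBits) - 1;

// Build a nonce from the epoch we bound in and our process ID.
constexpr std::uint64_t make_nonce(std::uint32_t bind_epoch, std::uint32_t pid) {
  return (static_cast<std::uint64_t>(bind_epoch) << kPidBits) | pid;
}

// Recover the epoch from the high bits of a peer's nonce.
constexpr std::uint32_t nonce_epoch(std::uint64_t nonce) {
  return static_cast<std::uint32_t>(nonce >> kPidBits);
}

// Recover the PID from the low bits of a peer's nonce.
constexpr std::uint32_t nonce_pid(std::uint64_t nonce) {
  return static_cast<std::uint32_t>(nonce & kPidMask);
}
```

Because the helpers are `constexpr`, the round trip can be checked at compile time, e.g. `static_assert(nonce_epoch(make_nonce(1234, 5678)) == 1234, "")`. Keeping the PID in the low bits preserves the current meaning of the nonce for anything that only needs to distinguish processes.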

Which epoch?

- not up_epoch, since we bind and pick our addr before we get added to a map.
- not boot_epoch, I think, because I believe that is only set when the process starts
- bind_epoch is the right one, I think.

On the receiver end, if bind_epoch > our epoch, accept. If bind_epoch < osdmap->get_info(osd)->down_at, reject.
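The receiver-side rule can be sketched as a pure function over three epochs: the peer's bind_epoch (decoded from its nonce), the receiver's current OSDMap epoch, and the peer's down_at from that map. The enum and function names are hypothetical. The description does not say what to do when neither rule fires, so the `Undecided` result (fall back to the existing OSDMap-membership check) is an assumption.

```cpp
#include <cstdint>

// Possible outcomes for an incoming OSD connection (names are hypothetical).
enum class PeerDecision { Accept, Reject, Undecided };

// Sketch of the rule from the description.
//   peer_bind_epoch: epoch decoded from the peer's nonce
//   my_epoch:        epoch of the receiver's current OSDMap
//   peer_down_at:    down_at recorded for the peer in that OSDMap
constexpr PeerDecision classify_peer(std::uint32_t peer_bind_epoch,
                                     std::uint32_t my_epoch,
                                     std::uint32_t peer_down_at) {
  if (peer_bind_epoch > my_epoch) {
    // Peer bound in a map we have not seen yet: a newer instance.
    return PeerDecision::Accept;
  }
  if (peer_bind_epoch < peer_down_at) {
    // Peer bound before it was last marked down: a stale instance.
    return PeerDecision::Reject;
  }
  // Assumption: neither rule applies, so defer to the existing check.
  return PeerDecision::Undecided;
}
```

The "accept" test runs first so that a peer from a map newer than ours is never rejected on the basis of possibly stale down_at information.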

(Someday, it could even explicitly 'reject' such that the connector doesn't keep retrying; that is an orthogonal problem, though.)

This will fix #8880.

Actions #1

Updated by Samuel Just over 9 years ago

  • Target version changed from 0.85 to 0.85 cont.
Actions #2

Updated by Sage Weil over 9 years ago

  • Target version deleted (0.85 cont.)
Actions #3

Updated by Sage Weil over 9 years ago

  • Target version set to 0.85 cont.
Actions #4

Updated by Sage Weil over 9 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Ian Colle over 9 years ago

  • Target version changed from 0.85 cont. to 0.86
Actions #6

Updated by Samuel Just over 9 years ago

  • Target version changed from 0.86 to 0.88
Actions #7

Updated by Sage Weil over 9 years ago

  • Target version deleted (0.88)
Actions #8

Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to 12
Actions #9

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New