Fix #8905

msgr: encode osd epoch in nonce to avoid misc OSD reconnect races

Added by Sage Weil over 9 years ago. Updated over 4 years ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We currently cannot tell whether an incoming connection is from an older or a newer instance of an OSD; we can only tell whether it is in our current OSDMap. For older instances, we need to reject the connection. For newer ones, we need to accept it.

Fix this by putting an epoch number into the high bits of the nonce (the low bits can remain the PID, unless we have a better idea). The receiver can then tell how to behave.

- need to pass the nonce value down through rebind()
- need to make sure it is also set properly during the initial boot phase

Which epoch?

- not up_epoch, since we bind and pick our addr before we are added to a map.
- not boot_epoch, I think, because I believe it is only set when the process starts.
- bind_epoch is the right one, I think.

On the receiver end, if bind_epoch > our epoch, accept. If bind_epoch < osdmap->get_info(osd)->down_at, reject.

(Someday, it could even explicitly 'reject' such that the connector doesn't keep retrying; that is an orthogonal problem, though.)

Will fix #8880

History

#1 Updated by Samuel Just over 9 years ago

  • Target version changed from 0.85 to 0.85 cont.

#2 Updated by Sage Weil over 9 years ago

  • Target version deleted (0.85 cont.)

#3 Updated by Sage Weil over 9 years ago

  • Target version set to 0.85 cont.

#4 Updated by Sage Weil over 9 years ago

  • Status changed from New to In Progress

#5 Updated by Ian Colle over 9 years ago

  • Target version changed from 0.85 cont. to 0.86

#6 Updated by Samuel Just over 9 years ago

  • Target version changed from 0.86 to 0.88

#7 Updated by Sage Weil over 9 years ago

  • Target version deleted (0.88)

#8 Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to 12

#9 Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New
