Bug #1942

msgr: Address family not supported by protocol

Added by Sage Weil about 12 years ago. Updated about 5 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

History

#1 Updated by Sage Weil about 12 years ago

Still not sure how the bad address made it into the map (or the OSDBoot message), but at least it no longer crashes as of commit:a4642946b284c0b0fd85587e0c0b0bbf0ec4b0b4

#2 Updated by Sage Weil about 12 years ago

  • Status changed from New to 7

We now report a connection fault instead of asserting, and during initialization we check for bind() errors. As far as I can see, that would catch the initial problem. It's still a mystery how this bit two people on the v0.40 upgrade, though.

#3 Updated by Sage Weil about 12 years ago

commit:dcceb8e835cbf40173c334de18bd68c2cf7f3716 added the osd_fsid field to OSDSuperblock and bumped the encoding version. Old code happily decodes the new structure even though it doesn't understand it, but stops before the new field. OSDSuperblock is embedded in MOSDBoot, so old code continues decoding the next fields (cluster_addr, hb_addr) starting at the new osd_fsid field, and the result is a zeroed-out address. That address made it into the osdmap, which triggered the impolite assert.
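
To make the misdecode concrete, here is a toy model in Python. It is only a sketch: the byte layouts, the function names, and the all-zero osd_fsid are assumptions chosen for illustration and do not match Ceph's real wire format. It shows an old reader that knows only the original superblock fields picking up the address from the bytes of the new field.

```python
import struct

# Toy address layout (assumption): family, port, 4 IPv4 bytes.
ADDR = struct.Struct("<HH4s")


def encode_superblock_v1(whoami):
    # Old layout: version byte + osd id.
    return struct.pack("<BI", 1, whoami)


def encode_superblock_v2(whoami, osd_fsid):
    # New layout: same fields plus a 16-byte osd_fsid appended at the end.
    return struct.pack("<BI16s", 2, whoami, osd_fsid)


def encode_boot(superblock_bytes, family, port, ip):
    # The superblock is embedded inline, followed directly by the address.
    return superblock_bytes + ADDR.pack(family, port, ip)


def old_decode_boot(buf):
    """Old reader: decodes only the v1 fields and has no length to skip by."""
    _version, whoami = struct.unpack_from("<BI", buf, 0)
    offset = struct.calcsize("<BI")  # stops before osd_fsid on a v2 buffer
    family, port, ip = ADDR.unpack_from(buf, offset)
    return whoami, (family, port, ip)


# 2 is AF_INET on Linux; the fsid is all zeros in this toy example.
new_msg = encode_boot(encode_superblock_v2(3, bytes(16)), 2, 6800, bytes([10, 0, 0, 1]))
print(old_decode_boot(new_msg))
```

With an old-style superblock the same reader returns the real address. With the new one it reads the first bytes of osd_fsid as the address, which in this toy setup yields family 0, port 0 and an all-zero IP.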

The new wip-encoding would have helped by just ignoring the new field.
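
One common way to let old readers ignore fields they don't know about is to prefix each struct with its version and byte length, so the reader can jump to the end of the struct no matter how many fields were appended. The Python sketch below illustrates that general idea only; the header layout and the names encode_versioned and decode_known_prefix are hypothetical and are not taken from the wip-encoding branch.

```python
import struct

# Hypothetical header: struct version byte + payload length in bytes.
HEADER = struct.Struct("<BI")


def encode_versioned(version, payload):
    # Prefix the payload with its version and length.
    return HEADER.pack(version, len(payload)) + payload


def decode_known_prefix(buf, offset, known_fmt):
    """Decode only the fields in known_fmt, then skip to the end of the struct.

    Returns (version, fields, offset of the first byte after the struct).
    """
    version, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    if struct.calcsize(known_fmt) > length:
        raise ValueError("payload shorter than the fields this reader expects")
    fields = struct.unpack_from(known_fmt, buf, start)
    # Any bytes between the known fields and start + length are newer fields
    # this reader does not understand; they are skipped, not misread.
    return version, fields, start + length


# A "new" struct with an extra 16-byte field, followed by unrelated data.
payload = struct.pack("<I16s", 3, bytes(range(16)))
buf = encode_versioned(2, payload) + b"ADDR"
version, fields, end = decode_known_prefix(buf, 0, "<I")
print(version, fields, buf[end:])
```

An old reader that only knows the first field still lands exactly on the data that follows the struct, so whatever is decoded next starts at the right offset.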

The workaround is to restart the monitors on v0.40 first, then restart the osds, so that old monitors never see the new OSDSuperblock encoding.

The monitor should reject a boot message if the addresses appear to be invalid.
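
A minimal sketch of what such a sanity check could look like, written in Python for brevity. The function names and the exact criteria (known address family, correct length, not the all-zero address) are assumptions for illustration and are not the monitor's actual logic.

```python
import ipaddress
import socket


def addr_looks_valid(family, packed_ip):
    """Return True if the address has a known family and is not all zeros."""
    expected_len = {socket.AF_INET: 4, socket.AF_INET6: 16}.get(family)
    if expected_len is None or len(packed_ip) != expected_len:
        return False
    return not ipaddress.ip_address(packed_ip).is_unspecified


def check_boot_addrs(addrs):
    """Return the names of addresses that should cause the boot message to be rejected."""
    return [name for name, (family, ip) in addrs.items()
            if not addr_looks_valid(family, ip)]


bad = check_boot_addrs({
    "client_addr": (socket.AF_INET, bytes([10, 0, 0, 1])),
    "cluster_addr": (0, bytes(4)),  # zeroed-out address, as in this bug
})
print(bad)
```

A zeroed-out address fails the family check, so a boot message carrying one would be rejected before the address could reach the osdmap.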

#4 Updated by Sage Weil about 12 years ago

  • Status changed from 7 to Won't Fix

#5 Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)
  • Target version deleted (v0.41)