Bug #6636

closed

sockaddr_storage and uuid_t are not portable to other platforms

Added by Sage Weil over 10 years ago. Updated about 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

> From:  <asomers@gmail.com>
> Date: Fri, Oct 25, 2013 at 10:35 AM
> Subject: Ceph's networking protocol is operating-system dependent
> To: Noah Watkins <noah.watkins@inktank.com>
> 
> 
> For several weeks, I've been vexed by a networking problem with Ceph.
> If I start a 1-node Linux cluster, the /usr/local/bin/rados client on
> FreeBSD can connect just fine.  However, if I start a 1-node FreeBSD
> cluster, then the /usr/bin/rados client on Linux will hit an
> assertion.
> 
> I finally figured it out.  The problem is that the encode function at
> msg/msg_types.h:164 casts a struct sockaddr_storage to a char*, which
> is subsequently sent over the wire.  However, struct sockaddr_storage
> is an operating system specific data structure.  It's different in
> FreeBSD than in Linux.  It doesn't even have the packed attribute,
> which means that it could technically differ between compilers,
> although it probably doesn't.  Somebody, probably you, tried to
> account for the difference by flipping the endianness of the ss_family
> field on Linux but not FreeBSD or OSX.  But there's another
> difference: FreeBSD has a ss_len field where Linux has the high byte
> of ss_family.  When a serialized struct sockaddr_storage gets sent
> from FreeBSD to Linux, Linux thinks that ss_family is 0x1002 instead
> of 0x0002, leading to the crash.
> 
> I haven't fully audited the code, but there are probably other such
> Linux/amd64 assumptions.  One such place is include/uuid.h:39, which
> encodes a uuid_t.  uuid_t is also different between FreeBSD and Linux.

I think we need to create ceph_sockaddr and ceph_uuid types that define the wire format (matching what Linux does today), so that the translation to the platform-specific structures is clean, explicit, and easily pluggable.
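
For the UUID half of this, a rough sketch of what an explicit wire type could look like. Everything here is an assumption for illustration: field_uuid is a hypothetical struct modeled on platforms where uuid_t is a set of host-endian integer fields, wire_uuid is a hypothetical 16-byte type, and the wire order is assumed to be most significant byte first. None of these names come from the Ceph tree.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical field-structured UUID (host-endian integers), in the
// style of platforms where uuid_t is not a plain byte array.
struct field_uuid {
  uint32_t time_low;
  uint16_t time_mid;
  uint16_t time_hi_and_version;
  uint8_t clock_seq_hi_and_reserved;
  uint8_t clock_seq_low;
  uint8_t node[6];
};

// Hypothetical wire type: always exactly 16 bytes, big-endian fields.
using wire_uuid = std::array<uint8_t, 16>;

// Serialize field by field so the result does not depend on host
// endianness, struct padding, or the platform's uuid_t definition.
wire_uuid encode_uuid(const field_uuid &u) {
  wire_uuid w{};
  w[0] = static_cast<uint8_t>(u.time_low >> 24);
  w[1] = static_cast<uint8_t>(u.time_low >> 16);
  w[2] = static_cast<uint8_t>(u.time_low >> 8);
  w[3] = static_cast<uint8_t>(u.time_low);
  w[4] = static_cast<uint8_t>(u.time_mid >> 8);
  w[5] = static_cast<uint8_t>(u.time_mid);
  w[6] = static_cast<uint8_t>(u.time_hi_and_version >> 8);
  w[7] = static_cast<uint8_t>(u.time_hi_and_version);
  w[8] = u.clock_seq_hi_and_reserved;
  w[9] = u.clock_seq_low;
  std::memcpy(w.data() + 10, u.node, sizeof(u.node));
  return w;
}

// Inverse of encode_uuid: rebuild the host-side fields from wire bytes.
field_uuid decode_uuid(const wire_uuid &w) {
  field_uuid u{};
  u.time_low = (static_cast<uint32_t>(w[0]) << 24) |
               (static_cast<uint32_t>(w[1]) << 16) |
               (static_cast<uint32_t>(w[2]) << 8) |
               static_cast<uint32_t>(w[3]);
  u.time_mid = static_cast<uint16_t>((w[4] << 8) | w[5]);
  u.time_hi_and_version = static_cast<uint16_t>((w[6] << 8) | w[7]);
  u.clock_seq_hi_and_reserved = w[8];
  u.clock_seq_low = w[9];
  std::memcpy(u.node, w.data() + 10, sizeof(u.node));
  return u;
}
```

On a platform whose uuid_t is already a 16-byte array in this order, the translation would reduce to a plain copy; only the field-structured platforms need the shifts.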



Actions #1

Updated by Alan Somers over 10 years ago

Here's a patch that fixes the problem for struct sockaddr_storage. I haven't looked at uuid_t yet. Googling suggested that among common operating systems, the only variation in struct sockaddr_storage is the presence or absence of ss_len. So I created a ./configure check for that. I also dealt with the different sizes of ss_family with casting instead of #ifdef checks. Finally, I added code to deal with mismatched sizes of the two structures, even though I don't know of any platforms with a different size. The result is code that should work on most platforms with no OS-specific #ifdef's. With this patch on wip-port, I can successfully perform rados commands between Linux and FreeBSD clients and servers, in either direction.
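
A rough sketch of the approach described in this comment, not the patch itself. The names encode_addr, decode_addr, wire_addr, kWireLen, and the HAVE_SS_LEN macro are all assumptions made for illustration (the macro stands in for whatever the ./configure check defines). The sketch also assumes a 128-byte wire layout with a big-endian 16-bit family in the first two bytes, and that the address payload starts at byte offset 2 of struct sockaddr_storage on the host.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>

#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kWireLen = 128;   // assumed wire size
constexpr std::size_t kHeaderLen = 2;   // big-endian 16-bit family
using wire_addr = std::array<unsigned char, kWireLen>;

// Host structure -> wire bytes. The family is widened by a cast, so an
// 8-bit and a 16-bit ss_family are handled by the same code path.
wire_addr encode_addr(const sockaddr_storage &ss) {
  wire_addr w{};  // zero-filled, which also pads a smaller host struct
  const uint16_t family = static_cast<uint16_t>(ss.ss_family);
  w[0] = static_cast<unsigned char>(family >> 8);
  w[1] = static_cast<unsigned char>(family & 0xff);
  // Copy only what fits in both structures, in case their sizes differ.
  const std::size_t n = std::min(sizeof(ss), kWireLen) - kHeaderLen;
  std::memcpy(w.data() + kHeaderLen,
              reinterpret_cast<const unsigned char *>(&ss) + kHeaderLen, n);
  return w;
}

// Wire bytes -> host structure.
sockaddr_storage decode_addr(const wire_addr &w) {
  sockaddr_storage ss;
  std::memset(&ss, 0, sizeof(ss));
  const std::size_t n = std::min(sizeof(ss), kWireLen) - kHeaderLen;
  std::memcpy(reinterpret_cast<unsigned char *>(&ss) + kHeaderLen,
              w.data() + kHeaderLen, n);
  const uint16_t family = static_cast<uint16_t>((w[0] << 8) | w[1]);
  ss.ss_family = static_cast<sa_family_t>(family);
#ifdef HAVE_SS_LEN
  // Only on platforms with ss_len: derive the length from the family
  // (an assumption of this sketch) instead of trusting the sender.
  ss.ss_len = static_cast<unsigned char>(
      family == AF_INET    ? sizeof(sockaddr_in)
      : family == AF_INET6 ? sizeof(sockaddr_in6)
                           : sizeof(ss));
#endif
  return ss;
}
```

The point of the design is that the length byte never travels on the wire: the receiver fills in ss_len locally if its platform has one, so the only platform-specific code is the single configure-driven #ifdef.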

Actions #2

Updated by Noah Watkins over 10 years ago

Awesome, thanks Alan. I'll pull this into wip-port for the time being.

Actions #3

Updated by Noah Watkins over 10 years ago

Added pull request with this patch for easier discussion

https://github.com/ceph/ceph/pull/828

Actions #4

Updated by Sage Weil about 10 years ago

  • Status changed from New to 12

Actions #5

Updated by Sage Weil about 10 years ago

  • Assignee set to Noah Watkins

Actions #6

Updated by Sage Weil about 7 years ago

  • Status changed from 12 to Resolved

Actions #7

Updated by Greg Farnum about 7 years ago

Dan Mick did this for a port a while ago, and the new messenger stuff also helps.
