Bug #6636
closed
sockaddr_storage and uuid_t are not portable to other platforms
Description
> From: <asomers@gmail.com>
> Date: Fri, Oct 25, 2013 at 10:35 AM
> Subject: Ceph's networking protocol is operating-system dependent
> To: Noah Watkins <noah.watkins@inktank.com>
>
> For several weeks, I've been vexed by a networking problem with Ceph.
> If I start a 1-node Linux cluster, the /usr/local/bin/rados client on
> FreeBSD can connect just fine. However, if I start a 1-node FreeBSD
> cluster, then the /usr/bin/rados client on Linux will hit an
> assertion.
>
> I finally figured it out. The problem is that the encode function at
> msg/msg_types.h:164 casts a struct sockaddr_storage to a char*, which
> is subsequently sent over the wire. However, struct sockaddr_storage
> is an operating system specific data structure. It's different in
> FreeBSD than in Linux. It doesn't even have the packed attribute,
> which means that it could technically differ between compilers,
> although it probably doesn't. Somebody, probably you, tried to
> account for the difference by flipping the endianness of the ss_family
> field on Linux but not FreeBSD or OSX. But there's another
> difference: FreeBSD has a ss_len field where Linux has the high byte
> of ss_family. When a serialized struct sockaddr_storage gets sent
> from FreeBSD to Linux, Linux thinks that ss_family is 0x1002 instead
> of 0x0002, leading to the crash.
>
> I haven't fully audited the code, but there are probably other such
> Linux/amd64 assumptions. One such place is include/uuid.h:39, which
> encodes a uuid_t. uuid_t is also different between FreeBSD and Linux.
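To make the 0x1002 versus 0x0002 mismatch concrete, here is a minimal sketch of how the first two bytes on the wire get read. It is an illustration only, not Ceph code: the helper name `family_as_seen_by_linux` and the two byte arrays are hypothetical, and it assumes the FreeBSD sender's ss_len is 0x10 (16, the size of a struct sockaddr_in) and that AF_INET is 2 on both systems.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read the first two bytes of a serialized
 * sockaddr_storage as a 16-bit big-endian ss_family, which is what a
 * Linux peer expects after the byte swap described in the email. */
uint16_t family_as_seen_by_linux(const unsigned char *wire)
{
    return (uint16_t)((wire[0] << 8) | wire[1]);
}

/* First two bytes of an AF_INET address as each sender would emit them.
 * Assumption: ss_len is 0x10 on the FreeBSD side, AF_INET is 2. */
const unsigned char from_linux[2]   = { 0x00, 0x02 }; /* 16-bit family, big-endian */
const unsigned char from_freebsd[2] = { 0x10, 0x02 }; /* ss_len, then 8-bit ss_family */

/* Checks that the two senders are read differently by the same receiver. */
void demonstrate_mismatch(void)
{
    assert(family_as_seen_by_linux(from_linux) == 0x0002);   /* valid AF_INET */
    assert(family_as_seen_by_linux(from_freebsd) == 0x1002); /* bogus family */
}
```

The receiver has no way to tell that the high byte is really a length, so the address family looks invalid.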
I think we need to create ceph_sockaddr and ceph_uuid to define the wire format (matching what Linux does) so that the translation to and from the platform-specific structures is clean and explicit (and easily pluggable).
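A rough sketch of what an explicit wire format for the address could look like, under stated assumptions. The names `wire_encode_sockaddr`, `wire_decode_sockaddr`, and `WIRE_SOCKADDR_LEN`, and the 128-byte layout (16-bit big-endian family followed by the address payload), are hypothetical and not taken from Ceph. It also assumes the payload starts at `offsetof(struct sockaddr, sa_data)` in the native structure, which holds on Linux and FreeBSD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define WIRE_SOCKADDR_LEN 128  /* hypothetical fixed wire size */
/* Where the address payload starts inside the native structure. */
#define NATIVE_PAYLOAD_OFF offsetof(struct sockaddr, sa_data)

/* Number of payload bytes that fit in both the native and wire layouts. */
size_t payload_len(void)
{
    size_t n = sizeof(struct sockaddr_storage) - NATIVE_PAYLOAD_OFF;
    return n > WIRE_SOCKADDR_LEN - 2 ? WIRE_SOCKADDR_LEN - 2 : n;
}

/* Write the family as explicit big-endian bytes, then the payload. */
void wire_encode_sockaddr(const struct sockaddr_storage *ss,
                          unsigned char out[WIRE_SOCKADDR_LEN])
{
    uint16_t family = (uint16_t)ss->ss_family;
    memset(out, 0, WIRE_SOCKADDR_LEN);
    out[0] = (unsigned char)(family >> 8);
    out[1] = (unsigned char)(family & 0xff);
    memcpy(out + 2, (const unsigned char *)ss + NATIVE_PAYLOAD_OFF, payload_len());
}

/* Rebuild the native structure field by field instead of casting. */
void wire_decode_sockaddr(const unsigned char in[WIRE_SOCKADDR_LEN],
                          struct sockaddr_storage *ss)
{
    memset(ss, 0, sizeof(*ss));
    ss->ss_family = (sa_family_t)(((unsigned)in[0] << 8) | in[1]);
    memcpy((unsigned char *)ss + NATIVE_PAYLOAD_OFF, in + 2, payload_len());
    /* Not handled in this sketch: platforms with ss_len must also set it. */
}

/* Usage example: round-trip an IPv4 address; returns 1 on success. */
int roundtrip_example(void)
{
    struct sockaddr_in in4, out4;
    struct sockaddr_storage src, dst;
    unsigned char wire[WIRE_SOCKADDR_LEN];

    memset(&in4, 0, sizeof(in4));
    in4.sin_family = AF_INET;
    in4.sin_port = htons(6789);
    in4.sin_addr.s_addr = htonl(0x7f000001);  /* 127.0.0.1 */

    memset(&src, 0, sizeof(src));
    memcpy(&src, &in4, sizeof(in4));
    wire_encode_sockaddr(&src, wire);
    wire_decode_sockaddr(wire, &dst);
    memcpy(&out4, &dst, sizeof(out4));

    assert(wire[0] == 0 && wire[1] == AF_INET);  /* family bytes are explicit */
    return out4.sin_family == AF_INET &&
           out4.sin_port == in4.sin_port &&
           out4.sin_addr.s_addr == in4.sin_addr.s_addr;
}
```

Because the family is written byte by byte, the result does not depend on the sender's struct layout or host byte order. A ceph_uuid would presumably follow the same pattern with a fixed 16-byte array.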
Updated by Alan Somers over 10 years ago
Here's a patch that fixes the problem for struct sockaddr_storage. I haven't looked at uuid_t yet. Googling suggested that among common operating systems, the only variation in struct sockaddr_storage is the presence or absence of ss_len. So I created a ./configure check for that. I also dealt with the different sizes of ss_family with casting instead of #ifdef checks. Finally, I added code to deal with mismatched sizes of the two structures, even though I don't know of any platforms with a different size. The result is code that should work on most platforms with no OS-specific #ifdefs. With this patch on wip-port, I can successfully perform rados commands between Linux and FreeBSD clients and servers, in either direction.
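For readers without the patch at hand, the approach described above might look roughly like the following. This is a sketch, not the patch itself: the macro `HAVE_SOCKADDR_STORAGE_SS_LEN` and the function names are hypothetical, and the sketch assumes the build system defines that macro only where struct sockaddr_storage has an ss_len member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumption: a configure-time check defines this macro when struct
 * sockaddr_storage has an ss_len member. The name is hypothetical. */
/* #define HAVE_SOCKADDR_STORAGE_SS_LEN 1 */

/* Native length of the address for a given family, 0 if unknown. */
size_t native_addr_len(unsigned family)
{
    switch (family) {
    case AF_INET:  return sizeof(struct sockaddr_in);
    case AF_INET6: return sizeof(struct sockaddr_in6);
    default:       return 0;
    }
}

/* Store a 16-bit wire family into the native structure. */
void set_native_family(struct sockaddr_storage *ss, uint16_t wire_family)
{
    /* The family must fit the native field, whose width varies by platform. */
    assert((uint16_t)(sa_family_t)wire_family == wire_family);
    /* The cast absorbs the width difference; no OS-specific #ifdef needed. */
    ss->ss_family = (sa_family_t)wire_family;
#ifdef HAVE_SOCKADDR_STORAGE_SS_LEN
    ss->ss_len = (uint8_t)native_addr_len(wire_family);
#endif
}

/* Widen the native family to the 16-bit wire representation. */
uint16_t get_wire_family(const struct sockaddr_storage *ss)
{
    return (uint16_t)ss->ss_family;
}
```

The only conditional compilation is keyed on a feature test rather than on an operating system name, which is what lets the same code build on platforms that were never explicitly listed.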
Updated by Noah Watkins over 10 years ago
Awesome, thanks Alan. I'll pull this into wip-port for the time being.
Updated by Noah Watkins over 10 years ago
Added pull request with this patch for easier discussion
Updated by Greg Farnum about 7 years ago
Dan Mick did this for a port a while ago, and the new messenger stuff also helps.