Msgr - implement InfiniBand support via rsockets

Summary

Add InfiniBand support by using the rsockets library.

Owners

  • Name (Affiliation)

Interested Parties

  • Kasper Dieter (Fujitsu)
  • Andreas Bluemle (itxperts.de)
  • Sage Weil (Inktank)
  • Mark Nelson (Inktank)
  • Danny Al-Gaaf
  • Andrey Korolyov (flops.ru)

Current Status

The SimpleMessenger (msg/*) module handles all network communication in Ceph and is currently based on the normal sockets API using TCP.
Network addresses are currently stored in entity_addr_t, a wrapper around struct sockaddr_storage (128 bytes on Linux), which is supposed to be big enough for any network address.
The SimpleMessenger code has a long and storied lineage, is multithreaded (2 threads per socket!), and is difficult to follow--both because of the code and because of the complexity of the protocol. Rewriting the whole thing around an explicit state machine using thread pools and poll(2) has been on the wish list for a long time. I do not think that it is a blocker for rsockets support, although it might be nice to do it at the same time.
My current reading of the rsockets capabilities is that the endpoint addresses look like IPv4/IPv6 addresses, but both peers must use the r*() calls and negotiate an rsockets session. This means that we need to distinguish between IP endpoints and IP+rsockets endpoints. This is probably simplest to do by modifying entity_addr_t and including a special address type. entity_addr_t::type is currently always 0, so we can define a 1 (or whatever) for rsockets.
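As a rough illustration of the address-type idea, here is a minimal sketch. It does not use the real entity_addr_t: addr_sketch_t, TYPE_DEFAULT, TYPE_RSOCKETS, set_ipv4() and is_rsockets() are hypothetical names, and the value 1 is only the "or whatever" placeholder suggested above.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for entity_addr_t; not the actual Ceph definition.
struct addr_sketch_t {
  static constexpr uint32_t TYPE_DEFAULT = 0;   // plain TCP sockets
  static constexpr uint32_t TYPE_RSOCKETS = 1;  // assumed tag for rsockets peers

  uint32_t type = TYPE_DEFAULT;
  sockaddr_storage addr{};

  uint32_t get_type() const { return type; }
  void set_type(uint32_t t) { type = t; }
  bool is_rsockets() const { return type == TYPE_RSOCKETS; }

  // Stores an IPv4 address and port; returns false if `ip` does not parse.
  bool set_ipv4(const char* ip, uint16_t port) {
    sockaddr_in sin{};
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1) return false;
    std::memset(&addr, 0, sizeof(addr));
    std::memcpy(&addr, &sin, sizeof(sin));
    return true;
  }
};

// Prints "ip:port", prefixed with "rsockets/" when the address is tagged.
inline std::ostream& operator<<(std::ostream& out, const addr_sketch_t& a) {
  if (a.is_rsockets()) out << "rsockets/";
  if (a.addr.ss_family == AF_INET) {
    sockaddr_in sin;
    std::memcpy(&sin, &a.addr, sizeof(sin));
    char buf[INET_ADDRSTRLEN] = {0};
    inet_ntop(AF_INET, &sin.sin_addr, buf, sizeof(buf));
    out << buf << ':' << ntohs(sin.sin_port);
  } else {
    out << "(unset)";
  }
  return out;
}
```

The IP part of the address is untouched; only the tag differs, so two endpoints with the same IP and port but different transports remain distinguishable when printed or compared by type.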
Almost all socket calls are confined to Accepter.cc (which is small) and Pipe.cc (which is not). Most actual socket calls use a handful of wrappers:
  • tcp_read
  • tcp_read_wait
  • tcp_read_nonblocking
  • tcp_write
  • shutdown_socket
  • do_sendmsg
A few other direct calls need to be converted to use wrappers:
  • getpeername
  • setsockopt
  • socket
  • connect
  • close

Once these are all wrapped, a simple conditional on the peer address type (entity_addr_t::get_type()) can select either the normal socket syscall or the equivalent rsockets call.
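A sketch of what such wrappers might look like follows. The net_* names, the ADDR_TYPE_* constants and the HAVE_RSOCKETS macro are assumptions made for illustration; rsocket(), rconnect(), rsend(), rrecv() and rclose() are the librdmacm calls declared in <rdma/rsocket.h>. Without HAVE_RSOCKETS the rsockets branch is compiled out and an rsockets address is rejected.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>

#ifdef HAVE_RSOCKETS  // assumed macro, defined by the build when librdmacm is found
#include <rdma/rsocket.h>
#endif

// Hypothetical address type tags, mirroring entity_addr_t::get_type().
enum { ADDR_TYPE_DEFAULT = 0, ADDR_TYPE_RSOCKETS = 1 };

enum class transport { tcp, rdma, unsupported };

// Maps an address type tag to the transport this build can serve it with.
inline transport pick_transport(int type_tag) {
  if (type_tag == ADDR_TYPE_DEFAULT) return transport::tcp;
#ifdef HAVE_RSOCKETS
  if (type_tag == ADDR_TYPE_RSOCKETS) return transport::rdma;
#endif
  return transport::unsupported;
}

inline int net_socket(int type_tag, int domain, int type, int protocol) {
  switch (pick_transport(type_tag)) {
  case transport::tcp: return ::socket(domain, type, protocol);
#ifdef HAVE_RSOCKETS
  case transport::rdma: return rsocket(domain, type, protocol);
#endif
  default: errno = EPROTONOSUPPORT; return -1;
  }
}

inline int net_connect(int type_tag, int fd, const sockaddr* sa, socklen_t len) {
  switch (pick_transport(type_tag)) {
  case transport::tcp: return ::connect(fd, sa, len);
#ifdef HAVE_RSOCKETS
  case transport::rdma: return rconnect(fd, sa, len);
#endif
  default: errno = EPROTONOSUPPORT; return -1;
  }
}

inline ssize_t net_send(int type_tag, int fd, const void* buf, size_t len, int flags) {
  switch (pick_transport(type_tag)) {
  case transport::tcp: return ::send(fd, buf, len, flags);
#ifdef HAVE_RSOCKETS
  case transport::rdma: return rsend(fd, buf, len, flags);
#endif
  default: errno = EPROTONOSUPPORT; return -1;
  }
}

inline ssize_t net_recv(int type_tag, int fd, void* buf, size_t len, int flags) {
  switch (pick_transport(type_tag)) {
  case transport::tcp: return ::recv(fd, buf, len, flags);
#ifdef HAVE_RSOCKETS
  case transport::rdma: return rrecv(fd, buf, len, flags);
#endif
  default: errno = EPROTONOSUPPORT; return -1;
  }
}

inline int net_close(int type_tag, int fd) {
  switch (pick_transport(type_tag)) {
  case transport::tcp: return ::close(fd);
#ifdef HAVE_RSOCKETS
  case transport::rdma: return rclose(fd);
#endif
  default: errno = EPROTONOSUPPORT; return -1;
  }
}
```

Keeping the dispatch in one helper means the existing wrappers (tcp_read, tcp_write, and so on) would only need to pass the peer's address type through. A descriptor returned by rsocket() is not a kernel socket, so it must only ever be handed to the r*() calls; carrying the type tag alongside the descriptor is one way to enforce that.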

Detailed Description

rsockets is supposed to follow the normal sockets API very closely, making it easy to use in existing applications.
http://linux.die.net/man/7/rsocket
I hope that getting a prototype working is as simple as creating a new address type and putting some conditionals around all of the socket calls in msg/Pipe.cc and msg/Accepter.cc.

Work items

Coding tasks

  1. msg/msg_types: add rsockets address support to entity_addr_t via the type field. add accessors, update operator<<()
  2. msg/Pipe: add wrappers for unwrapped socket calls
  3. msg/Accepter, config: add conditional bind to either socket or rsocket
  4. msg/Pipe: update wrappers to either use socket or rsocket call based on peer address type
  5. profit!
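
For the Accepter side (item 3), a conditional bind could look roughly like the sketch below. open_listener() and its use_rsockets parameter are hypothetical; the flag stands in for whatever config option ends up selecting the transport, and HAVE_RSOCKETS is an assumed build macro. The sketch binds to all IPv4 interfaces for brevity, whereas the real Accepter binds to the configured address.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>

#ifdef HAVE_RSOCKETS  // assumed macro, defined by the build when librdmacm is found
#include <rdma/rsocket.h>
#endif

// Opens a listening IPv4 endpoint using either sockets or rsockets.
// Returns the descriptor, or -1 with errno set.
inline int open_listener(bool use_rsockets, uint16_t port, int backlog = 128) {
  sockaddr_in sin{};
  sin.sin_family = AF_INET;
  sin.sin_addr.s_addr = htonl(INADDR_ANY);
  sin.sin_port = htons(port);
  const sockaddr* sa = reinterpret_cast<const sockaddr*>(&sin);

  if (use_rsockets) {
#ifdef HAVE_RSOCKETS
    int fd = rsocket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (rbind(fd, sa, sizeof(sin)) < 0 || rlisten(fd, backlog) < 0) {
      int saved = errno;  // preserve the original failure across rclose()
      rclose(fd);
      errno = saved;
      return -1;
    }
    return fd;
#else
    errno = EPROTONOSUPPORT;  // rsockets requested but not compiled in
    return -1;
#endif
  }

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  if (::bind(fd, sa, sizeof(sin)) < 0 || ::listen(fd, backlog) < 0) {
    int saved = errno;  // preserve the original failure across close()
    ::close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}
```

A daemon that listens this way would then advertise its address with the rsockets type set, so that connecting peers know to use the r*() calls.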

Build / release tasks

  1. add library detection to configure.ac
  2. conditionally compile the rsockets support

Documentation tasks

  1. write howto document