Msgr - implement infiniband support via rsockets » History » Version 1

Jessica Mack, 06/21/2015 04:07 AM

h1. Msgr - implement infiniband support via rsockets

h3. Summary

Add InfiniBand support by using the rsockets library.

h3. Owners

* Name (Affiliation)

h3. Interested Parties

* Kasper Dieter (Fujitsu)
* Andreas Bluemle (itxperts.de)
* Sage Weil (Inktank)
* Mark Nelson (Inktank)
* Danny Al-Gaaf
* Andrey Korolyov (flops.ru)

h3. Current Status

The SimpleMessenger (msg/*) module handles all network communication in Ceph and is currently based on the normal sockets API using TCP.

Network addresses are currently stored in entity_addr_t, a wrapper around struct sockaddr_storage (128 bytes on Linux), which is meant to be big enough for any network address.

The SimpleMessenger code has a long and storied lineage, is multithreaded (2 threads per socket!), and is difficult to follow, both because of the code itself and because of the complexity of the protocol. Rewriting the whole thing around an explicit state machine using thread pools and poll(2) has been on the wish list for a long time. I do not think that it is a blocker for rsockets support, although it might be nice to do it at the same time.

My current reading of the rsockets capabilities is that the endpoint addresses look like IPv4/IPv6 addresses, but both peers must use the r*() calls and negotiate an rsockets session. This means that we need to distinguish between IP endpoints and IP+rsockets endpoints. This is probably simplest to do by modifying entity_addr_t and including a special address type. entity_addr_t::type is currently always 0, so we can define 1 (or whatever) for rsockets.
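
The address-type idea can be sketched roughly as follows. This is only an illustration under stated assumptions: sketch_addr_t, TYPE_DEFAULT, TYPE_RSOCKETS, set_ipv4() and the "rsockets/" print prefix are hypothetical names chosen for this example, not the actual definitions in msg/msg_types. The only things taken from the text above are the sockaddr_storage payload, the type field that is 0 today, and the proposal to use 1 for rsockets.

```cpp
// Hypothetical sketch; none of these names come from msg/msg_types.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstdint>
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>

struct sketch_addr_t {
  // Transport tag: 0 is what every address carries today; 1 is the
  // assumed new value marking an rsockets endpoint.
  enum : uint32_t { TYPE_DEFAULT = 0, TYPE_RSOCKETS = 1 };

  uint32_t type = TYPE_DEFAULT;
  sockaddr_storage addr{};  // zero-initialized; holds IPv4 or IPv6

  uint32_t get_type() const { return type; }
  void set_type(uint32_t t) { type = t; }
  bool is_rsockets() const { return type == TYPE_RSOCKETS; }

  // Fill in an IPv4 address; returns false if ip does not parse.
  bool set_ipv4(const char *ip, uint16_t port) {
    sockaddr_in sin;
    std::memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
      return false;
    std::memset(&addr, 0, sizeof(addr));
    std::memcpy(&addr, &sin, sizeof(sin));
    return true;
  }
};

// Print "ip:port" for plain TCP peers and "rsockets/ip:port" for
// rsockets peers, so the transport is visible in logs.
inline std::ostream &operator<<(std::ostream &out, const sketch_addr_t &a) {
  if (a.is_rsockets())
    out << "rsockets/";
  if (a.addr.ss_family == AF_INET) {
    sockaddr_in sin;
    std::memcpy(&sin, &a.addr, sizeof(sin));
    char buf[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &sin.sin_addr, buf, sizeof(buf)))
      out << buf;
    else
      out << "?";
    out << ':' << ntohs(sin.sin_port);
  } else {
    out << "(unhandled family)";  // IPv6 is left out to keep the sketch short
  }
  return out;
}
```

Because the tag travels with the address, any code that already receives a peer address can tell which transport to use without extra configuration plumbing.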

Almost all socket calls are confined to Accepter.cc (which is small) and Pipe.cc (which is not). Most actual socket calls use a handful of wrappers:

* tcp_read
* tcp_read_wait
* tcp_read_nonblocking
* tcp_write
* shutdown_socket
* do_sendmsg

A few other direct calls need to be converted to use wrappers:

* getpeername
* setsockopt
* socket
* connect
* close

Once these are all wrapped, each wrapper can use a simple conditional on the peer address type (entity_addr_t::get_type()) to choose between the normal socket syscall and the equivalent rsockets call.

h3. Detailed Description

rsockets is supposed to follow the normal sockets API very closely, making it easy to use in existing applications.
http://linux.die.net/man/7/rsocket

I hope that getting a prototype working is as simple as creating a new address type and putting some conditionals around all of the socket calls in msg/Pipe.cc and msg/Accepter.cc.

h3. Work items

h4. Coding tasks

# msg/msg_types: add rsockets address support to entity_addr_t via the type field; add accessors and update operator<<()
# msg/Pipe: add wrappers for unwrapped socket calls
# msg/Accepter, config: add conditional bind to either socket or rsocket
# msg/Pipe: update wrappers to use either the socket or the rsocket call based on peer address type
# profit!

h4. Build / release tasks

# add library detection to configure.ac
# conditionally compile the rsockets support

h4. Documentation tasks

# write howto document