msgr: behave better when ending TCP connections
TV is telling me that if we're not confirming that each side of the connection calls ::shutdown() on the socket, we're not ending our TCP connection properly. Obviously it can work out okay even so, but we want to be good citizens, and fixing this up will likely reduce the edge cases where we need to call mark_disposable() on pipes.
#1 Updated by Josh Durgin over 8 years ago
- Priority changed from Normal to High
This actually caused a deadlock with ffsb on the kernel client: ffsb ended up with 1006 connections in the CLOSING state, and the osd had 1006 in FIN_WAIT2. That pushed the osd up against its limit of 1024 open file descriptors. (The other osd crashed for a different reason.)
#3 Updated by Greg Farnum over 8 years ago
- Status changed from New to In Progress
From the little I'm reading in Unix Network Programming, it looks like we're just doing this wrong: we call shutdown() with SHUT_RDWR and then try to read, which never works. And we don't call close() until we get our successful read (or after timeouts, when we mark_disposable).
So presumably just fixing that will deal with it.
#4 Updated by Greg Farnum over 8 years ago
And I've flipped back and forth umpteen times today about what's going on. At this point I can conclude that nobody on our end knows, but probably one of close() or shutdown() is actually discarding the buffer (probably close()). So the proper fix is going to involve reworking the messenger so that it calls shutdown() with SHUT_WR first, and only calls shutdown() with SHUT_RD after receiving an EOF from the other side.
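For reference, the half-close sequence described above might look roughly like the sketch below. This is not the messenger's code: graceful_close() and its timeout_ms parameter are hypothetical names made up for illustration, and the sketch assumes a blocking socket. It uses a final close() in place of a separate shutdown() with SHUT_RD, since close() releases the descriptor either way.

```cpp
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of Ceph): half-close, drain, then close.
// Returns 0 if the peer's EOF was seen before closing, -1 on error or timeout.
// The descriptor is closed in every case.
int graceful_close(int fd, int timeout_ms) {
  // 1. Stop sending only. On TCP this queues a FIN after any pending data,
  //    while the read side stays open.
  if (::shutdown(fd, SHUT_WR) < 0 && errno != ENOTCONN) {
    ::close(fd);
    return -1;
  }

  // 2. Read and discard until the peer's EOF arrives (read() returns 0).
  //    poll() bounds each wait so a silent peer cannot block us forever.
  char buf[4096];
  for (;;) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    int r = ::poll(&pfd, 1, timeout_ms);
    if (r < 0 && errno == EINTR)
      continue;
    if (r <= 0) {  // timeout or poll error: give up on a clean close
      ::close(fd);
      return -1;
    }
    ssize_t n = ::read(fd, buf, sizeof(buf));
    if (n > 0)
      continue;  // leftover data from the peer, discard it
    if (n == 0)
      break;     // EOF: the peer has shut down its write side
    if (errno == EINTR)
      continue;
    ::close(fd);
    return -1;
  }

  // 3. Both directions are finished, so release the descriptor.
  return ::close(fd);
}
```

The timeout path is where something like mark_disposable() would presumably still be needed: if the peer never sends its EOF, the only option left is to close anyway.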
#9 Updated by Greg Farnum almost 6 years ago
- Status changed from Resolved to New
This has been greatly improved with the addition of our socket timeouts and things, but I don't think it's properly resolved yet. It will get a great deal easier when the messenger doesn't have a thread<->socket relationship.