Bug #19867


SimpleMessenger: osd cluster messenger pipe connection is only one direction

Added by Peng Xie almost 7 years ago. Updated almost 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
teuthology
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Recently I found an interesting SimpleMessenger pipe connection problem on the OSDs' cluster network.

My test environment is 2 Ceph nodes with 2 OSDs in total (one OSD per node), 2 replicas, running ceph-10.2.2.
osd.0 on node xt1, with cluster IP 192.168.100.1 and public IP 192.168.10.88
osd.1 on node xt2, with cluster IP 192.168.100.2 and public IP 192.168.10.89
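
For reference, this public/cluster network split would typically come from a ceph.conf section like the following. This is only a minimal sketch with the subnets taken from the setup above, not my actual configuration file:

[global]
    public network  = 192.168.10.0/24     # client and monitor traffic
    cluster network = 192.168.100.0/24    # OSD-to-OSD replication traffic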

netstat shows that there is only one connection between the two OSDs on the
cluster network (192.168.100.0/24); the listening port is 6800:

osd.0 (tcp client: 192.168.100.1:55380) -----> osd.1 (tcp server: 192.168.100.2:6800)

We would expect another pipe connection from
osd.1 (192.168.100.2:xxxxx) to osd.0 (192.168.100.1:6800), but it does not exist.

The netstat output is shown below:

[root@xt1 ]# netstat -ntpa |grep ceph
tcp 0 0 192.168.10.88:6789 0.0.0.0:* LISTEN 17395/ceph-mon
tcp 0 0 192.168.100.1:6800 0.0.0.0:* LISTEN 17435/ceph-osd
tcp 0 0 192.168.10.88:6800 0.0.0.0:* LISTEN 17435/ceph-osd
tcp 0 0 192.168.10.88:6801 0.0.0.0:* LISTEN 17435/ceph-osd
tcp 0 0 192.168.100.1:6801 0.0.0.0:* LISTEN 17435/ceph-osd
tcp 0 0 192.168.100.1:48532 192.168.100.2:6800 ESTABLISHED 17435/ceph-osd
tcp 0 0 192.168.100.1:6801 192.168.100.2:43908 ESTABLISHED 17435/ceph-osd
tcp 0 0 192.168.100.1:51819 192.168.100.2:6801 ESTABLISHED 17435/ceph-osd
tcp 0 0 192.168.10.88:6801 192.168.10.89:32938 ESTABLISHED 17435/ceph-osd
tcp 0 9 192.168.10.88:6789 192.168.10.88:32834 ESTABLISHED 17395/ceph-mon
tcp 0 0 192.168.10.88:38788 192.168.10.89:6801 ESTABLISHED 17435/ceph-osd
tcp 0 0 192.168.10.88:6789 192.168.10.89:45039 ESTABLISHED 17395/ceph-mon
tcp 0 0 192.168.10.88:32834 192.168.10.88:6789 ESTABLISHED 17435/ceph-osd

[root@xt2]# netstat -ntpa |grep ceph
tcp 0 0 192.168.100.2:6800 0.0.0.0:* LISTEN 7642/ceph-osd
tcp 0 0 192.168.10.89:6800 0.0.0.0:* LISTEN 7642/ceph-osd
tcp 0 0 192.168.10.89:6801 0.0.0.0:* LISTEN 7642/ceph-osd
tcp 0 0 192.168.100.2:6801 0.0.0.0:* LISTEN 7642/ceph-osd
tcp 0 0 192.168.10.89:45039 192.168.10.88:6789 ESTABLISHED 7642/ceph-osd
tcp 0 0 192.168.100.2:6800 192.168.100.1:48532 ESTABLISHED 7642/ceph-osd
tcp 0 0 192.168.10.89:6801 192.168.10.88:38788 ESTABLISHED 7642/ceph-osd
tcp 0 0 192.168.100.2:43908 192.168.100.1:6801 ESTABLISHED 7642/ceph-osd
tcp 0 0 192.168.100.2:6801 192.168.100.1:51819 ESTABLISHED 7642/ceph-osd
tcp 0 0 192.168.10.89:32938 192.168.10.88:6801 ESTABLISHED 7642/ceph-osd

After analyzing the OSD logs, the timeline of the faulty code path is as follows:

1. During osd.1 startup, it first creates a new Pipe-C and registers it in the cluster_messenger's
rank_pipe map as ["192.168.100.1:6800", Pipe-C], and then calls Pipe::connect() to osd.0.
2. At the same time, osd.1, acting as the cluster_messenger TCP server side, calls Pipe::accept() for the
connection request from osd.0. osd.1 receives the peer_addr "192.168.100.1:6800";
when it calls _lookup_pipe() on the rank_pipe map, it finds the still-connecting Pipe-C from step 1.
It closes Pipe-C and replaces it in the rank_pipe map with the new Pipe-A, so the entry finally becomes
["192.168.100.1:6800", Pipe-A].
3. Since Pipe-C was closed, osd.0's Pipe::accept() encounters an error and gives up the connection.
4. The subsequent I/O from osd.1 is replicated to osd.0 through osd.1's server-side Pipe-A,
although it is supposed to use the separate client-side Pipe-C to transfer the internal I/O (see the sketch below).
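
To make the race easier to follow, here is a heavily simplified, self-contained C++ sketch of the accept-side bookkeeping described in step 2. It only models the rank_pipe handling visible in the logs; the names rank_pipe, _lookup_pipe, register_pipe, unregister_pipe and connect_rank come from the log output, while everything else is hypothetical and is not the actual msg/simple code:

#include <iostream>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Pipe {
  std::string peer_addr;   // e.g. "192.168.100.1:6800"
  bool connecting = true;  // client-side pipe still inside Pipe::connect()
  void stop() { connecting = false; /* the real code also closes the socket */ }
};

struct Messenger {
  std::mutex lock;
  // rank_pipe: at most one registered pipe per peer address
  std::map<std::string, std::shared_ptr<Pipe>> rank_pipe;

  // time-1: connect_rank(): create the outgoing pipe and register it
  std::shared_ptr<Pipe> connect_rank(const std::string &peer) {
    std::lock_guard<std::mutex> l(lock);
    auto p = std::make_shared<Pipe>();
    p->peer_addr = peer;
    rank_pipe[peer] = p;    // register_pipe(); the real pipe would now start Pipe::connect()
    return p;
  }

  // time-3: the accepting thread handles the incoming connection from the same peer
  std::shared_ptr<Pipe> accept(const std::string &peer) {
    std::lock_guard<std::mutex> l(lock);
    auto incoming = std::make_shared<Pipe>();   // Pipe-A
    incoming->peer_addr = peer;

    auto it = rank_pipe.find(peer);             // _lookup_pipe()
    if (it != rank_pipe.end() && it->second->connecting) {
      // "connection race ... replacing my attempt": the still-connecting
      // client-side pipe loses; it is stopped, unregistered and later reaped.
      it->second->stop();
      rank_pipe.erase(it);                      // unregister_pipe()
    }
    rank_pipe[peer] = incoming;                 // register Pipe-A: all later
    return incoming;                            // traffic to this peer uses it
  }
};

int main() {
  Messenger osd1_cluster_msgr;
  auto pipe_c = osd1_cluster_msgr.connect_rank("192.168.100.1:6800"); // time-1
  auto pipe_a = osd1_cluster_msgr.accept("192.168.100.1:6800");       // time-3
  std::cout << "Pipe-C still connecting? " << pipe_c->connecting << "\n"; // prints 0
}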

As this analysis shows, it hurts the performance of the internal data transfer between the OSDs.


The logs in timeline order are shown below:

time-1
in Node-xt2 (lines marked with <<<--- are my comments):
2017-04-19 14:15:19.843828 7f3e79b19700 10 -- 192.168.100.2:6800/2283 connect_rank to 192.168.100.1:6800/12758, creating pipe and registering
2017-04-19 14:15:19.843866 7f3e79b19700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea24ce000 sd=-1 :0 s=1 pgs=0 cs=0 l=0 c=0x7f3ea13f2a00).register_pipe
<<<--- the osd.1 cluster_messenger client-side pipe is registered in rank_pipe as ["192.168.100.1:6800", Pipe-C]
and starts to call Pipe::connect().
.......
.......
2017-04-19 14:15:19.845668 7f3e6e2fb700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea24ce000 sd=-1 :0 s=1 pgs=0 cs=0 l=0 c=0x7f3ea13f2a00).connect 0
2017-04-19 14:15:19.845678 7f3e6e2fb700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea24ce000 sd=53 :0 s=1 pgs=0 cs=0 l=0 c=0x7f3ea13f2a00).connecting to 192.168.100.1:6800/12758 .
.....

time-2
in Node-xt1:

2017-04-19 06:15:19.843643 7f2385164700 10 -- 192.168.100.1:6800/12758 >> 192.168.100.2:6800/2283 pipe(0x7f23b97eb400 sd=158 :0 s=1 pgs=0 cs=0 l=0 c=0x7f23b970d900).connecting to 192.168.100.2:6800/2283
<<<--- osd.0's cluster_messenger client-side pipe first calls Pipe::connect() to osd.1
.....
2017-04-19 06:15:19.848022 7f2384e61700 10 -- 192.168.100.1:6800/12758 >> :/0 pipe(0x7f23b9894000 sd=161 :0 s=0 pgs=0 cs=0 l=0 c=0x7f23b97ca300).accept
<<<--- osd.0's cluster_messenger server side starts Pipe::accept() for osd.1's connect request; the accepted fd is 161.
2017-04-19 06:15:19.848133 7f2384e61700 1 -- 192.168.100.1:6800/12758 >> :/0 pipe(0x7f23b9894000 sd=161 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f23b97ca300).accept sd=161 192.168.100.2:53220/0
<<<--- osd.0 got the peer address.

time-3
in Node-xt2 :

2017-04-19 14:15:19.846220 7f3e6e3fc700 1 -- 192.168.100.2:6800/2283 >> :/0 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept sd=45 192.168.100.1:45300/0
2017-04-19 14:15:19.846233 7f3e6e800700 10 -- 192.168.100.2:6801/2283 >> :/0 pipe(0x7f3ea0942000 sd=50 :0 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1871380).accept
<<<--- osd.1's cluster_messenger server-side pipe starts Pipe::accept() for osd.0's connection request.
2017-04-19 14:15:19.849302 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept peer addr is 192.168.100.1:6800/12758
2017-04-19 14:15:19.849306 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept of host_type 4, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=0
2017-04-19 14:15:19.849308 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept my proto 10, their proto 10
2017-04-19 14:15:19.849369 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept: setting up session_security.
2017-04-19 14:15:19.849373 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept existing 0x7f3ea24ce000.gseq 0 <= 1, looks ok .
<<<--- please note that 0x7f3ea24ce000 is our Pipe-C from ["192.168.100.1:6800", Pipe-C] registered in time-1.
2017-04-19 14:15:19.849376 7f3e6e3fc700 0 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept connect_seq 0 vs existing 0 state connecting
2017-04-19 14:15:19.849378 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept connection race, existing 0x7f3ea24ce000.cseq 0 == 0, or we are server, replacing my attempt .
2017-04-19 14:15:19.849381 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea2bca800 sd=45 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f3ea1b5e180).accept replacing 0x7f3ea24ce000 . <<<---- will replace and stop the ongoing osd.1 cluster_messenger's client-side Pipe connection
2017-04-19 14:15:19.849383 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea24ce000 sd=53 :53220 s=1 pgs=0 cs=0 l=0 c=0x7f3ea13f2a00).stop .
2017-04-19 14:15:19.849444 7f3e6e3fc700 10 -- 192.168.100.2:6800/2283 >> 192.168.100.1:6800/12758 pipe(0x7f3ea24ce000 sd=53 :53220 s=4 pgs=0 cs=0 l=0 c=0x7f3ea13f2a00).unregister_pipe.
......
2017-04-19 14:15:19.849619 7f3e8c4d4700 10 -- 192.168.100.2:6800/2283 reaper deleted pipe 0x7f3ea24ce000 . <<<<--- finally the reaper thread will destroy the Pipe-C structure .
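
The unregister_pipe and "reaper deleted pipe" lines above show the usual deferred-deletion pattern: the accepting thread only stops and queues the losing pipe, and a separate reaper thread frees it later. A minimal, hypothetical sketch of that pattern (not the real SimpleMessenger reaper code) could look like this:

#include <condition_variable>
#include <list>
#include <mutex>
#include <thread>

struct DeadPipe { /* in the real code: socket fd, reader/writer threads, ... */ };

class Reaper {
  std::mutex lock;
  std::condition_variable cond;
  std::list<DeadPipe *> reap_queue;
  bool stopping = false;
  std::thread thr{&Reaper::entry, this};

public:
  // called from the accepting thread instead of deleting the pipe in place
  void queue_reap(DeadPipe *p) {
    std::lock_guard<std::mutex> l(lock);
    reap_queue.push_back(p);
    cond.notify_one();
  }

  // reaper thread: a loop like this is what finally frees the replaced pipe
  void entry() {
    std::unique_lock<std::mutex> l(lock);
    while (true) {
      while (!reap_queue.empty()) {
        DeadPipe *p = reap_queue.front();
        reap_queue.pop_front();
        l.unlock();
        delete p;        // the real code also joins the pipe's threads and closes its fd
        l.lock();
      }
      if (stopping)
        break;
      cond.wait(l);
    }
  }

  ~Reaper() {
    {
      std::lock_guard<std::mutex> l(lock);
      stopping = true;
      cond.notify_one();
    }
    thr.join();
  }
};

int main() {
  Reaper reaper;
  reaper.queue_reap(new DeadPipe);   // freed asynchronously by the reaper thread
}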

time-4
in Node-xt1 :

2017-04-19 06:15:19.851553 7f2384e61700 10 -- 192.168.100.1:6800/12758 >> :/0 pipe(0x7f23b9894000 sd=161 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f23b97ca300).accept couldn't read banner
<<<--- because the connection was wrongly torn down by osd.1 in time-3
2017-04-19 06:15:19.855975 7f2384e61700 10 -- 192.168.100.1:6800/12758 >> :/0 pipe(0x7f23b9894000 sd=161 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f23b97ca300). queued = 0
2017-04-19 06:15:19.855994 7f2384e61700 2 -- 192.168.100.1:6800/12758 >> :/0 pipe(0x7f23b9894000 sd=161 :6800 s=4 pgs=0 cs=0 l=0 c=0x7f23b97ca300).fault (0) Success
2017-04-19 06:15:19.856009 7f2384e61700 10 -- 192.168.100.1:6800/12758 >> :/0 pipe(0x7f23b9894000 sd=161 :6800 s=4 pgs=0 cs=0 l=0 c=0x7f23b97ca300).fault already closed|closing
2017-04-19 06:15:19.856043 7f2384e61700 10 -- 192.168.100.1:6800/12758 queue_reap 0x7f23b9894000
2017-04-19 06:15:19.856050 7f2384e61700 10 -- 192.168.100.1:6800/12758 >> :/0 pipe(0x7f23b9894000 sd=161 :6800 s=4 pgs=0 cs=0 l=0 c=0x7f23b97ca300).reader done
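
The "couldn't read banner" line is the direct consequence of time-3: the first thing the accepting side does is read the peer's banner string, and osd.1 had already closed its socket before sending one, so the read hits EOF and the pipe is faulted and queued for reaping. Below is a hypothetical sketch of that first read, assuming the usual "ceph v027" banner string; it is not the actual Pipe::accept() code:

#include <sys/socket.h>
#include <cstring>

static const char BANNER[] = "ceph v027";   // assumed SimpleMessenger banner string

// Read and verify the peer's banner on an accepted socket.
// Returning false corresponds to "accept couldn't read banner" -> fault/reap.
bool read_and_check_banner(int sd) {
  char buf[sizeof(BANNER) - 1];             // the banner is sent without the '\0'
  size_t got = 0;
  while (got < sizeof(buf)) {
    ssize_t r = ::recv(sd, buf + got, sizeof(buf) - got, 0);
    if (r <= 0)                             // 0 = peer closed before sending
      return false;                         //     anything (our time-3 case)
    got += static_cast<size_t>(r);
  }
  return std::memcmp(buf, BANNER, sizeof(buf)) == 0;
}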
