Actions
Bug #37779
closedmsg/async: connection race + winner fault can leave connection in standby
% Done:
0%
Source:
Tags:
Backport:
luminous,mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Description
on one end,
2019-01-02 16:28:53.247 7f13ea225700 10 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5ee3800 legacy :6802 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept existing 0x557db5f00000.gseq 0 <= 1, looks ok 2019-01-02 16:28:53.247 7f13ea225700 1 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5ee3800 legacy :6802 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_RE 2019-01-02 16:28:53.247 7f13ea225700 10 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5ee3800 legacy :6802 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connection race, existing 0x557db5f00000.cseq 0 == 0, sending WAIT
but then,
2019-01-02 16:28:53.247 7f13eba28700 0 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5f00000 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send injecting socket failure 2019-01-02 16:28:53.247 7f13eba28700 1 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5f00000 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe 2019-01-02 16:28:53.247 7f13eba28700 20 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5f00000 legacy :-1 s=CONNECTING pgs=0 cs=0 l=0).handle_client_banner_write r=-32 2019-01-02 16:28:53.247 7f13eba28700 1 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5f00000 legacy :-1 s=CONNECTING pgs=0 cs=0 l=0).handle_client_banner_write write client banner failed 2019-01-02 16:28:53.247 7f13eba28700 20 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5f00000 legacy :-1 s=CONNECTING pgs=0 cs=0 l=0).fault 2019-01-02 16:28:53.247 7f13eba28700 10 -- 172.21.15.125:6802/33738 >> 172.21.15.6:6809/33741 conn(0x557db5f00000 legacy :-1 s=CONNECTING pgs=0 cs=0 l=0).fault with nothing to send, going to standby
and then nothing.
/a/sage-2019-01-02_14:51:32-rados-master-distro-basic-smithi/3414810
Updated by Sage Weil over 5 years ago
- Status changed from 12 to Fix Under Review
Updated by Sage Weil over 5 years ago
- Related to Bug #37799: msg/async: RESETSESSION due to connection reset during initial connection added
Updated by Sage Weil over 5 years ago
SCRATCH THAT, THE BELOW IS SOMETHING DIFFERENT
a newer instance of this, after several other fixes were applied:
2019-01-09 06:18:16.957 7fa97fff5e40 10 mon.s@-1(probing) e0 monmap e0: 21 mons at {a=[v2:172.21.15.181:3300/0,v1:172.21.15.181:6789/0],b=[v2:172.21.15.125:3300/0,v1:172.21.15.125:6789/0],c=[v2:172.21.15.90:3300/0,v1:172.21.15.90:6789/0],d=[v2:172.21.15.181:3301/0,v1:172.21.15.181:6790/0],e=[v2:172.21.15.125:3301/0 ,v1:172.21.15.125:6790/0],f=[v2:172.21.15.90:3301/0,v1:172.21.15.90:6790/0],g=[v2:172.21.15.181:3302/0,v1:172.21.15.181:6791/0],h=[v2:172.21.15.125:3302/0,v1:172.21.15.125:6791/0],i=[v2:172.21.15.90:3302/0,v1:172.21.15.90:6791/0],j=[v2:172.21.15.181:3303/0,v1:172.21.15.181:6792/0],k=[v2:172.21.15.125:3303/0,v1:172. 21.15.125:6792/0],l=[v2:172.21.15.90:3303/0,v1:172.21.15.90:6792/0],m=[v2:172.21.15.181:3304/0,v1:172.21.15.181:6793/0],n=[v2:172.21.15.125:3304/0,v1:172.21.15.125:6793/0],o=[v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0],p=[v2:172.21.15.181:3305/0,v1:172.21.15.181:6794/0],q=[v2:172.21.15.125:3305/0,v1:172.21.15.12 5:6794/0],r=[v2:172.21.15.90:3305/0,v1:172.21.15.90:6794/0],s=[v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0],t=[v2:172.21.15.125:3306/0,v1:172.21.15.125:6795/0],u=[v2:172.21.15.90:3306/0,v1:172.21.15.90:6795/0]} 2019-01-09 06:18:16.958 7fa97fff5e40 1 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] _send_to--> mon [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] -- mon_probe(probe e45a7d54-1462-4a6a-b138-5e03b8543640 name s new) v6 -- ?+0 0x55e3ab9d9b80 2019-01-09 06:18:16.958 7fa97fff5e40 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] submit_message mon_probe(probe e45a7d54-1462-4a6a-b138-5e03b8543640 name s new) v6 remote, [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0], new connection. 2019-01-09 06:18:16.958 7fa97fff5e40 10 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] create_connect [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0], creating connection and registering 2019-01-09 06:18:16.958 7fa97fff5e40 10 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=STATE_NONE l=0)._connect 2019-01-09 06:18:16.958 7fa97fff5e40 10 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] create_connect 0x55e3aba0f000 [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] 2019-01-09 06:18:16.958 7fa97fff5e40 1 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] --> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] -- mon_probe(probe e45a7d54-1462-4a6a-b138-5e03b8543640 name s new) v6 -- 0x55e3ab9d9b80 con 0x55e3aba0f000 2019-01-09 06:18:16.958 7fa97fff5e40 15 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=START_CONNECT pgs=0 cs=0 l=0).send_message inline write is denied, reschedule m=0x55e3ab9d9b80 2019-01-09 06:18:16.958 7fa965919700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=STATE_CONNECTING l=0).process 2019-01-09 06:18:16.958 7fa965919700 10 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=STATE_CONNECTING_RE l=0).process nonblock connect inprogress 2019-01-09 06:18:16.958 7fa965919700 4 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=STATE_CONNECTING_RE l=0).handle_write 2019-01-09 06:18:16.958 7fa965919700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=START_CONNECT pgs=0 cs=0 l=0).write_event 2019-01-09 06:18:16.959 7fa965919700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=STATE_CONNECTING_RE l=0).process 2019-01-09 06:18:16.959 7fa965919700 1 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed 2019-01-09 06:18:16.959 7fa965919700 2 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=STATE_CONNECTING_RE l=0).process connection refused! 2019-01-09 06:18:16.959 7fa965919700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=START_CONNECT pgs=0 cs=0 l=0).fault 2019-01-09 06:18:16.959 7fa965919700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3aba0f000 msgr2 :-1 s=START_CONNECT pgs=0 cs=0 l=0).fault waiting 0.200000 ... 2019-01-09 06:18:17.063 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33 2019-01-09 06:18:17.063 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).process 2019-01-09 06:18:17.063 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=33 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING pgs=0 cs=0 l=0).handle_connect_message_1 r=0 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth 2019-01-09 06:18:17.063 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=158 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 0 global_seq 21 2019-01-09 06:18:17.063 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 1, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=1 2019-01-09 06:18:17.063 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 13, their proto 13 2019-01-09 06:18:17.063 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 158 2019-01-09 06:18:17.063 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2: challenging authorizer 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).send_connect_message_reply 2019-01-09 06:18:17.063 7fa965118700 10 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 58 remaining bytes 0 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_reply_write r=0 2019-01-09 06:18:17.063 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message 2019-01-09 06:18:17.063 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33 2019-01-09 06:18:17.068 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).process 2019-01-09 06:18:17.068 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=33 2019-01-09 06:18:17.068 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_1 r=0 2019-01-09 06:18:17.068 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth 2019-01-09 06:18:17.068 7fa965118700 20 -- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=158 2019-01-09 06:18:17.068 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0 2019-01-09 06:18:17.068 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 2019-01-09 06:18:17.068 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 0 global_seq 21 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 1, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=1 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 13, their proto 13 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 158 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept setting up session_security. 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 existing 0x55e3aba0f000 on [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept existing 0x55e3aba0f000.gseq 0 <= 21, looks ok 2019-01-09 06:18:17.068 7fa965118700 1 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connection race, existing 0x55e3aba0f000.cseq 0 == 0, or we are server, replacing my attempt 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).replace accept replacing 0x55e3aba0f000 2019-01-09 06:18:17.068 7fa965118700 20 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop 2019-01-09 06:18:17.068 7fa965118700 2 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop 2019-01-09 06:18:17.068 7fa965118700 10 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).discard_out_queue started 2019-01-09 06:18:17.068 7fa965118700 1 --2- [v2:172.21.15.181:3306/0,v1:172.21.15.181:6795/0] >> [v2:172.21.15.90:3304/0,v1:172.21.15.90:6793/0] conn(0x55e3abb4e000 msgr2 :45792 s=CLOSED pgs=0 cs=0 l=0).replace stop myself to swap existing
/a/sage-2019-01-08_22:26:18-rados:multimon-wip-sage2-testing-2019-01-08-1310-distro-basic-smithi/3437300
Updated by Greg Farnum over 5 years ago
- Status changed from Fix Under Review to 7
Updated by Sage Weil over 5 years ago
again,
2019-01-12 04:33:12.615 7fc167fa4700 10 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce9336900 msgr2 :49106 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept existing 0x55bce92e7600.gseq 0 <= 12, looks ok 2019-01-12 04:33:12.615 7fc167fa4700 1 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce9336900 msgr2 :49106 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTION_ESTABLISHED 2019-01-12 04:33:12.615 7fc167fa4700 10 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce9336900 msgr2 :49106 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept connection race, existing 0x55bce92e7600.cseq 0 == 0, or we are server, replacing my attempt 2019-01-12 04:33:12.615 7fc167fa4700 10 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce9336900 msgr2 :49106 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).replace accept replacing 0x55bce92e7600
but then
2019-01-12 04:33:12.623 7fc167fa4700 20 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce92e7600 msgr2 :-1 s=ACCEPTING pgs=0 cs=0 l=0).handle_connect_message_1 r=-1 2019-01-12 04:33:12.623 7fc167fa4700 1 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce92e7600 msgr2 :-1 s=ACCEPTING pgs=0 cs=0 l=0).handle_connect_message_1 read connect msg failed 2019-01-12 04:33:12.623 7fc167fa4700 20 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce92e7600 msgr2 :-1 s=ACCEPTING pgs=0 cs=0 l=0).fault 2019-01-12 04:33:12.623 7fc167fa4700 10 --2- v2:172.21.15.197:6806/11137 >> v2:172.21.15.12:6810/11067 conn(0x55bce92e7600 msgr2 :-1 s=ACCEPTING pgs=0 cs=0 l=0).fault with nothing to send, going to standby 2019-01-12 04:33:12.631 7fc168fa6700 10 -- v2:172.21.15.197:6806/11137 connect_to v2:172.21.15.12:6810/11067 existing 0x55bce92e7600 2019-01-12 04:33:14.291 7fc168fa6700 10 -- v2:172.21.15.197:6806/11137 connect_to v2:172.21.15.12:6810/11067 existing 0x55bce92e7600 2019-01-12 04:33:14.291 7fc167fa4700 10 -- v2:172.21.15.197:6806/11137 connect_to v2:172.21.15.12:6810/11067 existing 0x55bce92e7600
/a/sage-2019-01-11_13:08:30-rados-wip-sage-testing-2019-01-10-1950-distro-basic-smithi/3446989
Updated by Sage Weil about 5 years ago
- Status changed from 7 to Pending Backport
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38241: mimic: msg/async: connection race + winner fault can leave connection in standby added
Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38242: luminous: msg/async: connection race + winner fault can leave connection in standby added
Updated by Greg Farnum about 5 years ago
- Project changed from RADOS to Messengers
Updated by Nathan Cutler about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
Actions