Bug #40115
handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=2646747243 tx_crc=2809446983
% Done:
0%
Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Description
symptom is MON_DOWN in cluster log, due to the mon reconnects failing, and thus missing an election cycle.
seems to be non-deterministic.
/a/sage-2019-05-30_18:02:57-rados-wip-sage-testing-2019-05-30-0853-distro-basic-smithi/3988780
normal exchange, tehn a (normal) fault
2019-05-30T21:17:25.751+0000 7fc72c684700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_read_frame_dispatch tag=20 2019-05-30T21:17:25.751+0000 7fc72c684700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_message_ack payload.length()=8 2019-05-30T21:17:25.751+0000 7fc72c684700 15 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_message_ack seq=6699 2019-05-30T21:17:25.751+0000 7fc72c684700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_message_ack got ack seq 6699 >= 6699 on 0x55fdf81a5700 paxos(begin lc 1734 fc 0 pn 300 opn 0) v4 2019-05-30T21:17:25.751+0000 7fc72c684700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).read_frame 2019-05-30T21:17:25.751+0000 7fc72c684700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32 2019-05-30T21:17:25.751+0000 7fc72c684700 1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 39 2019-05-30T21:17:25.751+0000 7fc72c684700 1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed 2019-05-30T21:17:25.751+0000 7fc72c684700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_read_frame_preamble_main r=-1 2019-05-30T21:17:25.751+0000 7fc72c684700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted) 2019-05-30T21:17:25.751+0000 7fc72c684700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080)._fault 2019-05-30T21:17:25.751+0000 7fc72c684700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 unknown :-1 s=READY pgs=750 cs=59 l=0 rx=0 tx=0)._fault with nothing to send, going to standby
but then reconnect (eventually) fails
2019-05-30T21:17:25.755+0000 7fc72ce85700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf7942880 0x55fdf7723600 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=0 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_reconnect reconnect to existing=0x55fdf6aff200 2019-05-30T21:17:25.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf7942880 0x55fdf7723600 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=0 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).reuse_connection existing=0x55fdf6aff200 reconnect=1 2019-05-30T21:17:25.759+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_reconnect_ok 2019-05-30T21:17:25.759+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).discard_requeued_up_to 6699 2019-05-30T21:17:25.759+0000 7fc72ce85700 5 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_reconnect_ok sending reconnect_ok: msg_seq=4959 2019-05-30T21:17:25.759+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] accept_conn 0x55fdf6aff200 [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] 2019-05-30T21:17:25.759+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).inject_delay sleep for 0.002 2019-05-30T21:17:25.759+0000 7fc72f68a700 10 mon.a@0(leader) e1 ms_handle_accept con 0x55fdf6aff200 session 0x55fdf5b77840 already on list 2019-05-30T21:17:25.759+0000 7fc72ce85700 0 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send injecting socket failure 2019-05-30T21:17:25.759+0000 7fc72ce85700 1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe 2019-05-30T21:17:25.759+0000 7fc72ce85700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).write reconnect ok write failed r=-32 ((32) Broken pipe) 2019-05-30T21:17:25.759+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault 2019-05-30T21:17:25.759+0000 7fc72ce85700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault with nothing to send, going to standby 2019-05-30T21:17:35.755+0000 7fc72f68a700 1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] --> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] -- election(44134d2e-0111-46ee-b928-76392e7c2e49 propose rel 15 e11) v8 -- 0x55fdf7903200 con 0x55fdf6aff200 2019-05-30T21:17:35.755+0000 7fc72f68a700 5 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_message enqueueing message m=0x55fdf7903200 type=65 election(44134d2e-0111-46ee-b928-76392e7c2e49 propose rel 15 e11) v8 2019-05-30T21:17:35.755+0000 7fc72f68a700 15 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_message inline write is denied, reschedule m=0x55fdf7903200 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).write_event 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).write_event policy.server is false 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._connect 2019-05-30T21:17:35.755+0000 7fc72ce85700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).connect 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING l=0).process 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING_RE l=0).process connect successfully, ready to send banner 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_event 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).start_client_banner_exchange 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._banner_exchange 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 26 remaining bytes 0 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=10 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).process 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=10 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner r=0 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=16 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload r=0 2019-05-30T21:17:35.755+0000 7fc72ce85700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload supported=0 required=0 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 112 remaining bytes 0 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_frame 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32 2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main r=0 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main got encrypted preamble. after decrypt premable.length()=32 2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=2646747243 tx_crc=2809446983
and continues to fail after that,
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault 2019-05-30T21:17:35.755+0000 7fc72ce85700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault waiting 0.200000 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING l=0).process 2019-05-30T21:17:35.959+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING_RE l=0).process connect successfully, ready to send banner 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_event 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).start_client_banner_exchange 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._banner_exchange 2019-05-30T21:17:35.959+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 26 remaining bytes 0 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=10 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner r=0 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=16 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload r=0 2019-05-30T21:17:35.959+0000 7fc72ce85700 1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload supported=0 required=0 2019-05-30T21:17:35.959+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 112 remaining bytes 0 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_frame 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32 2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main r=0 2019-05-30T21:17:35.959+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main got encrypted preamble. after decrypt premable.length()=32 2019-05-30T21:17:35.959+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=504640588 tx_crc=3967004151 2019-05-30T21:17:35.959+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault
Related issues
History
#1 Updated by Sage Weil almost 5 years ago
hmm, on the other end, the fault is waiting 15s (!)
2019-05-30T21:17:21.787+0000 7f06712ee700 10 --2- [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] >> [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] conn(0x558abbfc7680 0x558abc027600 secure :-1 s=WAIT pgs=1710 cs=57 l=0 rx=0x558abee73860 tx=0x558abca40580)._fault 2019-05-30T21:17:21.787+0000 7f06712ee700 1 --2- [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] >> [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] conn(0x558abbfc7680 0x558abc027600 secure :-1 s=START_CONNECT pgs=1710 cs=58 l=0 rx=0x558abee73860 tx=0x558abca40580)._fault waiting 15.000000
maybe a timeout isn't getting reset?
#2 Updated by Sage Weil almost 5 years ago
- Description updated (diff)
#3 Updated by Sage Weil almost 5 years ago
- Description updated (diff)
#4 Updated by Sage Weil almost 5 years ago
void ProtocolV2::reset_recv_state() { if ((state >= AUTH_CONNECTING && state <= SESSION_RECONNECTING) || state == READY) { auth_meta.reset(new AuthConnectionMeta); session_stream_handlers.tx.reset(nullptr); session_stream_handlers.rx.reset(nullptr); pre_auth.txbuf.clear(); pre_auth.rxbuf.clear(); }
shouldn't the stream handlers get reset unconditionally on reset? because when we reconnect, we will do our auth exchange in frames that aren't encrypted with the previous connection's connection_secret...?
#5 Updated by Radoslaw Zarzynski almost 5 years ago
shouldn't the stream handlers get reset unconditionally on reset?
Just making the reset unconditional will brake the (nasty) logic of ::reuse_connection() where we call ::stop() (and thus ::reset_recv_state()) before reassigning e.g. auth_meta:
CtPtr ProtocolV2::reuse_connection(const AsyncConnectionRef& existing, ProtocolV2 *exproto) { // ... ldout(messenger->cct, 5) << __func__ << " stop myself to swap existing" << dendl; // avoid _stop shutdown replacing socket // queue a reset on the new connection, which we're dumping for the old stop(); connection->dispatch_queue->queue_reset(connection); exproto->can_write = false; exproto->reconnecting = reconnecting; exproto->replacing = true; std::swap(exproto->session_stream_handlers, session_stream_handlers); exproto->auth_meta = auth_meta; existing->state_offset = 0; // ... }
I think we'll need a bit broader rework.
#6 Updated by Sage Weil almost 5 years ago
- Project changed from RADOS to Messengers
#7 Updated by Radoslaw Zarzynski almost 5 years ago
- Status changed from 12 to 17
#8 Updated by Kefu Chai almost 5 years ago
- Assignee set to Radoslaw Zarzynski
- Pull request ID set to 28453
#9 Updated by Sage Weil almost 5 years ago
- Status changed from 17 to Pending Backport
- Backport set to nautilus
#10 Updated by Nathan Cutler almost 5 years ago
- Copied to Backport #40441: nautilus: handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=2646747243 tx_crc=2809446983 added
#11 Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved