Project

General

Profile

Bug #40115

handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=2646747243 tx_crc=2809446983

Added by Sage Weil 4 months ago. Updated 24 days ago.

Status:
Resolved
Priority:
Urgent
Category:
-
Target version:
-
Start date:
06/03/2019
Due date:
% Done:

0%

Source:
Tags:
Backport:
nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

symptom is MON_DOWN in cluster log, due to the mon reconnects failing, and thus missing an election cycle.

seems to be non-deterministic.

/a/sage-2019-05-30_18:02:57-rados-wip-sage-testing-2019-05-30-0853-distro-basic-smithi/3988780

normal exchange, tehn a (normal) fault

2019-05-30T21:17:25.751+0000 7fc72c684700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_read_frame_dispatch tag=20
2019-05-30T21:17:25.751+0000 7fc72c684700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_message_ack payload.length()=8
2019-05-30T21:17:25.751+0000 7fc72c684700 15 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_message_ack seq=6699
2019-05-30T21:17:25.751+0000 7fc72c684700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_message_ack got ack seq 6699 >= 6699 on 0x55fdf81a5700 paxos(begin lc 1734 fc 0 pn 300 opn 0) v4
2019-05-30T21:17:25.751+0000 7fc72c684700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).read_frame
2019-05-30T21:17:25.751+0000 7fc72c684700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32
2019-05-30T21:17:25.751+0000 7fc72c684700  1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 39
2019-05-30T21:17:25.751+0000 7fc72c684700  1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2019-05-30T21:17:25.751+0000 7fc72c684700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_read_frame_preamble_main r=-1
2019-05-30T21:17:25.751+0000 7fc72c684700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2019-05-30T21:17:25.751+0000 7fc72c684700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=READY pgs=750 cs=59 l=0 rx=0x55fdf8becc00 tx=0x55fdf8a2c080)._fault
2019-05-30T21:17:25.751+0000 7fc72c684700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 unknown :-1 s=READY pgs=750 cs=59 l=0 rx=0 tx=0)._fault with nothing to send, going to standby

but then reconnect (eventually) fails
2019-05-30T21:17:25.755+0000 7fc72ce85700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf7942880 0x55fdf7723600 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=0 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_reconnect reconnect to existing=0x55fdf6aff200
2019-05-30T21:17:25.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf7942880 0x55fdf7723600 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=0 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).reuse_connection existing=0x55fdf6aff200 reconnect=1
2019-05-30T21:17:25.759+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_reconnect_ok
2019-05-30T21:17:25.759+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).discard_requeued_up_to 6699
2019-05-30T21:17:25.759+0000 7fc72ce85700  5 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_reconnect_ok sending reconnect_ok: msg_seq=4959
2019-05-30T21:17:25.759+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] accept_conn 0x55fdf6aff200 [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0]
2019-05-30T21:17:25.759+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).inject_delay sleep for 0.002
2019-05-30T21:17:25.759+0000 7fc72f68a700 10 mon.a@0(leader) e1 ms_handle_accept con 0x55fdf6aff200 session 0x55fdf5b77840 already on list
2019-05-30T21:17:25.759+0000 7fc72ce85700  0 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send injecting socket failure
2019-05-30T21:17:25.759+0000 7fc72ce85700  1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
2019-05-30T21:17:25.759+0000 7fc72ce85700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).write reconnect ok write failed r=-32 ((32) Broken pipe)
2019-05-30T21:17:25.759+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault
2019-05-30T21:17:25.759+0000 7fc72ce85700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=SESSION_ACCEPTING pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault with nothing to send, going to standby
2019-05-30T21:17:35.755+0000 7fc72f68a700  1 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] --> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] -- election(44134d2e-0111-46ee-b928-76392e7c2e49 propose rel 15 e11) v8 -- 0x55fdf7903200 con 0x55fdf6aff200
2019-05-30T21:17:35.755+0000 7fc72f68a700  5 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_message enqueueing message m=0x55fdf7903200 type=65 election(44134d2e-0111-46ee-b928-76392e7c2e49 propose rel 15 e11) v8
2019-05-30T21:17:35.755+0000 7fc72f68a700 15 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).send_message inline write is denied, reschedule m=0x55fdf7903200
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).write_event
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=60 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).write_event policy.server is false
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._connect
2019-05-30T21:17:35.755+0000 7fc72ce85700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=STANDBY pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).connect
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING l=0).process
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING_RE l=0).process connect successfully, ready to send banner
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_event
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).start_client_banner_exchange
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._banner_exchange
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 26 remaining bytes 0
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=10
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).process
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=10
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner r=0
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=16
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload r=0
2019-05-30T21:17:35.755+0000 7fc72ce85700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload supported=0 required=0
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 112 remaining bytes 0
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_frame
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32
2019-05-30T21:17:35.755+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main r=0
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main got encrypted preamble. after decrypt premable.length()=32
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=2646747243 tx_crc=2809446983

and continues to fail after that,
2019-05-30T21:17:35.755+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=61 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault
2019-05-30T21:17:35.755+0000 7fc72ce85700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault waiting 0.200000
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING l=0).process
2019-05-30T21:17:35.959+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTING_RE l=0).process connect successfully, ready to send banner
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_event
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=START_CONNECT pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).start_client_banner_exchange
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._banner_exchange
2019-05-30T21:17:35.959+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 26 remaining bytes 0
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=10
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner r=0
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=16
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload r=0
2019-05-30T21:17:35.959+0000 7fc72ce85700  1 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=BANNER_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._handle_peer_banner_payload supported=0 required=0
2019-05-30T21:17:35.959+0000 7fc72ce85700 10 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 112 remaining bytes 0
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).read_frame
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 -- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 msgr2=0x55fdf6a4bb80 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32
2019-05-30T21:17:35.959+0000 7fc72ce85700 20 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main r=0
2019-05-30T21:17:35.959+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main got encrypted preamble. after decrypt premable.length()=32
2019-05-30T21:17:35.959+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00).handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=504640588 tx_crc=3967004151
2019-05-30T21:17:35.959+0000 7fc72ce85700 10 --2- [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] >> [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] conn(0x55fdf6aff200 0x55fdf6a4bb80 secure :-1 s=HELLO_CONNECTING pgs=803 cs=62 l=0 rx=0x55fdf7470270 tx=0x55fdf7ec9f00)._fault


Related issues

Copied to Messengers - Backport #40441: nautilus: handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=2646747243 tx_crc=2809446983 Resolved

History

#1 Updated by Sage Weil 4 months ago

hmm, on the other end, the fault is waiting 15s (!)

2019-05-30T21:17:21.787+0000 7f06712ee700 10 --2- [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] >> [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] conn(0x558abbfc7680 0x558abc027600 secure :-1 s=WAIT pgs=1710 cs=57 l=0 rx=0x558abee73860 tx=0x558abca40580)._fault
2019-05-30T21:17:21.787+0000 7f06712ee700  1 --2- [v2:172.21.15.23:3301/0,v1:172.21.15.23:6790/0] >> [v2:172.21.15.23:3300/0,v1:172.21.15.23:6789/0] conn(0x558abbfc7680 0x558abc027600 secure :-1 s=START_CONNECT pgs=1710 cs=58 l=0 rx=0x558abee73860 tx=0x558abca40580)._fault waiting 15.000000

maybe a timeout isn't getting reset?

#2 Updated by Sage Weil 4 months ago

  • Description updated (diff)

#3 Updated by Sage Weil 4 months ago

  • Description updated (diff)

#4 Updated by Sage Weil 4 months ago

void ProtocolV2::reset_recv_state() {
  if ((state >= AUTH_CONNECTING && state <= SESSION_RECONNECTING) ||
      state == READY) {
    auth_meta.reset(new AuthConnectionMeta);
    session_stream_handlers.tx.reset(nullptr);
    session_stream_handlers.rx.reset(nullptr);
    pre_auth.txbuf.clear();
    pre_auth.rxbuf.clear();
  }

shouldn't the stream handlers get reset unconditionally on reset? because when we reconnect, we will do our auth exchange in frames that aren't encrypted with the previous connection's connection_secret...?

#5 Updated by Radoslaw Zarzynski 4 months ago

shouldn't the stream handlers get reset unconditionally on reset?

Just making the reset unconditional will brake the (nasty) logic of ::reuse_connection() where we call ::stop() (and thus ::reset_recv_state()) before reassigning e.g. auth_meta:

CtPtr ProtocolV2::reuse_connection(const AsyncConnectionRef& existing,
                                   ProtocolV2 *exproto) {
  // ...
  ldout(messenger->cct, 5) << __func__ << " stop myself to swap existing" 
                           << dendl;
  // avoid _stop shutdown replacing socket
  // queue a reset on the new connection, which we're dumping for the old
  stop();

  connection->dispatch_queue->queue_reset(connection);

  exproto->can_write = false;
  exproto->reconnecting = reconnecting;
  exproto->replacing = true;
  std::swap(exproto->session_stream_handlers, session_stream_handlers);
  exproto->auth_meta = auth_meta;
  existing->state_offset = 0;

  // ...
}

I think we'll need a bit broader rework.

#6 Updated by Sage Weil 4 months ago

  • Project changed from RADOS to Messengers

#7 Updated by Radoslaw Zarzynski 3 months ago

  • Status changed from Verified to Need Test

#8 Updated by Kefu Chai 3 months ago

  • Assignee set to Radoslaw Zarzynski
  • Pull request ID set to 28453

#9 Updated by Sage Weil 3 months ago

  • Status changed from Need Test to Pending Backport
  • Backport set to nautilus

#10 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #40441: nautilus: handle_read_frame_preamble_main crc mismatch for main preamble rx_crc=2646747243 tx_crc=2809446983 added

#11 Updated by Nathan Cutler 24 days ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF