Bug #36612
closedmsg/async: connection stall
Added by Sage Weil over 5 years ago. Updated over 4 years ago.
0%
Description
2018-10-28 23:23:07.173 7ff73e5b8700 1 -- 172.21.15.118:6813/34020 <== osd.0 172.21.15.118:6801/97454 124 ==== rep_scrubmap(1.c e310 from shard 0) v2 ==== 40+0+42 (2203346794 0 3641534724) 0x561093d72d80 con 0x561092add800 2018-10-28 23:23:07.173 7ff71e4f0700 20 osd.3 969 share_map osd.0 172.21.15.118:6801/97454 310 2018-10-28 23:23:07.173 7ff71e4f0700 20 osd.3 969 should_share_map osd.0 172.21.15.118:6801/97454 310 2018-10-28 23:23:07.173 7ff7224f8700 1 -- 172.21.15.118:6813/34020 --> 172.21.15.118:6801/97454 -- MOSDScrubReserve(1.c RELEASE e969) v1 -- 0x56109353aa00 con 0 2018-10-28 23:23:07.173 7ff7224f8700 1 -- 172.21.15.118:6813/34020 --> 172.21.15.118:6801/97454 -- pg_info((query:969 sent:969 1.c( empty local-lis/les=310/311 n=0 ec=282/14 lis/c 310/310 les/c/f 311/311/0 310/310/14))=([0,0] intervals=) epoch 969) v5 -- 0x561092990f00 con 0 2018-10-28 23:23:54.655 7ff73e5b8700 1 -- 172.21.15.118:6813/34020 >> 172.21.15.118:6801/97454 conn(0x561092add800 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 46 2018-10-28 23:23:54.655 7ff73e5b8700 1 -- 172.21.15.118:6813/34020 >> 172.21.15.118:6801/97454 conn(0x561092add800 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed 2018-10-28 23:23:54.655 7ff73e5b8700 1 -- 172.21.15.118:6813/34020 >> 172.21.15.118:6801/97454 conn(0x561092add800 legacy :-1 s=OPENED pgs=3 cs=1 l=0).handle_message read tag failed
on other end,
2018-10-28 23:23:07.173 7fdedfddc700 1 -- 172.21.15.118:6801/97454 --> 172.21.15.118:6813/34020 -- rep_scrubmap(1.c e310 from shard 0) v2 -- 0x55c5621bb200 con 0 2018-10-28 23:23:07.174 7fdf03eac700 1 -- 172.21.15.118:6801/97454 <== osd.3 172.21.15.118:6813/34020 225 ==== MOSDScrubReserve(1.c RELEASE e969) v1 ==== 43+0+0 (1924339382 0 0) 0x55c55ea32000 con 0x55c55e29b000 2018-10-28 23:23:07.174 7fdf03eac700 1 -- 172.21.15.118:6801/97454 <== osd.3 172.21.15.118:6813/34020 226 ==== pg_info((query:969 sent:969 1.c( empty local-lis/les=310/311 n=0 ec=282/14 lis/c 310/310 les/c/f 311/311/0 310/310/14))=([0,0] intervals=) epoch 969) v5 ==== 961+0+0 (630120914 0 0) 0x55c56338b860 con 0x55c55e29b000 2018-10-28 23:23:07.174 7fdee3de4700 20 osd.0 969 share_map osd.3 172.21.15.118:6813/34020 969 2018-10-28 23:23:07.174 7fdee3de4700 20 osd.0 969 should_share_map osd.3 172.21.15.118:6813/34020 969
/a/sage-2018-10-28_18:53:56-rados-wip-sage2-testing-2018-10-28-0942-distro-basic-smithi/3197235
Updated by Sage Weil over 5 years ago
- Priority changed from Urgent to Immediate
I caught this in a run with debug ms = 20
/a/sage-2018-11-29_15:56:38-rados-wip-sage2-testing-2018-11-28-1118-distro-basic-smithi/3293532
osd.2 sending pg_query to osd.7
osd.2,
2018-11-29 19:04:11.852 7f8fa7eab700 10 -- 172.21.15.122:6813/12771 create_connect 172.21.15.96:6810/15937, creating connection and registering 2018-11-29 19:04:11.852 7f8fa7eab700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_NONE l=0)._connect 2018-11-29 19:04:11.852 7f8fa7eab700 10 -- 172.21.15.122:6813/12771 connect_to 172.21.15.96:6810/15937 new 0xd6c2400 2018-11-29 19:04:11.852 7f8fa7eab700 1 -- 172.21.15.122:6813/12771 --> 172.21.15.96:6810/15937 -- pg_query(1.10 epoch 87) v4 -- 0xd2ee200 con 0 2018-11-29 19:04:11.852 7f8fa7eab700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=START_CONNECT pgs=0 cs=0 l=0).prepare_send_message m pg_query(1.10 epoch 87) v4 ... 2018-11-29 19:04:11.852 7f8fa56a6700 10 -- 172.21.15.122:6813/12771 connect_to 172.21.15.96:6810/15937 existing 0xd6c2400 2018-11-29 19:04:11.852 7f8fa56a6700 1 -- 172.21.15.122:6813/12771 --> 172.21.15.96:6810/15937 -- pg_query(1.d epoch 87) v4 -- 0xd25a800 con 0 2018-11-29 19:04:11.852 7f8fa56a6700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).prepare_send_message m pg_query(1.d epoch 87) v4 ... 2018-11-29 19:04:11.876 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).send_connect_message connect sending gseq=29 cseq=0 proto=10 2018-11-29 19:04:11.876 7f8fc9824700 0 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send injecting socket failure 2018-11-29 19:04:11.876 7f8fc9824700 1 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe 2018-11-29 19:04:11.876 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_message_write r=-32 2018-11-29 19:04:11.876 7f8fc9824700 2 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_message_write connect couldn't send reply (32) Broken pipe 2018-11-29 19:04:11.876 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).fault 2018-11-29 19:04:11.876 7f8fc9824700 0 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).fault initiating reconnect ... 2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message connect sending gseq=29 cseq=1 proto=10 ... 2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_1 connect got reply tag 2 connect_seq 0 global_seq 0 proto 10 flags 0 features 4611087854031142911 2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth 2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth reply.authorizer_len=36 2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=36 2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_auth r=0 2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_2 2018-11-29 19:04:11.880 7f8fc9824700 0 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_2 connect got RESETSESSION 2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).session_reset started 2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue started 2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue discard 0xd2ee200 2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue discard 0xd25a800
so, the reconnect make cseq=2, which made osd.7 RESETSESSION.
on osd.7 side,
2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> - conn(0xcc40400 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is 172.21.15.122:6813/12771 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).wait_connect_message 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_connect_message_1 r=0 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=174 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 0 global_seq 29 2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 4, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=0 2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 10, their proto 10 2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 174 2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2: challenging authorizer 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).send_connect_message_reply 2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 58 remaining bytes 0 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_reply_write r=0 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message 2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33 2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).process 2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=33 2018-11-29 19:04:11.879 7efc63325700 1 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 137 2018-11-29 19:04:11.879 7efc63325700 1 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed 2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_1 r=-1 2018-11-29 19:04:11.879 7efc63325700 1 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_1 read connect msg failed 2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).fault 2018-11-29 19:04:11.879 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).fault with nothing to send and in the half accept state just closed 2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop 2018-11-29 19:04:11.879 7efc63325700 2 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop 2018-11-29 19:04:11.879 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).discard_out_queue started 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> - conn(0xcc62480 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is 172.21.15.122:6813/12771 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).wait_connect_message 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_connect_message_1 r=0 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=174 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 1 global_seq 29 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 4, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=0 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 10, their proto 10 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 174 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2: challenging authorizer 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).send_connect_message_reply 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 58 remaining bytes 0 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_reply_write r=0 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).process 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=33 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_1 r=0 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=174 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 1 global_seq 29 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 4, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=0 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 10, their proto 10 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 174 2018-11-29 19:04:11.883 7efc63325700 10 osd.7 87 ms_handle_authentication new session 0xcc27900 con 0xcc62480 entity osd.2 addr 172.21.15.122:6813/12771 2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept setting up session_security. 2018-11-29 19:04:11.883 7efc63325700 0 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept we reset (peer sent cseq 1), sending RESETSESSION
Updated by Sage Weil over 5 years ago
- Status changed from 12 to 7
- Backport set to mimic,luminous
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37520: mimic: msg/async: connection stall added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37521: luminous: msg/async: connection stall added
Updated by Sage Weil over 5 years ago
- Related to Bug #21143: bad RESETSESSION between OSDs? added
Updated by Sage Weil over 5 years ago
- Has duplicate Bug #36666: msg: rejoin message queued but not sent added
Updated by Greg Farnum about 5 years ago
- Project changed from RADOS to Messengers
Updated by Greg Farnum over 4 years ago
- Has duplicate Bug #42058: OSD reconnected across map epochs, inconsistent pg logs created added
Updated by Nathan Cutler over 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
Updated by Nathan Cutler over 4 years ago
backports were not needed, because the bug was introduced during post-mimic refactoring of msgr V1 - see https://github.com/ceph/ceph/pull/30673#issuecomment-543659834
"The backport is not needed in luminous, nor in mimic. The problem was caused with the reduction of states of the V1 protocol during its refactoring."
Updated by Patrick Donnelly over 4 years ago
- Has duplicate Bug #36507: client: connection failure during reconnect causes client to hang added