Project

General

Profile

Actions

Bug #36612

closed

msg/async: connection stall

Added by Sage Weil over 5 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Immediate
Assignee:
-
Category:
AsyncMessenger
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-10-28 23:23:07.173 7ff73e5b8700  1 -- 172.21.15.118:6813/34020 <== osd.0 172.21.15.118:6801/97454 124 ==== rep_scrubmap(1.c e310 from shard 0) v2 ==== 40+0+42 (2203346794 0 3641534724) 0x561093d72d80 con 0x561092add800
2018-10-28 23:23:07.173 7ff71e4f0700 20 osd.3 969 share_map osd.0 172.21.15.118:6801/97454 310
2018-10-28 23:23:07.173 7ff71e4f0700 20 osd.3 969 should_share_map osd.0 172.21.15.118:6801/97454 310
2018-10-28 23:23:07.173 7ff7224f8700  1 -- 172.21.15.118:6813/34020 --> 172.21.15.118:6801/97454 -- MOSDScrubReserve(1.c RELEASE e969) v1 -- 0x56109353aa00 con 0
2018-10-28 23:23:07.173 7ff7224f8700  1 -- 172.21.15.118:6813/34020 --> 172.21.15.118:6801/97454 -- pg_info((query:969 sent:969 1.c( empty local-lis/les=310/311 n=0 ec=282/14 lis/c 310/310 les/c/f 311/311/0 310/310/14))=([0,0] intervals=) epoch 969) v5 -- 0x561092990f00 con 0
2018-10-28 23:23:54.655 7ff73e5b8700  1 -- 172.21.15.118:6813/34020 >> 172.21.15.118:6801/97454 conn(0x561092add800 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 46
2018-10-28 23:23:54.655 7ff73e5b8700  1 -- 172.21.15.118:6813/34020 >> 172.21.15.118:6801/97454 conn(0x561092add800 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2018-10-28 23:23:54.655 7ff73e5b8700  1 -- 172.21.15.118:6813/34020 >> 172.21.15.118:6801/97454 conn(0x561092add800 legacy :-1 s=OPENED pgs=3 cs=1 l=0).handle_message read tag failed

on other end,
2018-10-28 23:23:07.173 7fdedfddc700  1 -- 172.21.15.118:6801/97454 --> 172.21.15.118:6813/34020 -- rep_scrubmap(1.c e310 from shard 0) v2 -- 0x55c5621bb200 con 0
2018-10-28 23:23:07.174 7fdf03eac700  1 -- 172.21.15.118:6801/97454 <== osd.3 172.21.15.118:6813/34020 225 ==== MOSDScrubReserve(1.c RELEASE e969) v1 ==== 43+0+0 (1924339382 0 0) 0x55c55ea32000 con 0x55c55e29b000
2018-10-28 23:23:07.174 7fdf03eac700  1 -- 172.21.15.118:6801/97454 <== osd.3 172.21.15.118:6813/34020 226 ==== pg_info((query:969 sent:969 1.c( empty local-lis/les=310/311 n=0 ec=282/14 lis/c 310/310 les/c/f 311/311/0 310/310/14))=([0,0] intervals=) epoch 969) v5 ==== 961+0+0 (630120914 0 0) 0x55c56338b860 con 0x55c55e29b000
2018-10-28 23:23:07.174 7fdee3de4700 20 osd.0 969 share_map osd.3 172.21.15.118:6813/34020 969
2018-10-28 23:23:07.174 7fdee3de4700 20 osd.0 969 should_share_map osd.3 172.21.15.118:6813/34020 969

/a/sage-2018-10-28_18:53:56-rados-wip-sage2-testing-2018-10-28-0942-distro-basic-smithi/3197235

Related issues 6 (0 open6 closed)

Related to RADOS - Bug #21143: bad RESETSESSION between OSDs?DuplicateHaomai Wang08/26/2017

Actions
Has duplicate Messengers - Bug #36666: msg: rejoin message queued but not sentDuplicate

Actions
Has duplicate RADOS - Bug #42058: OSD reconnected across map epochs, inconsistent pg logs createdDuplicate09/26/201910/31/2019

Actions
Has duplicate CephFS - Bug #36507: client: connection failure during reconnect causes client to hangDuplicatePatrick Donnelly

Actions
Copied to Messengers - Backport #37520: mimic: msg/async: connection stallRejectedNathan CutlerActions
Copied to Messengers - Backport #37521: luminous: msg/async: connection stallRejectedNathan CutlerActions
Actions #1

Updated by Sage Weil over 5 years ago

  • Priority changed from Urgent to Immediate

I caught this in a run with debug ms = 20

/a/sage-2018-11-29_15:56:38-rados-wip-sage2-testing-2018-11-28-1118-distro-basic-smithi/3293532

osd.2 sending pg_query to osd.7

osd.2,

2018-11-29 19:04:11.852 7f8fa7eab700 10 -- 172.21.15.122:6813/12771 create_connect 172.21.15.96:6810/15937, creating connection and registering
2018-11-29 19:04:11.852 7f8fa7eab700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_NONE l=0)._connect
2018-11-29 19:04:11.852 7f8fa7eab700 10 -- 172.21.15.122:6813/12771 connect_to 172.21.15.96:6810/15937 new 0xd6c2400
2018-11-29 19:04:11.852 7f8fa7eab700  1 -- 172.21.15.122:6813/12771 --> 172.21.15.96:6810/15937 -- pg_query(1.10 epoch 87) v4 -- 0xd2ee200 con 0
2018-11-29 19:04:11.852 7f8fa7eab700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=START_CONNECT pgs=0 cs=0 l=0).prepare_send_message m pg_query(1.10 epoch 87) v4
...
2018-11-29 19:04:11.852 7f8fa56a6700 10 -- 172.21.15.122:6813/12771 connect_to 172.21.15.96:6810/15937 existing 0xd6c2400
2018-11-29 19:04:11.852 7f8fa56a6700  1 -- 172.21.15.122:6813/12771 --> 172.21.15.96:6810/15937 -- pg_query(1.d epoch 87) v4 -- 0xd25a800 con 0
2018-11-29 19:04:11.852 7f8fa56a6700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).prepare_send_message m pg_query(1.d epoch 87) v4
...
2018-11-29 19:04:11.876 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).send_connect_message connect sending gseq=29 cseq=0 proto=10
2018-11-29 19:04:11.876 7f8fc9824700  0 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send injecting socket failure
2018-11-29 19:04:11.876 7f8fc9824700  1 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
2018-11-29 19:04:11.876 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_message_write r=-32
2018-11-29 19:04:11.876 7f8fc9824700  2 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_message_write connect couldn't send reply (32) Broken pipe
2018-11-29 19:04:11.876 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).fault
2018-11-29 19:04:11.876 7f8fc9824700  0 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).fault initiating reconnect
...
2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message connect sending gseq=29 cseq=1 proto=10
...
2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_1 connect got reply tag 2 connect_seq 0 global_seq 0 proto 10 flags 0 features 4611087854031142911
2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth
2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth reply.authorizer_len=36
2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=36
2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_auth r=0
2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_2
2018-11-29 19:04:11.880 7f8fc9824700  0 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_2 connect got RESETSESSION
2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).session_reset started
2018-11-29 19:04:11.880 7f8fc9824700 10 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue started
2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue discard 0xd2ee200
2018-11-29 19:04:11.880 7f8fc9824700 20 -- 172.21.15.122:6813/12771 >> 172.21.15.96:6810/15937 conn(0xd6c2400 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue discard 0xd25a800

so, the reconnect make cseq=2, which made osd.7 RESETSESSION.

on osd.7 side,

2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> - conn(0xcc40400 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is 172.21.15.122:6813/12771
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).wait_connect_message
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_connect_message_1 r=0
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=174
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 0 global_seq 29
2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 4, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=0
2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 10, their proto 10
2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 174
2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2: challenging authorizer
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).send_connect_message_reply
2018-11-29 19:04:11.875 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 58 remaining bytes 0
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_reply_write r=0
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message
2018-11-29 19:04:11.875 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33
2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).process
2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=33
2018-11-29 19:04:11.879 7efc63325700  1 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 137
2018-11-29 19:04:11.879 7efc63325700  1 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_1 r=-1
2018-11-29 19:04:11.879 7efc63325700  1 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_1 read connect msg failed
2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).fault
2018-11-29 19:04:11.879 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).fault with nothing to send and in the half  accept state just closed
2018-11-29 19:04:11.879 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop
2018-11-29 19:04:11.879 7efc63325700  2 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).stop
2018-11-29 19:04:11.879 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc40400 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).discard_out_queue started
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> - conn(0xcc62480 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is 172.21.15.122:6813/12771
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).wait_connect_message
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING pgs=0 cs=0 l=0).handle_connect_message_1 r=0
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=174
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 1 global_seq 29
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 4, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=0
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 10, their proto 10
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 174
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2: challenging authorizer
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).send_connect_message_reply
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 58 remaining bytes 0
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_reply_write r=0
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=33
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).process
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=33
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_1 r=0
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).wait_connect_message_auth
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=174
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_auth r=0
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2
2018-11-29 19:04:11.883 7efc63325700 20 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept got peer connect_seq 1 global_seq 29
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept of host_type 4, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=0
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept my proto 10, their proto 10
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 authorizor_protocol 2 len 174
2018-11-29 19:04:11.883 7efc63325700 10 osd.7 87 ms_handle_authentication new session 0xcc27900 con 0xcc62480 entity osd.2 addr 172.21.15.122:6813/12771
2018-11-29 19:04:11.883 7efc63325700 10 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept setting up session_security.
2018-11-29 19:04:11.883 7efc63325700  0 -- 172.21.15.96:6810/15937 >> 172.21.15.122:6813/12771 conn(0xcc62480 legacy :6810 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept we reset (peer sent cseq 1), sending RESETSESSION

Actions #2

Updated by Sage Weil over 5 years ago

  • Status changed from 12 to 7
  • Backport set to mimic,luminous
Actions #3

Updated by Kefu Chai over 5 years ago

  • Status changed from 7 to Pending Backport
Actions #4

Updated by Nathan Cutler over 5 years ago

Actions #5

Updated by Nathan Cutler over 5 years ago

Actions #6

Updated by Sage Weil over 5 years ago

  • Related to Bug #21143: bad RESETSESSION between OSDs? added
Actions #7

Updated by Sage Weil over 5 years ago

  • Has duplicate Bug #36666: msg: rejoin message queued but not sent added
Actions #8

Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to Messengers
Actions #9

Updated by Greg Farnum about 5 years ago

  • Category set to AsyncMessenger
Actions #10

Updated by Nathan Cutler over 4 years ago

  • Pull request ID set to 25343
Actions #11

Updated by Greg Farnum over 4 years ago

  • Has duplicate Bug #42058: OSD reconnected across map epochs, inconsistent pg logs created added
Actions #12

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #13

Updated by Nathan Cutler over 4 years ago

backports were not needed, because the bug was introduced during post-mimic refactoring of msgr V1 - see https://github.com/ceph/ceph/pull/30673#issuecomment-543659834

"The backport is not needed in luminous, nor in mimic. The problem was caused with the reduction of states of the V1 protocol during its refactoring."

Actions #14

Updated by Patrick Donnelly over 4 years ago

  • Has duplicate Bug #36507: client: connection failure during reconnect causes client to hang added
Actions

Also available in: Atom PDF