Bug #38493 (closed)

msg/async: connection race + winner fault can leave connection stuck at replacing forever

Added by xie xingguo about 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: AsyncMessenger
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-02-02 09:31:03.402085 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept my proto 13, their proto 13
2019-02-02 09:31:03.402090 7f5f4935e700 10 mon.host-192-168-7-118@0(electing) e1 ms_verify_authorizer 100.100.7.122:6789/0 mon protocol 2
2019-02-02 09:31:03.402124 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept setting up session_security.
2019-02-02 09:31:03.402131 7f5f4935e700 1 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402144 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._try_send sent bytes 62 remaining bytes 0
2019-02-02 09:31:03.402154 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH
2019-02-02 09:31:03.402278 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402286 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402291 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept got peer connect_seq 55 global_seq 29026144
2019-02-02 09:31:03.402298 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept of host_type 1, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=1
2019-02-02 09:31:03.402303 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept my proto 13, their proto 13
2019-02-02 09:31:03.402308 7f5f4935e700 10 mon.host-192-168-7-118@0(electing) e1 ms_verify_authorizer 100.100.7.122:6789/0 mon protocol 2
2019-02-02 09:31:03.402342 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept setting up session_security.
2019-02-02 09:31:03.402349 7f5f4935e700 1 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402363 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._try_send sent bytes 62 remaining bytes 0
2019-02-02 09:31:03.402378 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH
2019-02-02 09:31:03.402503 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402515 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402522 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept got peer connect_seq 55 global_seq 29026145
2019-02-02 09:31:03.402527 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept of host_type 1, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=1
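The log shows the accept side repeatedly answering new connect attempts with "existing racing replace happened while replacing": the winning connection faulted mid-replace, so the existing connection never leaves the replacing state, and every retry sees that same state again. A minimal, hypothetical C++ sketch of that loop (this is not Ceph's actual AsyncConnection code; the types and function names here are illustrative only, loosely mirroring the state names in the log):

```cpp
#include <cassert>

// Illustrative states, named after the log's connection states.
enum class State { ACCEPTING_WAIT_CONNECT_MSG, REPLACING, OPEN };

// Hypothetical model of the "existing" (winner) connection.
struct Existing {
    State state = State::REPLACING;
    bool faulted = false;   // winner hit a fault mid-replace
    void tick() {
        // Without a fault, the replace would eventually complete:
        if (!faulted && state == State::REPLACING)
            state = State::OPEN;
    }
};

// Accept-side check: a new connect attempt must wait while the
// existing connection is still mid-replace ("racing replace
// happened while replacing" in the log).
bool handle_connect_msg(const Existing& existing) {
    return existing.state != State::REPLACING;
}

// Returns the number of retries before the attempt is accepted,
// or -1 if it never is -- the "stuck at replacing forever" case.
int retries_until_accept(Existing& existing, int max_retries) {
    for (int i = 0; i < max_retries; ++i) {
        if (handle_connect_msg(existing))
            return i;
        existing.tick();    // give the winner a chance to finish
    }
    return -1;
}
```

In this model, a healthy winner finishes the replace and the retry succeeds, but a winner that faults mid-replace pins the state at REPLACING, so the peer retries with ever-growing global_seq (29026144, 29026145, ...) and never gets in, matching the log above.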


Related issues 3 (0 open, 3 closed)

Copied to Messengers - Backport #39241: nautilus: msg/async: connection race + winner fault can leave connection stuck at replacing forever (Resolved, Prashant D)
Copied to Messengers - Backport #39242: mimic: msg/async: connection race + winner fault can leave connection stuck at replacing forever (Rejected, xie xingguo)
Copied to Messengers - Backport #39243: luminous: msg/async: connection race + winner fault can leave connection stuck at replacing forever (Resolved, xie xingguo)