Bug #38493

msg/async: connection race + winner fault can leave connection stuck at replacing forever

Added by xie xingguo 4 months ago. Updated 3 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: AsyncMessenger
Target version:
Start date: 02/27/2019
Due date:
% Done: 0%
Source: Community (dev)
Tags:
Backport: luminous,mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

2019-02-02 09:31:03.402085 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept my proto 13, their proto 13
2019-02-02 09:31:03.402090 7f5f4935e700 10 mon.host-192-168-7-118@0(electing) e1 ms_verify_authorizer 100.100.7.122:6789/0 mon protocol 2
2019-02-02 09:31:03.402124 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept setting up session_security.
2019-02-02 09:31:03.402131 7f5f4935e700 1 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402144 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._try_send sent bytes 62 remaining bytes 0
2019-02-02 09:31:03.402154 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH
2019-02-02 09:31:03.402278 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402286 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402291 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept got peer connect_seq 55 global_seq 29026144
2019-02-02 09:31:03.402298 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept of host_type 1, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=1
2019-02-02 09:31:03.402303 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept my proto 13, their proto 13
2019-02-02 09:31:03.402308 7f5f4935e700 10 mon.host-192-168-7-118@0(electing) e1 ms_verify_authorizer 100.100.7.122:6789/0 mon protocol 2
2019-02-02 09:31:03.402342 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept setting up session_security.
2019-02-02 09:31:03.402349 7f5f4935e700 1 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg existing racing replace happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402363 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._try_send sent bytes 62 remaining bytes 0
2019-02-02 09:31:03.402378 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH
2019-02-02 09:31:03.402503 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402515 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).process prev state is STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402522 7f5f4935e700 20 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept got peer connect_seq 55 global_seq 29026145
2019-02-02 09:31:03.402527 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)._process_connection accept of host_type 1, policy.lossy=0 policy.server=0 policy.standby=1 policy.resetcheck=1
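
The excerpt above shows the same accepting connection hitting "existing racing replace happened while replacing" on each pass. To check whether a longer monitor log shows the same loop, a minimal Python sketch like the one below could count that message per connection pointer. It is not part of Ceph: the helper names (join_entries, count_racing_replaces) are hypothetical, the sample entries are copied from the log above, and how many repeats count as "stuck" is left to the reader.

```python
import re
from collections import Counter

# Assumption: every log entry starts with a timestamp shaped like the ones above.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ ")
# Connection pointer as printed in the messenger prefix, e.g. conn(0x55ec02821000 :6789 ...
CONN = re.compile(r"conn\((0x[0-9a-f]+) ")
MARKER = "existing racing replace happened while replacing"


def join_entries(text):
    """Merge wrapped continuation lines back onto the entry they belong to."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        elif line.startswith("."):
            entries[-1] += line          # e.g. "l=0)" + ".process ..."
        else:
            entries[-1] += " " + line    # e.g. "cs=0" + " " + "l=0)...."
    return entries


def count_racing_replaces(text):
    """Count MARKER occurrences per connection pointer."""
    counts = Counter()
    for entry in join_entries(text):
        if MARKER in entry:
            match = CONN.search(entry)
            counts[match.group(1) if match else "unknown"] += 1
    return counts


# Three entries copied from the log excerpt in this report (wrapped as pasted).
SAMPLE = """\
2019-02-02 09:31:03.402131 7f5f4935e700 1 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
l=0).handle_connect_msg existing racing replace happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
2019-02-02 09:31:03.402144 7f5f4935e700 10 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
l=0)._try_send sent bytes 62 remaining bytes 0
2019-02-02 09:31:03.402349 7f5f4935e700 1 -- 100.100.7.118:6789/0 >> 100.100.7.122:6789/0 conn(0x55ec02821000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
l=0).handle_connect_msg existing racing replace happened while replacing. existing_state=STATE_ACCEPTING_WAIT_CONNECT_MSG
"""

if __name__ == "__main__":
    for conn, hits in count_racing_replaces(SAMPLE).items():
        print(conn, hits)
```

A count that keeps growing for one connection pointer while its state never leaves the accepting states would be consistent with the behaviour reported here, though the count alone does not prove the connection is stuck.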


Related issues

Copied to Messengers - Backport #39241: nautilus: msg/async: connection race + winner fault can leave connection stuck at replacing forever (Resolved)
Copied to Messengers - Backport #39242: mimic: msg/async: connection race + winner fault can leave connection stuck at replacing forever (Need More Info)
Copied to Messengers - Backport #39243: luminous: msg/async: connection race + winner fault can leave connection stuck at replacing forever (In Progress)

History

#1 Updated by Greg Farnum 4 months ago

Hmm, I thought Sage just fixed this bug, what's the exact sha1?

#2 Updated by xie xingguo 4 months ago

Greg Farnum wrote:

Hmm, I thought Sage just fixed this bug, what's the exact sha1?

You mean http://tracker.ceph.com/issues/37779, right? Sadly it is not the same issue :-(

#3 Updated by Greg Farnum 3 months ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)

#4 Updated by Greg Farnum 3 months ago

  • Category set to AsyncMessenger

#5 Updated by xie xingguo 3 months ago

  • Status changed from New to Pending Backport
  • Backport set to luminous,mimic,nautilus

#6 Updated by Nathan Cutler 2 months ago

  • Copied to Backport #39241: nautilus: msg/async: connection race + winner fault can leave connection stuck at replacing forever added

#7 Updated by Nathan Cutler 2 months ago

  • Copied to Backport #39242: mimic: msg/async: connection race + winner fault can leave connection stuck at replacing forever added

#8 Updated by Nathan Cutler 2 months ago

  • Copied to Backport #39243: luminous: msg/async: connection race + winner fault can leave connection stuck at replacing forever added
