Bug #21124

msg/async: failure to connect

Added by Sage Weil over 6 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On the sending end:

2017-08-24 20:56:05.612542 7fd8fe51e700 10 osd.4 12 send_incremental_map 11 -> 12 to 0x7fd93efe3000 172.21.15.72:6802/1291
2017-08-24 20:56:05.612570 7fd8fe51e700  1 -- 172.21.15.3:6805/9568 --> 172.21.15.72:6802/1291 -- osd_map(12..12 src has 1..12) v3 -- ?+0 0x7fd93ea97900 con 0x7fd93efe3000
...
2017-08-24 20:56:10.264499 7fd918b66700  1 -- 172.21.15.3:6805/9568 --> 172.21.15.72:6802/1291 -- pg_query(1.2 epoch 13) v4 -- ?+0 0x7fd93f08e900 con 0x7fd93efe3000
...
2017-08-24 20:56:11.332220 7fd8ff52e700 10 osd.4 15 send_incremental_map 14 -> 15 to 0x7fd93efe3000 172.21.15.72:6802/1291
2017-08-24 20:56:11.332251 7fd8ff52e700  1 -- 172.21.15.3:6805/9568 --> 172.21.15.72:6802/1291 -- osd_map(15..15 src has 1..15) v3 -- ?+0 0x7fd93f934280 con 0x7fd93efe3000

On the receiving end:
2017-08-24 20:56:05.616614 7f3d9e220700 10 osd.0 11  new session 0x7f3daebfc600 con=0x7f3daedb2000 addr=172.21.15.3:6805/9568
2017-08-24 20:56:05.616655 7f3d9e220700 10 osd.0 11  session 0x7f3daebfc600 osd.4 has caps osdcap[grant(*)] 'allow *'
2017-08-24 20:56:05.616670 7f3d9e220700  0 -- 172.21.15.72:6802/1291 >> 172.21.15.3:6805/9568 conn(0x7f3daedb2000 :6802 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY
2017-08-24 20:56:05.616676 7f3d9ea21700 10 osd.0 11  session 0x7f3daedad200 osd.3 has caps osdcap[grant(*)] 'allow *'
2017-08-24 20:56:05.616680 7f3d8da29700  2 osd.0 11 ms_handle_reset con 0x7f3daed9c000 session 0x7f3daedace00
2017-08-24 20:56:05.616710 7f3d9ea21700  0 -- 172.21.15.72:6802/1291 >> 172.21.15.3:6801/9443 conn(0x7f3daedb3800 :6802 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY
2017-08-24 20:56:05.616771 7f3d9da1f700 10 osd.0 11 OSD::ms_get_authorizer type=osd
2017-08-24 20:56:05.616785 7f3d9ea21700 10 osd.0 11 OSD::ms_get_authorizer type=osd
2017-08-24 20:56:05.616853 7f3d9da1f700 10 osd.0 11 OSD::ms_get_authorizer type=osd
2017-08-24 20:56:05.616847 7f3d9ea21700  0 -- 172.21.15.72:6802/1291 >> 172.21.15.3:6805/9568 conn(0x7f3daedbd800 :-1 s=STATE_CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0)._try_send injecting socket failure
2017-08-24 20:56:05.616906 7f3d9ea21700  1 -- 172.21.15.72:6802/1291 >> 172.21.15.3:6805/9568 conn(0x7f3daedbd800 :-1 s=STATE_CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0)._try_send send error: (32) Broken pipe
...
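To line up the two ends of a connection, it can help to pull the peer address, connection pointer, and state out of each messenger line. The following is a minimal sketch, not a Ceph tool: the field layout is inferred only from the lines quoted in this report, and the names `parse_line` and `peer_timeline` are hypothetical.

```python
import re

# Layout assumed from the messenger lines quoted above:
# timestamp, thread id, debug level, "--", local addr, ">>" or "-->", peer addr, rest.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>\d+)\s+--\s+"
    r"(?P<local>\S+)\s+(?P<dir>-->|>>)\s+(?P<peer>\S+)\s+(?P<rest>.*)$"
)
# "conn(0x...)" appears on connection-level lines, "con 0x..." on message sends.
CONN_RE = re.compile(r"\bconn?[( ](0x[0-9a-f]+)")
STATE_RE = re.compile(r"\bs=(STATE_\w+)")
EXISTING_RE = re.compile(r"\bexisting_state=(STATE_\w+)")


def _first(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


def parse_line(line):
    """Parse one messenger log line; return a dict, or None for other lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # e.g. "osd.N ..." subsystem lines have no "--" marker
    rest = m.group("rest")
    return {
        "ts": m.group("ts"),
        "local": m.group("local"),
        "peer": m.group("peer"),
        "conn": _first(CONN_RE, rest),
        "state": _first(STATE_RE, rest),
        "existing_state": _first(EXISTING_RE, rest),
    }


def peer_timeline(lines, peer):
    """Return parsed messenger entries that involve the given peer address."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p["peer"] == peer]
```

Calling `peer_timeline(open(path), "172.21.15.3:6805/9568")` on the receiving OSD's log (where `path` is whatever log file is being inspected) would list only the entries for that peer, which makes it easier to see two connection objects for the same peer side by side.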

/a/sage-2017-08-24_17:38:40-rados-wip-sage-testing2-luminous-20170824a-distro-basic-smithi/1560508

Logs are in the archive directory.

When I connected and marked an OSD down (osd.0, I think), everything recovered and proceeded normally.

History

#1 Updated by Haomai Wang over 6 years ago

  • Status changed from 12 to Resolved
  • Priority changed from Urgent to Normal

I guess this should be fixed.

#2 Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to Messengers
