Bug #23957
openmsg/async: read connect reply failed, but the connection is not retried
Status: New
Priority: High
Assignee: -
Category: AsyncMessenger
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On the sending side:
2018-05-01 02:00:44.701 7f65dcb55700 10 osd.5 20 send_incremental_map 19 -> 20 to 0x55c34a5f2760 172.21.15.50:6806/12882
2018-05-01 02:00:44.701 7f65dcb55700 1 -- 172.21.15.166:6805/12966 --> 172.21.15.50:6806/12882 -- osd_map(20..20 src has 1..20) v4 -- ?+0 0x55c34a8fa840 con 0x55c34a5f2760
2018-05-01 02:00:44.701 7f65dcb55700 1 -- 172.21.15.166:6805/12966 --> 172.21.15.50:6806/12882 -- pg_query(2.2 epoch 20) v4 -- ?+0 0x55c34a6a9840 con 0x55c34a5f2760
On the accepting side:
2018-05-01 02:00:03.950 7f7f113bb700 10 osd.1 15 new session 0x555ea53d4780 con=0x555ea53cb800 addr=172.21.15.166:6805/12966
2018-05-01 02:00:03.950 7f7f113bb700 10 osd.1 15 session 0x555ea53d4780 osd.5 has caps osdcap[grant(*)] 'allow *'
2018-05-01 02:00:03.950 7f7f113bb700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6805/12966 conn(0x555ea53cb800 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING
2018-05-01 02:00:03.950 7f7f113bb700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.50:6814/13049 conn(0x555ea53c9c00 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_bulk peer close file descriptor 69
2018-05-01 02:00:03.950 7f7f113bb700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.50:6814/13049 conn(0x555ea53c9c00 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_until read failed
2018-05-01 02:00:03.950 7f7f113bb700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.50:6814/13049 conn(0x555ea53c9c00 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0)._process_connection read connect msg failed
...
2018-05-01 02:00:03.950 7f7f11bbc700 10 osd.1 15 new session 0x555ea2a6bc00 con=0x555ea53c9500 addr=172.21.15.166:6801/12910
2018-05-01 02:00:03.950 7f7f11bbc700 10 osd.1 15 session 0x555ea2a6bc00 osd.4 has caps osdcap[grant(*)] 'allow *'
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6801/12910 conn(0x555ea53c9500 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_CONNECT_REPLY
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6801/12910 conn(0x555ea53c9500 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_bulk peer close file descriptor 68
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6801/12910 conn(0x555ea53c9500 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_until read failed
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6801/12910 conn(0x555ea53c9500 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0)._process_connection read connect msg failed
2018-05-01 02:00:03.950 7f7f11bbc700 10 osd.1 15 new session 0x555ea53d4c80 con=0x555ea53c8e00 addr=172.21.15.50:6810/12951
2018-05-01 02:00:03.950 7f7f11bbc700 10 osd.1 15 session 0x555ea53d4c80 osd.2 has caps osdcap[grant(*)] 'allow *'
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.50:6810/12951 conn(0x555ea53c8e00 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 0 vs existing csq=0 existing_state=STATE_CONNECTING_WAIT_CONNECT_REPLY
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.50:6810/12951 conn(0x555ea53c8e00 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_bulk peer close file descriptor 66
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.50:6810/12951 conn(0x555ea53c8e00 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0).read_until read failed
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.50:6810/12951 conn(0x555ea53c8e00 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG pgs=0 cs=0 l=0)._process_connection read connect msg failed
2018-05-01 02:00:03.950 7f7f11bbc700 10 osd.1 15 OSD::ms_get_authorizer type=osd
2018-05-01 02:00:03.950 7f7f11bbc700 10 osd.1 15 OSD::ms_get_authorizer type=osd
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6805/12966 conn(0x555ea5420000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=0).read_bulk peer close file descriptor 66
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6805/12966 conn(0x555ea5420000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=0).read_until read failed
2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> 172.21.15.166:6805/12966 conn(0x555ea5420000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=0)._process_connection read connect reply failed
/a/dzafman-2018-04-30_18:45:29-rados:thrash-wip-zafman-testing-distro-basic-smithi/2458469
rados:thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml 2-recovery-overrides/default.yaml backoff/normal.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/upmap.yaml msgr-failures/fastclose.yaml msgr/random.yaml objectstore/bluestore.yaml rados.yaml rocksdb.yaml thrashers/none.yaml thrashosds-health.yaml workloads/radosbench.yaml}
radosbench times out with pgid 2.2 in state "creating+peering"
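For triage, a small script can group the "failed" messenger messages in logs like these by connection, which makes it easier to see which connection stopped in STATE_CONNECTING_WAIT_CONNECT_REPLY. This is only a sketch: the helper name `failures_by_conn` and the regular expression are illustrative assumptions based on the line layout quoted in this report, not part of Ceph, and it assumes one log entry per line.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the lines quoted in this report:
#   ... conn(0xPTR :PORT s=STATE pgs=N cs=N l=N).FUNC message
CONN_RE = re.compile(
    r"conn\((?P<conn>0x[0-9a-f]+) :(?P<port>-?\d+) "
    r"s=(?P<state>\w+) pgs=\d+ cs=\d+ l=\d+\)\."
    r"(?P<func>\w+) (?P<msg>.*)$"
)


def failures_by_conn(lines):
    """Hypothetical helper: map conn pointer -> [(state, message)] for 'failed' lines."""
    out = defaultdict(list)
    for line in lines:
        m = CONN_RE.search(line.strip())
        if m and m.group("msg").endswith("failed"):
            out[m.group("conn")].append((m.group("state"), m.group("msg")))
    return dict(out)


# Three accepting-side entries copied from this report.
prefix = ("2018-05-01 02:00:03.950 7f7f11bbc700 1 -- 172.21.15.50:6806/12882 >> "
          "172.21.15.166:6805/12966 conn(0x555ea5420000 :-1 "
          "s=STATE_CONNECTING_WAIT_CONNECT_REPLY pgs=0 cs=0 l=0)")
sample = [
    prefix + ".read_bulk peer close file descriptor 66",
    prefix + ".read_until read failed",
    prefix + "._process_connection read connect reply failed",
]

for conn, events in failures_by_conn(sample).items():
    for state, msg in events:
        print(conn, state, msg)
```

Run against a full OSD log (one entry per line), the same function lists every connection that logged a failure together with the state it was in at the time.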