Bug #39039

closed

mon connection reset, command not resent

Added by Sage Weil about 5 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 100%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-03-28 19:35:52.315 7f45995fe700  1 -- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] <== client.? 172.21.15.31:0/386512091 6 ==== mon_command({"prefix": "osd last-stat-seq", "id": 0} v 0) v1 ==== 82+0+0 (secure 0 0 0) 0x55da2b811400 con 0x55da2b56e400
...
2019-03-28 19:35:52.317 7f4596df9700 20 -- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 msgr2=0x55da2b844c00 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read continue len=32
2019-03-28 19:35:52.317 7f4596df9700  1 -- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 msgr2=0x55da2b844c00 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 40
2019-03-28 19:35:52.317 7f4596df9700  1 -- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 msgr2=0x55da2b844c00 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2019-03-28 19:35:52.317 7f4596df9700 20 --2- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 0x55da2b844c00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55da2b374990 tx=0x55da2b570d00).handle_read_frame_preamble_main r=-1
2019-03-28 19:35:52.317 7f4596df9700  1 --2- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 0x55da2b844c00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55da2b374990 tx=0x55da2b570d00).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2019-03-28 19:35:52.317 7f4596df9700 10 --2- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 0x55da2b844c00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55da2b374990 tx=0x55da2b570d00)._fault
2019-03-28 19:35:52.317 7f4596df9700  2 --2- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 0x55da2b844c00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55da2b374990 tx=0x55da2b570d00)._fault on lossy channel, failing
2019-03-28 19:35:52.317 7f4596df9700  1 --2- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 0x55da2b844c00 secure :-1 s=READY pgs=2 cs=0 l=1 rx=0x55da2b374990 tx=0x55da2b570d00).stop
2019-03-28 19:35:52.317 7f4596df9700 10 --2- [v2:172.21.15.31:3300/0,v1:172.21.15.31:6789/0] >> 172.21.15.31:0/386512091 conn(0x55da2b56e400 0x55da2b844c00 unknown :-1 s=READY pgs=2 cs=0 l=1 rx=0 tx=0).discard_out_queue started

but the client does not reconnect. The command times out:
2019-03-28T19:35:52.319 INFO:teuthology.orchestra.run.smithi031:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd last-stat-seq osd.1
2019-03-28T19:37:52.369 DEBUG:teuthology.orchestra.run:got remote process result: 124
2019-03-28T19:37:52.369 ERROR:teuthology.run_tasks:Saw exception from tasks.

/a/sage-2019-03-28_19:01:44-rados-wip-sage-testing-2019-03-28-1123-distro-basic-smithi/3780941
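
A minimal triage sketch for logs like the ones above, assuming the line formats shown in this report. It pulls out the connections that hit `_fault on lossy channel, failing` in the mon log and checks whether the teuthology log reports exit status 124, which is what GNU `timeout` returns when the wrapped command runs out of time. The helper names `lossy_faults` and `timed_out` are hypothetical and are not part of Ceph or teuthology.

```python
import re

# Matches the connection pointer in a messenger log line, e.g. "conn(0x55da2b56e400".
CONN_RE = re.compile(r"conn\((0x[0-9a-f]+)")
# Matches the teuthology line that reports the remote exit status.
RESULT_RE = re.compile(r"got remote process result: (\d+)")

# GNU coreutils `timeout` exits with 124 when the command it wraps times out.
TIMEOUT_EXIT_STATUS = 124


def lossy_faults(mon_log_lines):
    """Return the connection pointers that faulted on a lossy channel."""
    faulted = set()
    for line in mon_log_lines:
        if "_fault on lossy channel, failing" in line:
            match = CONN_RE.search(line)
            if match:
                faulted.add(match.group(1))
    return faulted


def timed_out(teuthology_lines):
    """Return True if any reported remote exit status is the timeout status."""
    for line in teuthology_lines:
        match = RESULT_RE.search(line)
        if match and int(match.group(1)) == TIMEOUT_EXIT_STATUS:
            return True
    return False
```

A lossy connection is torn down on fault instead of being re-established by the messenger, so a hit from `lossy_faults` followed by a positive `timed_out` fits the symptom described here: the session was dropped and the command was never resent.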

Files

ceph-user.pre-install (150 Bytes), Tony Davies, 04/20/2020 02:24 AM
APKBUILD (13.5 KB), Tony Davies, 04/20/2020 02:24 AM
git_search_fix.patch (1.79 KB), Tony Davies, 04/20/2020 02:24 AM
musl-fixes.patch (469 Bytes), Tony Davies, 04/20/2020 02:24 AM
nodeenv-armv8l.patch (1.5 KB), Tony Davies, 04/20/2020 02:24 AM
PurgeQueue.cc.patch (1.34 KB), Tony Davies, 04/20/2020 02:24 AM
mon.a.tar.gz (194 KB), Sunny Kumar, 06/25/2020 03:42 PM

Related issues 5 (1 open, 4 closed)

Related to RADOS - Bug #40521: cli timeout (e.g., ceph pg dump) (Can't reproduce)

Related to Ceph - Bug #44197: read_until returns Operation not permitted in a mixed arch client MON session (Resolved, Brad Hubbard)

Related to RADOS - Bug #45647: "ceph --cluster ceph --log-early osd last-stat-seq osd.0" times out due to msgr-failures/many.yaml (New)

Related to RADOS - Bug #49259: test_rados_api tests timeout with cephadm (plus extremely large OSD logs) (Resolved, Brad Hubbard)

Related to CephFS - Bug #53436: mds, mon: mds beacon messages get dropped? (mds never reaches up:active state) (Duplicate, Xiubo Li)
