Bug #15368

closed

"api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure"

Added by Yuri Weinstein about 8 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: kraken,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados, smoke
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2016-04-03_22:00:01-rados-jewel-distro-basic-smithi/
Job: 106350
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-03_22:00:01-rados-jewel-distro-basic-smithi/106350/teuthology.log

2016-04-03T23:58:37.937 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: test/librados/misc.cc:71: Failure
2016-04-03T23:58:37.937 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-04-03T23:58:37.938 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (16 ms)
2016-04-03T23:58:37.938 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: [----------] 1 test from LibRadosMiscConnectFailure (16 ms total)

Related issues 4 (0 open, 4 closed)

Related to Ceph - Bug #15477: "failed (workunit test osdc/stress_objectcacher.sh)" in rados-hammer-distro-basic-vps/ (Can't reproduce, 04/12/2016)
Related to Ceph - Feature #16091: Monclient: hunt for mons in parallel (Resolved, Kefu Chai, 05/31/2016)
Copied to Ceph - Backport #19561: kraken: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" (Resolved, Nathan Cutler)
Copied to Ceph - Backport #19562: jewel: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" (Resolved, Nathan Cutler)

Actions #1

Updated by Samuel Just about 8 years ago

370e4f773a5347a2d0c0493ccf3dc55909b75bce

Actions #2

Updated by Samuel Just about 8 years ago

  • Assignee set to Samuel Just

The bug seems to be that in MonClient::authenticate, it's entirely possible that the connection is established between the timeout and when the thread is rescheduled. Two options in that case:
1) disconnect and return the error
2) return success

Option 2 seems cleaner, but we'll have to modify the test a bit to use ms_inject_delay* to ensure that the authentication happens more slowly than the timeout. Annoyingly, the async messenger does not yet support that. I think I'll implement option 2 and force that test to use the simple messenger.
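
To make the window described above concrete, here is a minimal, self-contained sketch. It is not Ceph's MonClient code: `FakeMonClient`, `LateAuthPolicy`, `scheduler_gap`, and `run_scenario` are hypothetical names invented for illustration. The `scheduler_gap` callback stands in for whatever another thread does between the timed wait expiring and the waiting thread running again, so the late reply can be reproduced deterministically rather than by racing real threads.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical model only; none of these names come from Ceph.
enum class LateAuthPolicy { DisconnectAndFail, ReturnSuccess };

class FakeMonClient {
public:
  // Called by the (simulated) messenger thread when the monitor replies.
  void on_auth_reply() {
    std::lock_guard<std::mutex> guard(lock_);
    authenticated_ = true;
    cond_.notify_all();
  }

  // Waits up to `timeout` for authentication. `scheduler_gap`, if set, runs
  // with the lock released after the wait has timed out; it models work done
  // by another thread before the waiter is rescheduled.
  int authenticate(std::chrono::nanoseconds timeout, LateAuthPolicy policy,
                   const std::function<void()>& scheduler_gap = {}) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> l(lock_);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    bool timed_out = false;
    while (!authenticated_ && !timed_out) {
      timed_out = cond_.wait_until(l, deadline) == std::cv_status::timeout;
    }
    if (timed_out && scheduler_gap) {
      l.unlock();
      scheduler_gap();  // the reply may land exactly here
      l.lock();
    }
    if (!authenticated_) {
      return -ETIMEDOUT;  // genuine timeout, nothing arrived
    }
    if (timed_out && policy == LateAuthPolicy::DisconnectAndFail) {
      authenticated_ = false;  // option 1: drop the late session
      return -ETIMEDOUT;
    }
    return 0;  // option 2, or an on-time success
  }

  bool authenticated() {
    std::lock_guard<std::mutex> guard(lock_);
    return authenticated_;
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  bool authenticated_ = false;
};

struct Outcome {
  int rc;
  bool session_alive;
};

// Runs one authenticate() call with a 1 ns timeout; optionally delivers the
// auth reply inside the simulated scheduler gap.
Outcome run_scenario(LateAuthPolicy policy, bool reply_in_gap) {
  FakeMonClient client;
  std::function<void()> gap;
  if (reply_in_gap) {
    gap = [&client] { client.on_auth_reply(); };
  }
  const int rc = client.authenticate(std::chrono::nanoseconds(1), policy, gap);
  return {rc, client.authenticated()};
}
```

In this model, `run_scenario(LateAuthPolicy::ReturnSuccess, true)` yields a zero return code even though the timeout expired, which is the shape of result the failing assertion `Expected: (0) != (rados_connect(cluster))` would trip over. `LateAuthPolicy::DisconnectAndFail` returns `-ETIMEDOUT` and clears the session, corresponding to option 1.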

Actions #3

Updated by Samuel Just about 8 years ago

Hmm, that doesn't really help; we still need a way to cancel the in-progress authentication...

Actions #4

Updated by Samuel Just about 8 years ago

Neither init() nor shutdown() seems to clear the authentication state.

Actions #6

Updated by David Zafman about 8 years ago

  • Related to Bug #15477: "failed (workunit test osdc/stress_objectcacher.sh)" in rados-hammer-distro-basic-vps/ added
Actions #7

Updated by Samuel Just almost 8 years ago

  • Related to Feature #16091: Monclient: hunt for mons in parallel added
Actions #8

Updated by Yuri Weinstein almost 8 years ago

  • Release set to master

In http://qa-proxy.ceph.com/teuthology/yuriw-2016-06-25_17:00:35-rados-wip-yuri-testing2_2016_6_23-distro-basic-smithi/276984/teuthology.log

2016-06-26T10:26:55.932 INFO:tasks.workunit.client.0.smithi002.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-06-26T10:26:55.932 INFO:tasks.workunit.client.0.smithi002.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (14 ms)
2016-06-26T10:26:55.932 INFO:tasks.workunit.client.0.smithi002.stdout:                 api_misc: 2016-06-26 17:26:54.089527 7f3edeffd700  1 -- 172.21.15.2:0/1573833271 >> 172.21.15.2:6790/0 conn(0x7f3f02ac3a00 sd=17 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=22 cs=1 l=1). == rx == mon.2 seq 6 0x7f3ed8000b90 osd_map(15..15 src has 1..15) v3
Actions #9

Updated by Samuel Just over 7 years ago

  • Assignee deleted (Samuel Just)
Actions #10

Updated by Samuel Just over 7 years ago

  • Priority changed from Urgent to Normal
Actions #11

Updated by Yuri Weinstein over 7 years ago

http://qa-proxy.ceph.com/teuthology/teuthology-2016-09-01_22:00:02-rados-jewel-distro-basic-smithi/395640/teuthology.log

2016-09-01T22:33:33.448 INFO:tasks.workunit.client.0.smithi026.stdout:                 api_misc: test/librados/misc.cc:70: Failure
2016-09-01T22:33:33.448 INFO:tasks.workunit.client.0.smithi026.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-09-01T22:33:33.448 INFO:tasks.workunit.client.0.smithi026.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (28 ms)
Actions #12

Updated by Yuri Weinstein over 7 years ago

on master branch.
Run: http://pulpito.ceph.com/teuthology-2016-10-30_04:20:07-upgrade:jewel-x-master-distro-basic-vps/
Job: 502767
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-10-30_04:20:07-upgrade:jewel-x-master-distro-basic-vps/502767/teuthology.log

2016-10-30T05:26:17.244 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: 2016-10-30 05:26:17.085673 7fac68ba12c0 10 monclient: _send_mon_message to mon.c at 172.21.2.73:6790/0
2016-10-30T05:26:17.244 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: 2016-10-30 05:26:17.085677 7fac68ba12c0  1 -- 172.21.2.73:0/2458744259 --> 172.21.2.73:6790/0 -- mon_subscribe({mgrmap=0+}) v2 -- 0x7fac72034c00 con 0
2016-10-30T05:26:17.245 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.2-791-g5354e7c/src/test/librados/misc.cc:70: Failure
2016-10-30T05:26:17.245 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-10-30T05:26:17.245 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (236 ms)
Actions #13

Updated by Sage Weil over 7 years ago

  • Subject changed from "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" in rados-jewel-distro-basic-smithi to "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure"
  • Priority changed from Normal to Urgent

/a/sage-2016-11-29_20:05:25-rados:thrash-master---basic-smithi/586352

Actions #15

Updated by Nathan Cutler over 7 years ago

Sage, are you saying that http://tracker.ceph.com/issues/16091 should be backported to jewel? (The description of this bug indicates that the failure also happens in jewel rados suites)

Actions #16

Updated by Yuri Weinstein over 7 years ago

on master in smoke suite:

http://qa-proxy.ceph.com/teuthology/teuthology-2016-12-13_05:00:02-smoke-master-testing-basic-vps/628883/teuthology.log

2016-12-13T06:21:28.184 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: 2016-12-13 06:21:23.174238 7f169effd700  1 -- 172.21.2.139:0/3078865336 <== mon.1 172.21.2.171:6789/0 4 ==== osd_map(38..38 src has 1..38) v3 ==== 2532+0+0 (2506938689 0 0) 0x7f168c0027b0 con 0x55d0509451f0
2016-12-13T06:21:28.185 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: /build/ceph-11.0.2-2422-ga3bf341/src/test/librados/misc.cc:70: Failure
2016-12-13T06:21:28.185 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-12-13T06:21:28.185 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure 
Actions #17

Updated by Yuri Weinstein over 7 years ago

  • ceph-qa-suite smoke added
Actions #18

Updated by Sage Weil about 7 years ago

  • Status changed from New to Fix Under Review
Actions #19

Updated by Sage Weil about 7 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to kraken,jewel
Actions #20

Updated by Nathan Cutler about 7 years ago

  • Copied to Backport #19561: kraken: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" added
Actions #21

Updated by Nathan Cutler about 7 years ago

  • Copied to Backport #19562: jewel: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" added
Actions #22

Updated by Nathan Cutler almost 7 years ago

  • Status changed from Pending Backport to Resolved