Bug #15368

"api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure"

Added by Yuri Weinstein about 1 year ago. Updated about 2 months ago.

Status: Pending Backport
Priority: Urgent
Assignee: -
Category: -
Target version: -
Start date: 04/04/2016
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: kraken,jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados, smoke
Release: jewel, master
Needs Doc: No

Description

Run: http://pulpito.ceph.com/teuthology-2016-04-03_22:00:01-rados-jewel-distro-basic-smithi/
Job: 106350
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-03_22:00:01-rados-jewel-distro-basic-smithi/106350/teuthology.log

2016-04-03T23:58:37.937 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: test/librados/misc.cc:71: Failure
2016-04-03T23:58:37.937 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-04-03T23:58:37.938 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (16 ms)
2016-04-03T23:58:37.938 INFO:tasks.workunit.client.0.smithi030.stdout:                 api_misc: [----------] 1 test from LibRadosMiscConnectFailure (16 ms total)

Related issues

Related to Bug #15477: "failed (workunit test osdc/stress_objectcacher.sh)" in rados-hammer-distro-basic-vps/ (Can't reproduce, 04/12/2016)
Related to Feature #16091: Monclient: hunt for mons in parallel (Resolved, 05/31/2016)
Copied to Backport #19561: kraken: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" (In Progress)
Copied to Backport #19562: jewel: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" (Resolved)

History

#1 Updated by Samuel Just about 1 year ago

370e4f773a5347a2d0c0493ccf3dc55909b75bce

#2 Updated by Samuel Just about 1 year ago

  • Assignee set to Samuel Just

The bug seems to be that in MonClient::authenticate, it's entirely possible that the connection is established between the timeout firing and the waiting thread being rescheduled. There are two options in that case:
1) disconnect and return the error
2) return success

Option 2 seems cleaner, but we'll have to modify the test a bit to use ms_inject_delay* to ensure that the authentication happens more slowly than the timeout. Annoyingly, async messenger does not yet support that. I think I'll implement option 2 and force that test to use simple messenger.
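
To make the two options concrete, here is a minimal sketch of the decision taken once the timed wait ends. It is not the MonClient code: `TimeoutPolicy`, `WaitOutcome` and `resolve_wait` are hypothetical names invented for illustration, and the use of `-ETIMEDOUT` as the error code is an assumption.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical model of the choice described above; not Ceph's MonClient.
enum class TimeoutPolicy { DisconnectAndFail, ReturnSuccess };

struct WaitOutcome {
  int rc;             // 0 on success, negative errno on failure (assumed convention)
  bool session_open;  // whether the connection is left established
};

// timed_out:     the timed wait expired.
// authenticated: state observed when the waiting thread actually runs again.
WaitOutcome resolve_wait(bool timed_out, bool authenticated, TimeoutPolicy policy) {
  // The wait ends either by timeout or by completion; a wake-up with
  // neither is not modelled in this sketch.
  assert(timed_out || authenticated);

  if (!authenticated) {
    return {-ETIMEDOUT, false};  // plain timeout: nothing was established
  }
  if (!timed_out) {
    return {0, true};            // plain success: completed before the timeout
  }
  // The race: the timeout expired, but authentication completed before
  // the thread was rescheduled.
  if (policy == TimeoutPolicy::ReturnSuccess) {
    return {0, true};            // option 2
  }
  return {-ETIMEDOUT, false};    // option 1: disconnect and report the error
}
```

With `ReturnSuccess`, a connect call can return 0 even though the timeout expired, which is why a test expecting a non-zero result would need authentication to be reliably slower than the timeout.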

#3 Updated by Samuel Just about 1 year ago

Hmm, that doesn't really help, we still need a way to cancel the in progress authentication...

#4 Updated by Samuel Just about 1 year ago

Neither init() nor shutdown() seems to clear the authentication state.
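
One possible way to cancel an in-progress authentication is to tag each attempt with a generation number and drop replies that belong to an older attempt. The sketch below only illustrates that idea; `SketchClient` and its members are hypothetical and do not correspond to anything in MonClient.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical client model, for illustration only.
class SketchClient {
 public:
  // Start a new authentication attempt and return a token identifying it.
  std::uint64_t begin_auth() {
    authenticated_ = false;
    return ++generation_;
  }

  // Cancel the attempt in progress: clear the state and bump the
  // generation so that a late reply for the old attempt is ignored.
  void cancel_auth() {
    authenticated_ = false;
    ++generation_;
  }

  // Called when the reply for attempt `token` arrives, possibly late.
  void on_auth_reply(std::uint64_t token) {
    assert(token <= generation_);  // tokens are only ever handed out by begin_auth()
    if (token == generation_) {
      authenticated_ = true;       // reply belongs to the current attempt
    }
    // Otherwise the reply is stale and is dropped.
  }

  bool authenticated() const { return authenticated_; }

 private:
  std::uint64_t generation_ = 0;
  bool authenticated_ = false;
};
```

In this model a timed-out caller would call `cancel_auth()` before returning the error, so a reply that arrives afterwards cannot flip the client into an authenticated state.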

#6 Updated by David Zafman about 1 year ago

  • Related to Bug #15477: "failed (workunit test osdc/stress_objectcacher.sh)" in rados-hammer-distro-basic-vps/ added

#7 Updated by Samuel Just 12 months ago

  • Related to Feature #16091: Monclient: hunt for mons in parallel added

#8 Updated by Yuri Weinstein 11 months ago

  • Release master added

In http://qa-proxy.ceph.com/teuthology/yuriw-2016-06-25_17:00:35-rados-wip-yuri-testing2_2016_6_23-distro-basic-smithi/276984/teuthology.log

2016-06-26T10:26:55.932 INFO:tasks.workunit.client.0.smithi002.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-06-26T10:26:55.932 INFO:tasks.workunit.client.0.smithi002.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (14 ms)
2016-06-26T10:26:55.932 INFO:tasks.workunit.client.0.smithi002.stdout:                 api_misc: 2016-06-26 17:26:54.089527 7f3edeffd700  1 -- 172.21.15.2:0/1573833271 >> 172.21.15.2:6790/0 conn(0x7f3f02ac3a00 sd=17 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=22 cs=1 l=1). == rx == mon.2 seq 6 0x7f3ed8000b90 osd_map(15..15 src has 1..15) v3

#9 Updated by Samuel Just 10 months ago

  • Assignee deleted (Samuel Just)

#10 Updated by Samuel Just 9 months ago

  • Priority changed from Urgent to Normal

#11 Updated by Yuri Weinstein 9 months ago

http://qa-proxy.ceph.com/teuthology/teuthology-2016-09-01_22:00:02-rados-jewel-distro-basic-smithi/395640/teuthology.log

2016-09-01T22:33:33.448 INFO:tasks.workunit.client.0.smithi026.stdout:                 api_misc: test/librados/misc.cc:70: Failure
2016-09-01T22:33:33.448 INFO:tasks.workunit.client.0.smithi026.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-09-01T22:33:33.448 INFO:tasks.workunit.client.0.smithi026.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (28 ms)

#12 Updated by Yuri Weinstein 7 months ago

  • Needs Doc set to No

On the master branch:
Run: http://pulpito.ceph.com/teuthology-2016-10-30_04:20:07-upgrade:jewel-x-master-distro-basic-vps/
Job: 502767
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-10-30_04:20:07-upgrade:jewel-x-master-distro-basic-vps/502767/teuthology.log

2016-10-30T05:26:17.244 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: 2016-10-30 05:26:17.085673 7fac68ba12c0 10 monclient: _send_mon_message to mon.c at 172.21.2.73:6790/0
2016-10-30T05:26:17.244 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: 2016-10-30 05:26:17.085677 7fac68ba12c0  1 -- 172.21.2.73:0/2458744259 --> 172.21.2.73:6790/0 -- mon_subscribe({mgrmap=0+}) v2 -- 0x7fac72034c00 con 0
2016-10-30T05:26:17.245 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: /srv/autobuild-ceph/gitbuilder.git/build/out~/ceph-11.0.2-791-g5354e7c/src/test/librados/misc.cc:70: Failure
2016-10-30T05:26:17.245 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-10-30T05:26:17.245 INFO:tasks.workunit.client.0.vpm073.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (236 ms)

#13 Updated by Sage Weil 6 months ago

  • Subject changed from "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" in rados-jewel-distro-basic-smithi to "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure"
  • Priority changed from Normal to Urgent

/a/sage-2016-11-29_20:05:25-rados:thrash-master---basic-smithi/586352

#15 Updated by Nathan Cutler 6 months ago

Sage, are you saying that http://tracker.ceph.com/issues/16091 should be backported to jewel? (The description of this bug indicates that the failure also happens in jewel rados suites)

#16 Updated by Yuri Weinstein 6 months ago

On master, in the smoke suite:

http://qa-proxy.ceph.com/teuthology/teuthology-2016-12-13_05:00:02-smoke-master-testing-basic-vps/628883/teuthology.log

2016-12-13T06:21:28.184 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: 2016-12-13 06:21:23.174238 7f169effd700  1 -- 172.21.2.139:0/3078865336 <== mon.1 172.21.2.171:6789/0 4 ==== osd_map(38..38 src has 1..38) v3 ==== 2532+0+0 (2506938689 0 0) 0x7f168c0027b0 con 0x55d0509451f0
2016-12-13T06:21:28.185 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: /build/ceph-11.0.2-2422-ga3bf341/src/test/librados/misc.cc:70: Failure
2016-12-13T06:21:28.185 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-12-13T06:21:28.185 INFO:tasks.workunit.client.0.vpm139.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure 

#17 Updated by Yuri Weinstein 6 months ago

  • ceph-qa-suite smoke added

#18 Updated by Sage Weil about 2 months ago

  • Status changed from New to Need Review

#19 Updated by Sage Weil about 2 months ago

  • Status changed from Need Review to Pending Backport
  • Backport set to kraken,jewel

#20 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #19561: kraken: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" added

#21 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #19562: jewel: "api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure" added
