Bug #13992

LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x

Added by Sage Weil over 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 12/06/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados, upgrade/hammer-x, upgrade/infernalis
Pull request ID:

Description

2015-12-05T17:10:59.539 INFO:tasks.workunit.client.0.vpm021.stdout:[----------] 1 test from LibRadosMiscConnectFailure
2015-12-05T17:10:59.539 INFO:tasks.workunit.client.0.vpm021.stdout:[ RUN      ] LibRadosMiscConnectFailure.ConnectFailure
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:test/librados/misc.cc:56: Failure
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (11 ms)
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:[----------] 1 test from LibRadosMiscConnectFailure (11 ms total)

/a/sage-2015-12-05_14:56:25-upgrade:hammer-x-jewel---basic-vps/1167901
/a/sage-2015-12-05_14:55:59-upgrade:hammer-x-jewel---basic-vps/1167867

Related issues

Duplicated by Ceph - Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms) Duplicate 03/17/2016
Copied to Ceph - Backport #15320: hammer: LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x Resolved

Associated revisions

Revision 959ae39e (diff)
Added by Sage Weil about 3 years ago

ceph_test_rados_misc: shorten mount timeout

This might fix #13992.

Signed-off-by: Sage Weil <>

History

#1 Updated by Yuri Weinstein over 3 years ago

  • Release set to jewel
  • ceph-qa-suite upgrade/hammer-x added

#2 Updated by Yuri Weinstein about 3 years ago

  • Release set to infernalis

Same in infernalis
Run: http://pulpito.ceph.com/teuthology-2016-01-12_17:10:05-upgrade:hammer-x-infernalis-distro-basic-vps/
Job: 26151
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-01-12_17:10:05-upgrade:hammer-x-infernalis-distro-basic-vps/26151/teuthology.log

2016-01-12T19:16:23.810 INFO:tasks.workunit.client.0.vpm159.stdout:test/librados/misc.cc:56: Failure
2016-01-12T19:16:23.810 INFO:tasks.workunit.client.0.vpm159.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-01-12T19:16:23.811 INFO:tasks.workunit.client.0.vpm159.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (11 ms)

#3 Updated by Yuri Weinstein about 3 years ago

  • Subject changed from LibRadosMiscConnectFailure.ConnectFailure intermittent failure in upgrade/hammer-x to LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x

#7 Updated by Yuri Weinstein about 3 years ago

  • ceph-qa-suite rados added

Also in rados suite
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-01-24_02:00:01-rados-infernalis-distro-basic-openstack/
Job: 8903
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-01-24_02:00:01-rados-infernalis-distro-basic-openstack/8903/teuthology.log

2016-01-24T04:02:12.448 INFO:tasks.workunit.client.0.target069076.stdout:test/librados/misc.cc:56: Failure
2016-01-24T04:02:12.448 INFO:tasks.workunit.client.0.target069076.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-01-24T04:02:12.448 INFO:tasks.workunit.client.0.target069076.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (11 ms)

#8 Updated by Yuri Weinstein about 3 years ago

#11 Updated by Sage Weil about 3 years ago

/a/sage-2016-03-08_12:22:24-rados-wip-sage-testing---basic-smithi/47523

#13 Updated by Yuri Weinstein about 3 years ago

  • ceph-qa-suite upgrade/infernalis added

Also in
http://pulpito.ceph.com/teuthology-2016-03-13_17:10:11-upgrade:infernalis-infernalis-distro-basic-vps/
Job: 57839
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-03-13_17:10:11-upgrade:infernalis-infernalis-distro-basic-vps/57839/teuthology.log

2016-03-13T18:14:49.236 INFO:tasks.workunit.client.0.vpm136.stdout:test/librados/misc.cc:56: Failure
2016-03-13T18:14:49.236 INFO:tasks.workunit.client.0.vpm136.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-03-13T18:14:49.236 INFO:tasks.workunit.client.0.vpm136.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (12 ms)

#14 Updated by Sage Weil about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil

#15 Updated by Sage Weil about 3 years ago

  • Status changed from In Progress to Need Review

#16 Updated by Samuel Just about 3 years ago

  • Related to Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms) added

#17 Updated by Samuel Just about 3 years ago

  • Related to deleted (Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms))

#18 Updated by Samuel Just about 3 years ago

  • Duplicated by Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms) added

#20 Updated by Sage Weil about 3 years ago

  • Status changed from Need Review to Resolved

Hopefully fixed. Reopen if not!

#21 Updated by Sage Weil almost 3 years ago

  • Status changed from Resolved to Verified

Nope, hit it again:

2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: [ RUN      ] LibRadosMiscConnectFailure.ConnectFailure
2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: 2016-03-25 15:01:23.093232 7f63aa1608c0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: test/librados/misc.cc:68: Failure
2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-03-25T08:01:31.236 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (18 ms)

On /a/sage-2016-03-25_06:54:48-rados-wip-sage-testing2-distro-basic-smithi/85989, which included the attempted fix.

#22 Updated by Sage Weil almost 3 years ago

/a/sage-2016-03-25_06:54:48-rados-wip-sage-testing2-distro-basic-smithi/86079

#23 Updated by Sage Weil almost 3 years ago

Hitting this a lot. Added 1cbe2bd9d417656a9a6e1ddf0438abe2a98f8116 to get monc debug logs.

#24 Updated by Sage Weil almost 3 years ago

  • Status changed from Verified to Need Review

https://github.com/ceph/ceph/pull/8335

Finally found it. It is easily reproduced by adding a sleep(1) in the interval where we drop the lock.
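
For readers unfamiliar with this class of bug, here is a minimal sketch of how a dropped-lock window can turn an expected connect failure into a success. It is a generic illustration only, not the code changed in the pull request: ToyClient, on_connected, connect_racy, connect_latched and the in_window callback are all hypothetical names, and in_window simply stands in for whatever happens (for example the sleep(1)) while the lock is released.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Hypothetical toy client; none of these names come from the Ceph source.
class ToyClient {
public:
  // Simulates the peer answering: marks the client connected, wakes waiters.
  void on_connected() {
    std::lock_guard<std::mutex> g(lock_);
    connected_ = true;
    cond_.notify_all();
  }

  // Racy variant: after the wait times out, the lock is dropped (in_window
  // runs during that interval) and the return value is computed from state
  // that may have changed in the meantime.
  int connect_racy(std::chrono::milliseconds timeout,
                   const std::function<void()>& in_window) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> l(lock_);
    const bool ok = cond_.wait_for(l, timeout, [this] { return connected_; });
    if (!ok) {
      l.unlock();
      in_window();  // state can flip here, while the lock is not held
      l.lock();
    }
    return connected_ ? 0 : -ETIMEDOUT;  // re-reads state after the window
  }

  // Latched variant: the timeout decision taken under the lock is kept,
  // regardless of what happens while the lock is dropped.
  int connect_latched(std::chrono::milliseconds timeout,
                      const std::function<void()>& in_window) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> l(lock_);
    const bool ok = cond_.wait_for(l, timeout, [this] { return connected_; });
    if (ok) {
      return 0;
    }
    l.unlock();
    in_window();
    l.lock();
    return -ETIMEDOUT;  // decision was made before the window opened
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  bool connected_ = false;
};
```

Calling connect_racy with a 1 ms timeout and an in_window callback that invokes on_connected() returns 0 even though the wait timed out, which mirrors the "actual: 0 vs 0" symptom in the logs. connect_latched with the same arguments returns -ETIMEDOUT.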

#25 Updated by Sage Weil almost 3 years ago

  • Priority changed from Urgent to Immediate

#26 Updated by Sage Weil almost 3 years ago

  • Status changed from Need Review to Pending Backport
  • Backport set to hammer

#27 Updated by Loic Dachary almost 3 years ago

  • Copied to Backport #15320: hammer: LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x added

#28 Updated by Sage Weil almost 3 years ago

  • Priority changed from Immediate to Urgent

#29 Updated by Nathan Cutler over 2 years ago

  • Status changed from Pending Backport to Resolved
