Bug #13992

closed

LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x

Added by Sage Weil over 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados, upgrade/hammer-x, upgrade/infernalis
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-12-05T17:10:59.539 INFO:tasks.workunit.client.0.vpm021.stdout:[----------] 1 test from LibRadosMiscConnectFailure
2015-12-05T17:10:59.539 INFO:tasks.workunit.client.0.vpm021.stdout:[ RUN      ] LibRadosMiscConnectFailure.ConnectFailure
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:test/librados/misc.cc:56: Failure
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (11 ms)
2015-12-05T17:10:59.547 INFO:tasks.workunit.client.0.vpm021.stdout:[----------] 1 test from LibRadosMiscConnectFailure (11 ms total)

/a/sage-2015-12-05_14:56:25-upgrade:hammer-x-jewel---basic-vps/1167901
/a/sage-2015-12-05_14:55:59-upgrade:hammer-x-jewel---basic-vps/1167867

Related issues 2 (0 open, 2 closed)

Has duplicate Ceph - Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms), Duplicate, Samuel Just, 03/17/2016
Copied to Ceph - Backport #15320: hammer: LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x, Resolved, Nathan Cutler
Actions #1

Updated by Yuri Weinstein over 8 years ago

  • Release set to jewel
  • ceph-qa-suite upgrade/hammer-x added
Actions #2

Updated by Yuri Weinstein over 8 years ago

  • Release set to infernalis

Same in infernalis
Run: http://pulpito.ceph.com/teuthology-2016-01-12_17:10:05-upgrade:hammer-x-infernalis-distro-basic-vps/
Job: 26151
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-01-12_17:10:05-upgrade:hammer-x-infernalis-distro-basic-vps/26151/teuthology.log

2016-01-12T19:16:23.810 INFO:tasks.workunit.client.0.vpm159.stdout:test/librados/misc.cc:56: Failure
2016-01-12T19:16:23.810 INFO:tasks.workunit.client.0.vpm159.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-01-12T19:16:23.811 INFO:tasks.workunit.client.0.vpm159.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (11 ms)
Actions #3

Updated by Yuri Weinstein over 8 years ago

  • Subject changed from LibRadosMiscConnectFailure.ConnectFailure intermittent failure in upgrade/hammer-x to LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x
Actions #7

Updated by Yuri Weinstein about 8 years ago

  • ceph-qa-suite rados added

Also in rados suite
Run: http://pulpito.ovh.sepia.ceph.com:8081/teuthology-2016-01-24_02:00:01-rados-infernalis-distro-basic-openstack/
Job: 8903
Logs: http://teuthology.ovh.sepia.ceph.com/teuthology/teuthology-2016-01-24_02:00:01-rados-infernalis-distro-basic-openstack/8903/teuthology.log

2016-01-24T04:02:12.448 INFO:tasks.workunit.client.0.target069076.stdout:test/librados/misc.cc:56: Failure
2016-01-24T04:02:12.448 INFO:tasks.workunit.client.0.target069076.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-01-24T04:02:12.448 INFO:tasks.workunit.client.0.target069076.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (11 ms)
Actions #8

Updated by Yuri Weinstein about 8 years ago

Actions #11

Updated by Sage Weil about 8 years ago

/a/sage-2016-03-08_12:22:24-rados-wip-sage-testing---basic-smithi/47523

Actions #13

Updated by Yuri Weinstein about 8 years ago

  • ceph-qa-suite upgrade/infernalis added

Also in
http://pulpito.ceph.com/teuthology-2016-03-13_17:10:11-upgrade:infernalis-infernalis-distro-basic-vps/
Job: 57839
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-03-13_17:10:11-upgrade:infernalis-infernalis-distro-basic-vps/57839/teuthology.log

2016-03-13T18:14:49.236 INFO:tasks.workunit.client.0.vpm136.stdout:test/librados/misc.cc:56: Failure
2016-03-13T18:14:49.236 INFO:tasks.workunit.client.0.vpm136.stdout:Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-03-13T18:14:49.236 INFO:tasks.workunit.client.0.vpm136.stdout:[  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (12 ms)
Actions #14

Updated by Sage Weil about 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
Actions #15

Updated by Sage Weil about 8 years ago

  • Status changed from In Progress to Fix Under Review
Actions #16

Updated by Samuel Just about 8 years ago

  • Related to Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms) added
Actions #17

Updated by Samuel Just about 8 years ago

  • Related to deleted (Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms))
Actions #18

Updated by Samuel Just about 8 years ago

  • Has duplicate Bug #15178: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (9 ms) added
Actions #20

Updated by Sage Weil about 8 years ago

  • Status changed from Fix Under Review to Resolved

Hopefully fixed. Reopen if not!

Actions #21

Updated by Sage Weil about 8 years ago

  • Status changed from Resolved to 12

Nope, hit it again:

2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: [ RUN      ] LibRadosMiscConnectFailure.ConnectFailure
2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: 2016-03-25 15:01:23.093232 7f63aa1608c0 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: test/librados/misc.cc:68: Failure
2016-03-25T08:01:31.235 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: Expected: (0) != (rados_connect(cluster)), actual: 0 vs 0
2016-03-25T08:01:31.236 INFO:tasks.workunit.client.0.smithi039.stdout:                 api_misc: [  FAILED  ] LibRadosMiscConnectFailure.ConnectFailure (18 ms)

on /a/sage-2016-03-25_06:54:48-rados-wip-sage-testing2-distro-basic-smithi/85989 ... which included the attempted fix.
Actions #22

Updated by Sage Weil about 8 years ago

/a/sage-2016-03-25_06:54:48-rados-wip-sage-testing2-distro-basic-smithi/86079

Actions #23

Updated by Sage Weil about 8 years ago

Hitting this a lot. Added 1cbe2bd9d417656a9a6e1ddf0438abe2a98f8116 to get monc debug logs.

Actions #24

Updated by Sage Weil about 8 years ago

  • Status changed from 12 to Fix Under Review

https://github.com/ceph/ceph/pull/8335

Finally found it. Easily reproduced by adding a sleep(1) in the interval where we drop the lock.
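
The comment above says the bug reproduces once a sleep(1) is placed in the interval where the lock is dropped. The thread does not show the code touched by the pull request, so what follows is only a minimal sketch of that general reproduction technique, not Ceph's implementation. Every name in it (ConnectState, late_reply, connect_racy, connect_fixed, run) is hypothetical. To keep the sketch deterministic, the pause is a hook that waits for the competing thread to finish, which has the same effect a long enough sleep would have.

```python
import threading

ETIMEDOUT = 110


class ConnectState:
    """Hypothetical shared state; NOT Ceph's actual client code."""

    def __init__(self):
        self.lock = threading.Lock()
        self.result = -ETIMEDOUT  # the connect attempt has already timed out


def late_reply(state):
    """A reply that arrives after the timeout and flips the result."""
    with state.lock:
        state.result = 0


def connect_racy(state, unlocked_hook):
    with state.lock:
        if state.result == 0:
            return 0
    # The lock is dropped here. unlocked_hook stands in for the sleep(1)
    # that widens this window so the other thread always gets in.
    unlocked_hook()
    with state.lock:
        # Bug: re-reads shared state that may have changed in the window.
        return state.result


def connect_fixed(state, unlocked_hook):
    with state.lock:
        result = state.result  # snapshot taken while the lock is held
    unlocked_hook()
    return result


def run(connect):
    """Run one connect attempt with a competing late reply."""
    state = ConnectState()
    window_open = threading.Event()

    def replier():
        window_open.wait()  # fire only once the lock has been dropped
        late_reply(state)

    t = threading.Thread(target=replier)
    t.start()

    def hook():
        window_open.set()
        t.join()  # deterministic stand-in for sleep(1)

    return connect(state, hook)


if __name__ == "__main__":
    print("racy: ", run(connect_racy))   # 0: reports success after a timeout
    print("fixed:", run(connect_fixed))  # -110: timeout is preserved
```

In this sketch the racy variant returns 0 even though the attempt timed out, which has the same shape as the "actual: 0 vs 0" failure in the logs. Whether the real bug followed this exact pattern is an assumption.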

Actions #25

Updated by Sage Weil about 8 years ago

  • Priority changed from Urgent to Immediate
Actions #26

Updated by Sage Weil about 8 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to hammer
Actions #27

Updated by Loïc Dachary about 8 years ago

  • Copied to Backport #15320: hammer: LibRadosMiscConnectFailure.ConnectFailure (not so intermittent) failure in upgrade/hammer-x added
Actions #28

Updated by Sage Weil about 8 years ago

  • Priority changed from Immediate to Urgent
Actions #29

Updated by Nathan Cutler over 7 years ago

  • Status changed from Pending Backport to Resolved