Project

General

Profile

Actions

Bug #20798

closed

LibRadosLockECPP.LockExclusiveDurPP gets EEXIST

Added by Sage Weil over 6 years ago. Updated over 4 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: [ RUN      ] LibRadosLockECPP.LockExclusiveDurPP
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: /build/ceph-12.1.1-715-gc75dd25/src/test/librados/lock.cc:233: Failure
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock:       Expected: 0
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: To be equal to: ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", __null, 0)
2017-07-27T16:02:26.096 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock:       Which is: -17
2017-07-27T16:02:26.097 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: [  FAILED  ] LibRadosLockECPP.LockExclusiveDurPP (1009 ms)

/a/sage-2017-07-27_13:38:00-rados-wip-sage-testing-distro-basic-smithi/1453171

note this is short_pg_log..

rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/few.yaml msgr/random.yaml objectstore/bluestore.yaml rados.yaml rocksdb.yaml thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}

Actions #1

Updated by Sage Weil over 6 years ago

  • Priority changed from High to Urgent

/a/sage-2017-07-28_04:13:20-rados-wip-sage-testing-distro-basic-smithi/1455364

2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: [ RUN      ] LibRadosLockEC.LockExclusiveDur
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.1-795-g788745f/rpm/el7/BUILD/ceph-12.1.1-795-g788745f/src/test/
librados/lock.cc:224: Failure
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock:       Expected: 0
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: To be equal to: rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", __null, 0)
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock:       Which is: -17
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: [  FAILED  ] LibRadosLockEC.LockExclusiveDur (976 ms)

Actions #2

Updated by Sage Weil over 6 years ago

2017-08-03T19:04:32.120 INFO:tasks.workunit.client.0.smithi097.stdout:[ RUN      ] ClsLock.TestLockDuration
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:/build/ceph-12.1.2-97-g153d18c/src/test/cls_lock/test_cls_lock.cc:299: Failure
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:      Expected: 0
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:      Which is: -17
2017-08-03T19:04:39.840 INFO:tasks.workunit.client.0.smithi097.stdout:[  FAILED  ] ClsLock.TestLockDuration (7720 ms)

/a/sage-2017-08-03_18:16:40-rados-wip-sage-testing2-20170803-distro-basic-smithi/1479540
Actions #3

Updated by Neha Ojha over 6 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Neha Ojha
Actions #4

Updated by Neha Ojha over 6 years ago

  • Status changed from In Progress to Need More Info

Logs from the ClsLock unittest clearly show that there is a race in the test and it tries to take the lock again before the duration of the previous lock is over.

The first two set of logs from LibRadosLock also seem to indicate the same, but are hard to interpret since the lock names are same for all the tests. https://github.com/ceph/ceph/pull/16953 has been created to solve this problem.

Actions #5

Updated by Sage Weil over 6 years ago

  • Priority changed from Urgent to High
Actions #6

Updated by Sage Weil over 6 years ago

  • Status changed from Need More Info to Can't reproduce
Actions #7

Updated by Kefu Chai about 6 years ago

  • Status changed from Can't reproduce to 12
2018-03-01T04:29:22.584 INFO:tasks.workunit.client.0.smithi109.stdout:/build/ceph-13.0.1-2367-gee670b8/src/test/cls_lock/test_cls_lock.cc:299: Failure
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:      Expected: 0
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:      Which is: -17
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:[  FAILED  ] ClsLock.TestLockDuration (7033 ms)

/a/kchai-2018-03-01_03:47:54-rados-wip-kefu-testing-2018-03-01-1006-distro-basic-smithi/2237462
Actions #8

Updated by Kefu Chai almost 6 years ago

/kchai-2018-04-26_00:52:32-rados-wip-kefu-testing-2018-04-25-2253-distro-basic-smithi/2439501/

Actions #9

Updated by Sage Weil over 5 years ago

/a/sage-2018-07-27_22:50:28-rados-wip-sage-testing-2018-07-27-0744-distro-basic-smithi/2826326

Actions #10

Updated by Sage Weil over 5 years ago

/a/sage-2018-10-13_00:36:33-rados-wip-sage-testing-2018-10-12-1741-distro-basic-smithi/3133276

Actions #11

Updated by Sage Weil over 5 years ago

/a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984

Actions #12

Updated by Sage Weil over 5 years ago

/a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352

Actions #13

Updated by Sage Weil over 5 years ago

/a/sage-2018-12-12_23:36:13-rados-wip-sage2-testing-2018-12-12-1435-distro-basic-smithi/3335654

Actions #14

Updated by Sage Weil over 5 years ago

I'm guessing this is teh same...

2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: [ RUN      ] LibRadosLock.LockSharedDur
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: /build/ceph-14.0.1-2130-gaed40e3/src/test/librados/lock.cc:50: Failure
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Expected: expected
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Which is: 0
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: To be equal to: wait_until(1.0s, 0.1s, expected, lock_shared, nullptr)
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Which is: -17
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: [  FAILED  ] LibRadosLock.LockSharedDur (827 ms)

/a/sage-2019-01-01_04:27:00-rados-wip-sage-testing-2018-12-31-1546-distro-basic-smithi/3410786

Actions #15

Updated by Kefu Chai about 5 years ago

/a/kchai-2019-02-03_02:07:02-rados-wip-kefu2-testing-2019-02-03-0001-distro-basic-smithi/3543664/

Actions #16

Updated by Sage Weil about 5 years ago

/a/sage-2019-02-19_23:03:51-rados-wip-sage3-testing-2019-02-19-1008-distro-basic-smithi/3614261

Actions #17

Updated by Josh Durgin over 4 years ago

  • Status changed from 12 to Can't reproduce
Actions

Also available in: Atom PDF