Bug #20798
closed
LibRadosLockECPP.LockExclusiveDurPP gets EEXIST
Added by Sage Weil almost 7 years ago.
Updated over 4 years ago.
Description
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout: api_lock: [ RUN ] LibRadosLockECPP.LockExclusiveDurPP
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout: api_lock: /build/ceph-12.1.1-715-gc75dd25/src/test/librados/lock.cc:233: Failure
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout: api_lock: Expected: 0
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout: api_lock: To be equal to: ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", __null, 0)
2017-07-27T16:02:26.096 INFO:tasks.workunit.client.0.smithi179.stdout: api_lock: Which is: -17
2017-07-27T16:02:26.097 INFO:tasks.workunit.client.0.smithi179.stdout: api_lock: [ FAILED ] LibRadosLockECPP.LockExclusiveDurPP (1009 ms)
/a/sage-2017-07-27_13:38:00-rados-wip-sage-testing-distro-basic-smithi/1453171
note this is short_pg_log..
rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/few.yaml msgr/random.yaml objectstore/bluestore.yaml rados.yaml rocksdb.yaml thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}
- Priority changed from High to Urgent
/a/sage-2017-07-28_04:13:20-rados-wip-sage-testing-distro-basic-smithi/1455364
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout: api_lock: [ RUN ] LibRadosLockEC.LockExclusiveDur
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout: api_lock: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.1-795-g788745f/rpm/el7/BUILD/ceph-12.1.1-795-g788745f/src/test/
librados/lock.cc:224: Failure
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout: api_lock: Expected: 0
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout: api_lock: To be equal to: rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", __null, 0)
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout: api_lock: Which is: -17
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout: api_lock: [ FAILED ] LibRadosLockEC.LockExclusiveDur (976 ms)
2017-08-03T19:04:32.120 INFO:tasks.workunit.client.0.smithi097.stdout:[ RUN ] ClsLock.TestLockDuration
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:/build/ceph-12.1.2-97-g153d18c/src/test/cls_lock/test_cls_lock.cc:299: Failure
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout: Expected: 0
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout: Which is: -17
2017-08-03T19:04:39.840 INFO:tasks.workunit.client.0.smithi097.stdout:[ FAILED ] ClsLock.TestLockDuration (7720 ms)
/a/sage-2017-08-03_18:16:40-rados-wip-sage-testing2-20170803-distro-basic-smithi/1479540
- Status changed from 12 to In Progress
- Assignee set to Neha Ojha
- Status changed from In Progress to Need More Info
Logs from the ClsLock unittest clearly show that there is a race in the test and it tries to take the lock again before the duration of the previous lock is over.
The first two set of logs from LibRadosLock also seem to indicate the same, but are hard to interpret since the lock names are same for all the tests. https://github.com/ceph/ceph/pull/16953 has been created to solve this problem.
- Priority changed from Urgent to High
- Status changed from Need More Info to Can't reproduce
- Status changed from Can't reproduce to 12
2018-03-01T04:29:22.584 INFO:tasks.workunit.client.0.smithi109.stdout:/build/ceph-13.0.1-2367-gee670b8/src/test/cls_lock/test_cls_lock.cc:299: Failure
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout: Expected: 0
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout: Which is: -17
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:[ FAILED ] ClsLock.TestLockDuration (7033 ms)
/a/kchai-2018-03-01_03:47:54-rados-wip-kefu-testing-2018-03-01-1006-distro-basic-smithi/2237462
/kchai-2018-04-26_00:52:32-rados-wip-kefu-testing-2018-04-25-2253-distro-basic-smithi/2439501/
/a/sage-2018-07-27_22:50:28-rados-wip-sage-testing-2018-07-27-0744-distro-basic-smithi/2826326
/a/sage-2018-10-13_00:36:33-rados-wip-sage-testing-2018-10-12-1741-distro-basic-smithi/3133276
/a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984
/a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352
/a/sage-2018-12-12_23:36:13-rados-wip-sage2-testing-2018-12-12-1435-distro-basic-smithi/3335654
I'm guessing this is teh same...
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout: api_lock: [ RUN ] LibRadosLock.LockSharedDur
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout: api_lock: /build/ceph-14.0.1-2130-gaed40e3/src/test/librados/lock.cc:50: Failure
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout: api_lock: Expected: expected
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout: api_lock: Which is: 0
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout: api_lock: To be equal to: wait_until(1.0s, 0.1s, expected, lock_shared, nullptr)
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout: api_lock: Which is: -17
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout: api_lock: [ FAILED ] LibRadosLock.LockSharedDur (827 ms)
/a/sage-2019-01-01_04:27:00-rados-wip-sage-testing-2018-12-31-1546-distro-basic-smithi/3410786
/a/kchai-2019-02-03_02:07:02-rados-wip-kefu2-testing-2019-02-03-0001-distro-basic-smithi/3543664/
/a/sage-2019-02-19_23:03:51-rados-wip-sage3-testing-2019-02-19-1008-distro-basic-smithi/3614261
- Status changed from 12 to Can't reproduce
Also available in: Atom
PDF