Project

General

Profile

Bug #20798

LibRadosLockECPP.LockExclusiveDurPP gets EEXIST

Added by Sage Weil almost 2 years ago. Updated 5 months ago.

Status:
Verified
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
07/27/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: [ RUN      ] LibRadosLockECPP.LockExclusiveDurPP
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: /build/ceph-12.1.1-715-gc75dd25/src/test/librados/lock.cc:233: Failure
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock:       Expected: 0
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: To be equal to: ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", __null, 0)
2017-07-27T16:02:26.096 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock:       Which is: -17
2017-07-27T16:02:26.097 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: [  FAILED  ] LibRadosLockECPP.LockExclusiveDurPP (1009 ms)

/a/sage-2017-07-27_13:38:00-rados-wip-sage-testing-distro-basic-smithi/1453171

note this is short_pg_log..

rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/few.yaml msgr/random.yaml objectstore/bluestore.yaml rados.yaml rocksdb.yaml thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}

History

#1 Updated by Sage Weil almost 2 years ago

  • Priority changed from High to Urgent

/a/sage-2017-07-28_04:13:20-rados-wip-sage-testing-distro-basic-smithi/1455364

2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: [ RUN      ] LibRadosLockEC.LockExclusiveDur
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.1-795-g788745f/rpm/el7/BUILD/ceph-12.1.1-795-g788745f/src/test/
librados/lock.cc:224: Failure
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock:       Expected: 0
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: To be equal to: rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", __null, 0)
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock:       Which is: -17
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: [  FAILED  ] LibRadosLockEC.LockExclusiveDur (976 ms)

#2 Updated by Sage Weil almost 2 years ago

2017-08-03T19:04:32.120 INFO:tasks.workunit.client.0.smithi097.stdout:[ RUN      ] ClsLock.TestLockDuration
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:/build/ceph-12.1.2-97-g153d18c/src/test/cls_lock/test_cls_lock.cc:299: Failure
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:      Expected: 0
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:      Which is: -17
2017-08-03T19:04:39.840 INFO:tasks.workunit.client.0.smithi097.stdout:[  FAILED  ] ClsLock.TestLockDuration (7720 ms)

/a/sage-2017-08-03_18:16:40-rados-wip-sage-testing2-20170803-distro-basic-smithi/1479540

#3 Updated by Neha Ojha almost 2 years ago

  • Status changed from Verified to In Progress
  • Assignee set to Neha Ojha

#4 Updated by Neha Ojha almost 2 years ago

  • Status changed from In Progress to Need More Info

Logs from the ClsLock unittest clearly show that there is a race in the test and it tries to take the lock again before the duration of the previous lock is over.

The first two set of logs from LibRadosLock also seem to indicate the same, but are hard to interpret since the lock names are same for all the tests. https://github.com/ceph/ceph/pull/16953 has been created to solve this problem.

#5 Updated by Sage Weil almost 2 years ago

  • Priority changed from Urgent to High

#6 Updated by Sage Weil over 1 year ago

  • Status changed from Need More Info to Can't reproduce

#7 Updated by Kefu Chai over 1 year ago

  • Status changed from Can't reproduce to Verified
2018-03-01T04:29:22.584 INFO:tasks.workunit.client.0.smithi109.stdout:/build/ceph-13.0.1-2367-gee670b8/src/test/cls_lock/test_cls_lock.cc:299: Failure
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:      Expected: 0
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:      Which is: -17
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:[  FAILED  ] ClsLock.TestLockDuration (7033 ms)

/a/kchai-2018-03-01_03:47:54-rados-wip-kefu-testing-2018-03-01-1006-distro-basic-smithi/2237462

#8 Updated by Kefu Chai about 1 year ago

/kchai-2018-04-26_00:52:32-rados-wip-kefu-testing-2018-04-25-2253-distro-basic-smithi/2439501/

#9 Updated by Sage Weil 12 months ago

/a/sage-2018-07-27_22:50:28-rados-wip-sage-testing-2018-07-27-0744-distro-basic-smithi/2826326

#10 Updated by Sage Weil 9 months ago

/a/sage-2018-10-13_00:36:33-rados-wip-sage-testing-2018-10-12-1741-distro-basic-smithi/3133276

#11 Updated by Sage Weil 9 months ago

/a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984

#12 Updated by Sage Weil 9 months ago

/a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352

#13 Updated by Sage Weil 7 months ago

/a/sage-2018-12-12_23:36:13-rados-wip-sage2-testing-2018-12-12-1435-distro-basic-smithi/3335654

#14 Updated by Sage Weil 7 months ago

I'm guessing this is teh same...

2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: [ RUN      ] LibRadosLock.LockSharedDur
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: /build/ceph-14.0.1-2130-gaed40e3/src/test/librados/lock.cc:50: Failure
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Expected: expected
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Which is: 0
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: To be equal to: wait_until(1.0s, 0.1s, expected, lock_shared, nullptr)
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Which is: -17
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: [  FAILED  ] LibRadosLock.LockSharedDur (827 ms)

/a/sage-2019-01-01_04:27:00-rados-wip-sage-testing-2018-12-31-1546-distro-basic-smithi/3410786

#15 Updated by Kefu Chai 6 months ago

/a/kchai-2019-02-03_02:07:02-rados-wip-kefu2-testing-2019-02-03-0001-distro-basic-smithi/3543664/

#16 Updated by Sage Weil 5 months ago

/a/sage-2019-02-19_23:03:51-rados-wip-sage3-testing-2019-02-19-1008-distro-basic-smithi/3614261

Also available in: Atom PDF