Project

General

Profile

Bug #20798

LibRadosLockECPP.LockExclusiveDurPP gets EEXIST

Added by Sage Weil over 5 years ago. Updated over 3 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: [ RUN      ] LibRadosLockECPP.LockExclusiveDurPP
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: /build/ceph-12.1.1-715-gc75dd25/src/test/librados/lock.cc:233: Failure
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock:       Expected: 0
2017-07-27T16:02:26.095 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: To be equal to: ioctx.lock_exclusive("foo", "TestLock", "Cookie", "", __null, 0)
2017-07-27T16:02:26.096 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock:       Which is: -17
2017-07-27T16:02:26.097 INFO:tasks.workunit.client.0.smithi179.stdout:                 api_lock: [  FAILED  ] LibRadosLockECPP.LockExclusiveDurPP (1009 ms)

/a/sage-2017-07-27_13:38:00-rados-wip-sage-testing-distro-basic-smithi/1453171

note this is short_pg_log..

rados/thrash/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml backoff/peering.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml msgr-failures/few.yaml msgr/random.yaml objectstore/bluestore.yaml rados.yaml rocksdb.yaml thrashers/default.yaml thrashosds-health.yaml workloads/rados_api_tests.yaml}

History

#1 Updated by Sage Weil over 5 years ago

  • Priority changed from High to Urgent

/a/sage-2017-07-28_04:13:20-rados-wip-sage-testing-distro-basic-smithi/1455364

2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: [ RUN      ] LibRadosLockEC.LockExclusiveDur
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.1-795-g788745f/rpm/el7/BUILD/ceph-12.1.1-795-g788745f/src/test/
librados/lock.cc:224: Failure
2017-07-28T05:32:11.900 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock:       Expected: 0
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: To be equal to: rados_lock_exclusive(ioctx, "foo", "TestLock", "Cookie", "", __null, 0)
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock:       Which is: -17
2017-07-28T05:32:11.901 INFO:tasks.workunit.client.0.smithi137.stdout:                 api_lock: [  FAILED  ] LibRadosLockEC.LockExclusiveDur (976 ms)

#2 Updated by Sage Weil over 5 years ago

2017-08-03T19:04:32.120 INFO:tasks.workunit.client.0.smithi097.stdout:[ RUN      ] ClsLock.TestLockDuration
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:/build/ceph-12.1.2-97-g153d18c/src/test/cls_lock/test_cls_lock.cc:299: Failure
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:      Expected: 0
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2017-08-03T19:04:39.837 INFO:tasks.workunit.client.0.smithi097.stdout:      Which is: -17
2017-08-03T19:04:39.840 INFO:tasks.workunit.client.0.smithi097.stdout:[  FAILED  ] ClsLock.TestLockDuration (7720 ms)

/a/sage-2017-08-03_18:16:40-rados-wip-sage-testing2-20170803-distro-basic-smithi/1479540

#3 Updated by Neha Ojha over 5 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Neha Ojha

#4 Updated by Neha Ojha over 5 years ago

  • Status changed from In Progress to Need More Info

Logs from the ClsLock unittest clearly show that there is a race in the test and it tries to take the lock again before the duration of the previous lock is over.

The first two set of logs from LibRadosLock also seem to indicate the same, but are hard to interpret since the lock names are same for all the tests. https://github.com/ceph/ceph/pull/16953 has been created to solve this problem.

#5 Updated by Sage Weil over 5 years ago

  • Priority changed from Urgent to High

#6 Updated by Sage Weil about 5 years ago

  • Status changed from Need More Info to Can't reproduce

#7 Updated by Kefu Chai almost 5 years ago

  • Status changed from Can't reproduce to 12
2018-03-01T04:29:22.584 INFO:tasks.workunit.client.0.smithi109.stdout:/build/ceph-13.0.1-2367-gee670b8/src/test/cls_lock/test_cls_lock.cc:299: Failure
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:      Expected: 0
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:To be equal to: l.lock_exclusive(&ioctx, oid)
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:      Which is: -17
2018-03-01T04:29:22.585 INFO:tasks.workunit.client.0.smithi109.stdout:[  FAILED  ] ClsLock.TestLockDuration (7033 ms)

/a/kchai-2018-03-01_03:47:54-rados-wip-kefu-testing-2018-03-01-1006-distro-basic-smithi/2237462

#8 Updated by Kefu Chai over 4 years ago

/kchai-2018-04-26_00:52:32-rados-wip-kefu-testing-2018-04-25-2253-distro-basic-smithi/2439501/

#9 Updated by Sage Weil over 4 years ago

/a/sage-2018-07-27_22:50:28-rados-wip-sage-testing-2018-07-27-0744-distro-basic-smithi/2826326

#10 Updated by Sage Weil about 4 years ago

/a/sage-2018-10-13_00:36:33-rados-wip-sage-testing-2018-10-12-1741-distro-basic-smithi/3133276

#11 Updated by Sage Weil about 4 years ago

/a/sage-2018-10-29_01:11:58-rados-wip-sage-testing-2018-10-28-0943-distro-basic-smithi/3197984

#12 Updated by Sage Weil about 4 years ago

/a/sage-2018-11-05_22:04:25-rados-wip-sage3-testing-2018-11-05-1406-distro-basic-smithi/3227352

#13 Updated by Sage Weil almost 4 years ago

/a/sage-2018-12-12_23:36:13-rados-wip-sage2-testing-2018-12-12-1435-distro-basic-smithi/3335654

#14 Updated by Sage Weil almost 4 years ago

I'm guessing this is teh same...

2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: [ RUN      ] LibRadosLock.LockSharedDur
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: /build/ceph-14.0.1-2130-gaed40e3/src/test/librados/lock.cc:50: Failure
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Expected: expected
2019-01-01T05:42:26.629 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Which is: 0
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: To be equal to: wait_until(1.0s, 0.1s, expected, lock_shared, nullptr)
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock:       Which is: -17
2019-01-01T05:42:26.630 INFO:tasks.workunit.client.0.smithi110.stdout:                 api_lock: [  FAILED  ] LibRadosLock.LockSharedDur (827 ms)

/a/sage-2019-01-01_04:27:00-rados-wip-sage-testing-2018-12-31-1546-distro-basic-smithi/3410786

#15 Updated by Kefu Chai almost 4 years ago

/a/kchai-2019-02-03_02:07:02-rados-wip-kefu2-testing-2019-02-03-0001-distro-basic-smithi/3543664/

#16 Updated by Sage Weil almost 4 years ago

/a/sage-2019-02-19_23:03:51-rados-wip-sage3-testing-2019-02-19-1008-distro-basic-smithi/3614261

#17 Updated by Josh Durgin over 3 years ago

  • Status changed from 12 to Can't reproduce

Also available in: Atom PDF