Bug #13364

LibCephFS locking tests are failing and/or lockdep asserting

Added by Yuri Weinstein over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: smoke
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://qa-proxy.ceph.com/teuthology/teuthology-2015-10-04_05:00:08-smoke-master-distro-basic-multi
Job: 1088065
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-10-04_05:00:08-smoke-master-distro-basic-multi/1088065/teuthology.log

2015-10-04T06:11:27.517 INFO:tasks.workunit.client.0.plana20.stdout:Value of: status
2015-10-04T06:11:27.517 INFO:tasks.workunit.client.0.plana20.stdout:  Actual: 139
2015-10-04T06:11:27.518 INFO:tasks.workunit.client.0.plana20.stdout:Expected: 0
2015-10-04T06:11:27.518 INFO:tasks.workunit.client.0.plana20.stdout:[  FAILED  ] LibCephFS.InterProcessLocking (269 ms)
2015-10-04T06:11:27.518 INFO:tasks.workunit.client.0.plana20.stdout:[ RUN      ] LibCephFS.ThreesomeInterProcessLocking
2015-10-04T06:11:27.867 INFO:tasks.workunit.client.0.plana20.stdout:test/libcephfs/flock.cc:628: Failure
2015-10-04T06:11:27.867 INFO:tasks.workunit.client.0.plana20.stdout:Value of: status
2015-10-04T06:11:27.868 INFO:tasks.workunit.client.0.plana20.stdout:  Actual: 139
2015-10-04T06:11:27.868 INFO:tasks.workunit.client.0.plana20.stdout:Expected: 0
2015-10-04T06:11:27.868 INFO:tasks.workunit.client.0.plana20.stdout:[  FAILED  ] LibCephFS.ThreesomeInterProcessLocking (350 ms)
2015-10-04T06:11:27.869 INFO:tasks.workunit.client.0.plana20.stdout:[----------] 43 tests from LibCephFS (103928 ms total)
2015-10-04T06:11:27.869 INFO:tasks.workunit.client.0.plana20.stdout:
2015-10-04T06:11:27.869 INFO:tasks.workunit.client.0.plana20.stdout:[----------] 1 test from Caps
2015-10-04T06:11:27.870 INFO:tasks.workunit.client.0.plana20.stdout:[ RUN      ] Caps.ReadZero
2015-10-04T06:11:32.252 INFO:tasks.workunit.client.0.plana20.stdout:[       OK ] Caps.ReadZero (4385 ms)
2015-10-04T06:11:32.252 INFO:tasks.workunit.client.0.plana20.stdout:[----------] 1 test from Caps (4385 ms total)
2015-10-04T06:11:32.252 INFO:tasks.workunit.client.0.plana20.stdout:
2015-10-04T06:11:32.252 INFO:tasks.workunit.client.0.plana20.stdout:[----------] 4 tests from AccessTest
2015-10-04T06:11:32.253 INFO:tasks.workunit.client.0.plana20.stdout:[ RUN      ] AccessTest.Foo
2015-10-04T06:11:32.278 INFO:tasks.workunit.client.0.plana20.stdout:out: [{"entity":"client.libcephfs_foo_test.1909293361","key":"AQCEJRFWsL8bEBAAQ7NPaIeFgwqeeJtOPI+aIg==","caps":{"mds":"allow rw","mon":"allow *","osd":"allow rw"}}]
2015-10-04T06:11:32.278 INFO:tasks.workunit.client.0.plana20.stdout:key: AQCEJRFWsL8bEBAAQ7NPaIeFgwqeeJtOPI+aIg==
2015-10-04T06:11:37.249 INFO:tasks.workunit.client.0.plana20.stdout:[       OK ] AccessTest.Foo (4997 ms)
2015-10-04T06:11:37.250 INFO:tasks.workunit.client.0.plana20.stdout:[ RUN      ] AccessTest.Path
2015-10-04T06:11:37.288 INFO:tasks.workunit.client.0.plana20.stdout:out: [{"entity":"client.libcephfs_path_test.437206246","key":"AQCJJRFWZy3AEBAAf4BIJUCd5yBOVsfnm2OfdQ==","caps":{"mds":"allow r, allow rw path=\/ceph_test_libcephfs.1699450276","mon":"allow r","osd":"allow rwx"}}]
2015-10-04T06:11:37.288 INFO:tasks.workunit.client.0.plana20.stdout:key: AQCJJRFWZy3AEBAAf4BIJUCd5yBOVsfnm2OfdQ==
2015-10-04T06:11:37.343 INFO:tasks.workunit.client.0.plana20.stdout:test/libcephfs/access.cc:173: Failure
2015-10-04T06:11:37.343 INFO:tasks.workunit.client.0.plana20.stdout:Expected: (ceph_ftruncate(cmount, fd, 0)) >= (0), actual: -13 vs 0
2015-10-04T06:11:37.343 INFO:tasks.workunit.client.0.plana20.stdout:[  FAILED  ] AccessTest.Path (94 ms)
2015-10-04T06:11:37.343 INFO:tasks.workunit.client.0.plana20.stdout:[ RUN      ] AccessTest.ReadOnly
2015-10-04T06:11:37.373 INFO:tasks.workunit.client.0.plana20.stdout:out: [{"entity":"client.libcephfs_readonly_test.119217342","key":"AQCJJRFWEE3aFRAA\/QoZTOI1tepzMZIDsGdXbw==","caps":{"mds":"allow r","mon":"allow r","osd":"allow rw"}}]
2015-10-04T06:11:37.373 INFO:tasks.workunit.client.0.plana20.stdout:key: AQCJJRFWEE3aFRAA/QoZTOI1tepzMZIDsGdXbw==
2015-10-04T06:11:42.250 INFO:tasks.workunit.client.0.plana20.stdout:[       OK ] AccessTest.ReadOnly (4907 ms)
2015-10-04T06:11:42.250 INFO:tasks.workunit.client.0.plana20.stdout:[ RUN      ] AccessTest.User
2015-10-04T06:11:42.275 INFO:tasks.workunit.client.0.plana20.stdout:out: [{"entity":"client.libcephfs_user_test.1966880974","key":"AQCOJRFWseP2DxAA2w+x5kqT30z6ogb6ekuRTg==","caps":{"mds":"allow rw uid=123 gids=456,789","mon":"allow *","osd":"allow rw"}}]
2015-10-04T06:11:42.275 INFO:tasks.workunit.client.0.plana20.stdout:key: AQCOJRFWseP2DxAA2w+x5kqT30z6ogb6ekuRTg==
2015-10-04T06:11:49.498 INFO:teuthology.orchestra.run.burnupi14:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2015-10-04T06:11:49.501 INFO:teuthology.orchestra.run.mira041:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2015-10-04T06:11:49.504 INFO:teuthology.orchestra.run.plana20:Running: 'sudo logrotate /etc/logrotate.d/ceph-test.conf'
2015-10-04T06:11:52.250 INFO:tasks.workunit.client.0.plana20.stdout:[       OK ] AccessTest.User (10000 ms)
2015-10-04T06:11:52.250 INFO:tasks.workunit.client.0.plana20.stdout:[----------] 4 tests from AccessTest (19998 ms total)
2015-10-04T06:11:52.250 INFO:tasks.workunit.client.0.plana20.stdout:
2015-10-04T06:11:52.250 INFO:tasks.workunit.client.0.plana20.stdout:[----------] Global test environment tear-down
2015-10-04T06:11:52.250 INFO:tasks.workunit.client.0.plana20.stdout:[==========] 48 tests from 3 test cases ran. (128311 ms total)
2015-10-04T06:11:52.250 INFO:tasks.workunit.client.0.plana20.stdout:[  PASSED  ] 45 tests.
2015-10-04T06:11:52.250 INFO:tasks.workunit.client.0.plana20.stdout:[  FAILED  ] 3 tests, listed below:
2015-10-04T06:11:52.251 INFO:tasks.workunit.client.0.plana20.stdout:[  FAILED  ] LibCephFS.InterProcessLocking
2015-10-04T06:11:52.251 INFO:tasks.workunit.client.0.plana20.stdout:[  FAILED  ] LibCephFS.ThreesomeInterProcessLocking
2015-10-04T06:11:52.251 INFO:tasks.workunit.client.0.plana20.stdout:[  FAILED  ] AccessTest.Path
2015-10-04T06:11:52.251 INFO:tasks.workunit.client.0.plana20.stdout:
2015-10-04T06:11:52.251 INFO:tasks.workunit.client.0.plana20.stdout: 3 FAILED TESTS
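
For context, the two failure modes in this log decode as follows: a child status of 139 matches the conventional 128 + signal-number encoding, i.e. signal 11 (SIGSEGV), so the forked locking test process most likely crashed; the -13 from ceph_ftruncate() is -EACCES, a permission denial. A minimal decode, illustrative only and not part of the test suite:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main() {
  // 139 = 128 + 11: the forked test process appears to have died with SIGSEGV.
  int status = 139;
  printf("signal %d (%s)\n", status - 128, strsignal(status - 128));
  // ceph_ftruncate() returned -13, which is -EACCES ("Permission denied").
  printf("errno %d (%s)\n", EACCES, strerror(EACCES));
  return 0;
}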

Associated revisions

Revision 1e57e6de (diff)
Added by Sage Weil over 8 years ago

mds/Session: use projected parent for auth path check

In particular, a file's current path might indicate no parent
because the initial link into the tree isn't committed yet.

Fixes: #13364
Signed-off-by: Sage Weil <>
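
A rough illustration of the idea in this commit (a hypothetical sketch; the names below are illustrative and do not match the real Ceph MDS types): the capability path check should walk the projected linkage, which already reflects an in-flight link of a new file into the tree, rather than the committed linkage, which can still show no parent and make the path look empty, producing the EACCES seen in AccessTest.Path.

// Hypothetical sketch of the projected-parent idea, not the actual MDS code.
#include <string>

struct Dentry;

struct Inode {
  Dentry *parent = nullptr;            // committed linkage
  Dentry *projected_parent = nullptr;  // includes uncommitted link operations
};

struct Dentry {
  std::string name;
  Inode *dir = nullptr;  // directory inode that contains this dentry
};

// Build the path compared against an "allow rw path=/..." MDS cap.
// Walking in->parent instead can yield an empty path for a file whose
// initial link isn't committed yet, so the check wrongly denies access.
std::string auth_check_path(const Inode *in) {
  std::string path;
  for (const Dentry *dn = in->projected_parent; dn != nullptr;
       dn = dn->dir ? dn->dir->projected_parent : nullptr) {
    path = "/" + dn->name + path;
  }
  return path;
}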

History

#1 Updated by Yuri Weinstein over 8 years ago

  • Subject changed from "LibCephFS.*" 3 test failed in smoke-master-distro-basic-multi to "LibCephFS.*" tests failed in smoke-master-distro-basic-multi

#2 Updated by Sage Weil over 8 years ago

  • Priority changed from Normal to Urgent

#3 Updated by Sage Weil over 8 years ago

  • Status changed from New to 12
  • Assignee set to Sage Weil

#5 Updated by Sage Weil over 8 years ago

https://github.com/ceph/ceph/pull/6200 fixes the path check.

These still fail:

[ RUN ] LibCephFS.InterProcessLocking
test/libcephfs/flock.cc:392: Failure
Value of: ceph_mount(cmount, __null)
Actual: -22
Expected: 0
test/libcephfs/flock.cc:465: Failure
Value of: sem_timedwait(&s.sem[1%2], abstime(ts, waitSlowMs))
Actual: -1
Expected: 0
[ FAILED ] LibCephFS.InterProcessLocking (5108 ms)
2015-10-07 21:15:36,246.246 INFO:tasks.workunit.client.0.plana91.stdout:[ RUN ] LibCephFS.ThreesomeInterProcessLocking
2015-10-07 21:15:36,708.708 INFO:tasks.workunit.client.0.plana91.stdout:test/libcephfs/flock.cc:628: Failure
2015-10-07 21:15:36,709.709 INFO:tasks.workunit.client.0.plana91.stdout:Value of: status
2015-10-07 21:15:36,709.709 INFO:tasks.workunit.client.0.plana91.stdout: Actual: 139
2015-10-07 21:15:36,709.709 INFO:tasks.workunit.client.0.plana91.stdout:Expected: 0
2015-10-07 21:15:36,710.710 INFO:tasks.workunit.client.0.plana91.stdout:[ FAILED ] LibCephFS.ThreesomeInterProcessLocking (464 ms)

but I'm not sure why...
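
For reference, the -22 from ceph_mount() above is -EINVAL. The failing call sits in the ordinary libcephfs mount sequence, roughly as below; this is a minimal sketch with illustrative error handling, not the flock test's actual code:

#include <cephfs/libcephfs.h>
#include <stdio.h>
#include <string.h>

int main() {
  struct ceph_mount_info *cmount = NULL;
  if (ceph_create(&cmount, NULL) != 0)   // NULL: default client id
    return 1;
  ceph_conf_read_file(cmount, NULL);     // NULL: default config search path
  int r = ceph_mount(cmount, NULL);      // NULL: mount at "/"; -22 here is -EINVAL
  if (r != 0) {
    fprintf(stderr, "ceph_mount: %d (%s)\n", r, strerror(-r));
    ceph_release(cmount);
    return 1;
  }
  ceph_unmount(cmount);
  ceph_release(cmount);
  return 0;
}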

#6 Updated by Greg Farnum over 8 years ago

  • Project changed from Ceph to CephFS
  • Assignee changed from Sage Weil to Greg Farnum

Sage says this passes when run against a vstart cluster.

#7 Updated by Greg Farnum over 8 years ago

If somebody else wants to investigate this, feel free, but otherwise I'll try and get to it.

#8 Updated by Greg Farnum over 8 years ago

  • Status changed from 12 to In Progress

Looking at this now, it's weird.

#9 Updated by Greg Farnum over 8 years ago

  • Subject changed from "LibCephFS.*" tests failed in smoke-master-distro-basic-multi to LibCephFS locking tests are failing and/or lockdep asserting
  • Status changed from In Progress to 12
  • Assignee deleted (Greg Farnum)

http://pulpito.ceph.com/teuthology-2015-11-09_23:04:02-fs-master---basic-multi/1144848/

We've been hitting lockdep asserts on these tests too; here's one of them ^

#10 Updated by Zheng Yan over 8 years ago

Greg Farnum wrote:

http://pulpito.ceph.com/teuthology-2015-11-09_23:04:02-fs-master---basic-multi/1144848/

We've been hitting lockdep asserts on these tests too; here's one of them ^

This is still a side effect of the previous failure (the missing "." in the readdir result).

#11 Updated by Greg Farnum over 8 years ago

Zheng thinks this is two bugs:
1) Missing "." in directory listing results (fixed in a pending PR).
2) If the MDS fragments the directory, the client "restarts" the listing, which breaks the test by resetting the directory pointer (see the sketch below).
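
To make (2) concrete, here is the shape of the libcephfs directory-listing loop involved, a sketch assuming an already-mounted cmount rather than the tracker's actual test code: if the MDS fragments the directory mid-listing and the client restarts the listing from the beginning, entries already returned can come back again, so anything counting entries or tracking position within the listing breaks.

// Illustrative listing loop over libcephfs; not the test's code.
#include <cephfs/libcephfs.h>
#include <stddef.h>

int count_entries(struct ceph_mount_info *cmount, const char *path) {
  struct ceph_dir_result *dirp = NULL;
  if (ceph_opendir(cmount, path, &dirp) != 0)
    return -1;
  int n = 0;
  while (ceph_readdir(cmount, dirp) != NULL)
    ++n;  // a listing restarted after fragmentation can double-count here
  ceph_closedir(cmount, dirp);
  return n;
}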

#12 Updated by Zheng Yan over 8 years ago

  • Status changed from 12 to Fix Under Review

Added a commit to https://github.com/ceph/ceph/pull/6260 to fix the readdir test case.

#13 Updated by Greg Farnum about 8 years ago

  • Status changed from Fix Under Review to Resolved

Merged to the jewel branch as of 088b8c2c881c8561b9bc96934e04d030e8de9255.
I created a new feature ticket for the directory listing reset issue: #14271.
