Bug #11913

Failure in TestClusterFull.test_barrier

Added by John Spray about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Testing
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature:

Description

http://pulpito.ceph.com/teuthology-2015-06-05_23:04:02-fs-master---basic-multi/922578/

mount_a is getting the new OSD map unexpectedly soon.

It hits the barrier before explicitly doing any metadata operations afterwards, because it gets a cap revoke from the MDS when mount_b does its metadata operations.

Which is weird, because mount_b was already the last one to do any metadata ops before this point, so it should already have had all the caps it needed.
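
As a rough illustration of the suspected race (not the real teuthology code), here is a toy Python model; the Client and MDS classes are hypothetical stand-ins for the ceph-fuse mounts and the MDS, modelling only the fact that a cap revoke carries the MDS's view of the OSD epoch to the client:

    # Toy model of the race; hypothetical classes, not the real test_full.py code.

    class Client:
        def __init__(self, name, osd_epoch):
            self.name = name
            self.osd_epoch = osd_epoch      # this client's view of the OSD map epoch

        def handle_cap_revoke(self, mds_epoch):
            # Processing the revoke drags the client's OSD epoch forward,
            # even though the client issued no I/O of its own.
            self.osd_epoch = max(self.osd_epoch, mds_epoch)


    class MDS:
        def __init__(self, osd_epoch):
            self.osd_epoch = osd_epoch      # latest OSD map epoch the MDS knows about
            self.cap_holders = set()        # clients holding caps in the test directory

        def metadata_op(self, client):
            # Any other client holding caps in the directory gets a revoke first.
            for other in self.cap_holders - {client}:
                other.handle_cap_revoke(self.osd_epoch)
            self.cap_holders = {client}


    # The ordering the test originally had:
    mds = MDS(osd_epoch=10)
    mount_a = Client("mount_a", osd_epoch=10)
    mount_b = Client("mount_b", osd_epoch=10)

    mds.metadata_op(mount_a)    # mount_a is the last to touch metadata
    mds.osd_epoch = 20          # barrier: a new OSD map is published
    mds.metadata_op(mount_b)    # revokes mount_a's caps...
    print(mount_a.osd_epoch)    # ...so mount_a already sees epoch 20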

Associated revisions

Revision 63a563d0 (diff)
Added by John Spray about 5 years ago

tasks/cephfs: fix race in test_full

Sometimes mount A would get a cap revoke when mount
B did its last IO, resulting in mount A's OSD epoch
getting updated too.

Fix by making sure mount B is the last one to have
done IO before we do the barrier, so that when
it does IO again after the barrier, mount A can't
be holding any caps that B would need.

Fixes: #11913
Signed-off-by: John Spray <>
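
Reusing the hypothetical Client/MDS toy classes from the sketch in the description above, the reordering the commit describes looks roughly like this (illustrative only, not the actual test_full.py change):

    # Same toy classes as above; only the ordering changes.
    mds = MDS(osd_epoch=10)
    mount_a = Client("mount_a", osd_epoch=10)
    mount_b = Client("mount_b", osd_epoch=10)

    mds.metadata_op(mount_a)    # mount_a touches metadata first...
    mds.metadata_op(mount_b)    # ...but mount_b is the last to do so before the barrier
    mds.osd_epoch = 20          # barrier: a new OSD map is published
    mds.metadata_op(mount_b)    # no caps left to revoke from mount_a...
    print(mount_a.osd_epoch)    # ...so mount_a still reports epoch 10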

History

#1 Updated by Greg Farnum about 5 years ago

(Referring to ceph-qa-suite/tasks/cephfs/test_full.py::test_barrier().)

So mount.a is doing open_no_data("alpha"), and that looks to me to be the last thing either client does before the 30-second sleep. Then mount.b touches/opens "bravo", and a revoke against mount.a here definitely looks possible to me.

#2 Updated by John Spray about 5 years ago

  • Status changed from New to Fix Under Review

#3 Updated by Greg Farnum about 5 years ago

  • Status changed from Fix Under Review to Resolved

commit:bf9a9a2d9ff2be129b303d535899f60ad49f7c23
