Failure in TestClusterFull.test_barrier
mount_a is getting the new OSD map unexpectedly soon.
It's hitting the barrier before it explicitly attempts any metadata operations of its own, because it's getting a cap revoke from the MDS when mount_b does its metadata operations.
Which is weird, because mount_b was already the last guy to do any metadata ops before this point, so he should already have had all the needed caps.
tasks/cephfs: fix race in test_full
Sometimes mount A would get a cap revoke when mount
B did its last IO, resulting in mount A's OSD epoch
getting updated too.
Fix by making sure mount B is the last one to have
done IO before we do the barrier, so that when
it does IO again after the barrier, mount A can't
be holding any caps that B would need.
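
The ordering argument in the commit message above can be illustrated with a toy model. This is only a sketch, not Ceph code: `Client`, `ToyMDS`, `metadata_op` and `run` are hypothetical names. It assumes a single cap holder at a time, and that a cap revoke message brings the old holder up to the barrier epoch, as the commit message describes.

```python
# Toy model only: every class and function here is hypothetical, not a Ceph API.

class Client:
    def __init__(self, name, osd_epoch):
        self.name = name
        self.osd_epoch = osd_epoch


class ToyMDS:
    """Tracks one cap holder for a shared directory plus a barrier epoch."""

    def __init__(self, barrier_epoch):
        self.barrier_epoch = barrier_epoch
        self.cap_holder = None

    def metadata_op(self, client):
        holder = self.cap_holder
        if holder is not None and holder is not client:
            # Assumption: the revoke message sent to the old holder carries
            # the barrier epoch, so the old holder catches up as a side effect.
            holder.osd_epoch = max(holder.osd_epoch, self.barrier_epoch)
        # The client doing the operation also learns the barrier epoch.
        client.osd_epoch = max(client.osd_epoch, self.barrier_epoch)
        self.cap_holder = client


def run(last_before_barrier):
    """Return (epoch of A, epoch of B) after B does IO past the barrier."""
    a, b = Client("a", 1), Client("b", 1)
    clients = {"a": a, "b": b}
    mds = ToyMDS(barrier_epoch=1)
    other = "b" if last_before_barrier == "a" else "a"
    mds.metadata_op(clients[other])
    mds.metadata_op(clients[last_before_barrier])
    mds.barrier_epoch = 2   # a new OSD map epoch is set as the barrier
    mds.metadata_op(b)      # mount B does IO after the barrier
    return a.osd_epoch, b.osd_epoch
```

In this model, `run("a")` leaves mount A on the new epoch even though A did nothing after the barrier (the race), while `run("b")` leaves A on the old epoch, which is the state the test expects.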
#1 Updated by Greg Farnum over 7 years ago
(Referring to ceph-qa-suite/tasks/cephfs/test_full.py::test_barrier().)
So mount.a is doing open_no_data("alpha"), and that looks to me to be the last thing either client does before the 30 second sleep. Then mount.b touches/opens "bravo", and a revoke against mount.a here definitely looks possible to me?