Filesystem removals intermittently failing in qa-suite
I suspect this is a bug in the implementation of MDSCluster.delete_all_filesystems. It reads the mdsmap before setting "cluster down", so another MDS could become active in the interim, and that MDS would never be failed. The code should take a fresh copy of the mdsmap after setting cluster down.
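To illustrate the suspected ordering problem, here is a minimal sketch using a toy in-memory model. It is not the real qa-suite code: FakeCluster, promote_standby, fail_all_stale_map and fail_all_fresh_map are hypothetical names, and the model simply assumes that no standby can become active once "cluster down" is set.

```python
class FakeCluster:
    """Hypothetical stand-in for an MDS cluster; not the real qa-suite API."""

    def __init__(self):
        self.active = {"a"}        # names of currently active MDS daemons
        self.standby = ["b"]       # standbys that may be promoted
        self.cluster_down = False

    def get_mds_map(self):
        # Return a snapshot; later changes are not reflected in it.
        return {"active": set(self.active)}

    def promote_standby(self):
        # Simulates a standby taking over. Assumption of this model:
        # promotion is blocked once the cluster is marked down.
        if not self.cluster_down and self.standby:
            self.active.add(self.standby.pop(0))

    def set_cluster_down(self):
        self.cluster_down = True

    def fail_mds(self, name):
        self.active.discard(name)


def fail_all_stale_map(cluster, race=lambda: None):
    """Suspected buggy order: snapshot first, then set cluster down."""
    mds_map = cluster.get_mds_map()   # snapshot taken too early
    race()                            # another MDS may become active here
    cluster.set_cluster_down()
    for name in mds_map["active"]:
        cluster.fail_mds(name)


def fail_all_fresh_map(cluster, race=lambda: None):
    """Proposed order: set cluster down, then take a fresh snapshot."""
    race()                            # promotions before this point are seen
    cluster.set_cluster_down()        # no new MDS becomes active after this
    mds_map = cluster.get_mds_map()   # fresh snapshot
    for name in mds_map["active"]:
        cluster.fail_mds(name)


if __name__ == "__main__":
    stale = FakeCluster()
    fail_all_stale_map(stale, race=stale.promote_standby)
    print("stale map leaves active:", sorted(stale.active))

    fresh = FakeCluster()
    fail_all_fresh_map(fresh, race=fresh.promote_standby)
    print("fresh map leaves active:", sorted(fresh.active))
```

In this model the stale-map version leaves the newly promoted MDS running, while the fresh-map version fails every active daemon. Whether the real failure follows exactly this path would still need to be confirmed against the actual test logs.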
#1 Updated by John Spray 2 months ago
Hmm, too similar to be a coincidence?
I'm wondering if something subtle changed in the recent mdsthrasher/Filesystem/MDSCluster changes.