Bug #17894

Filesystem removals intermittently failing in qa-suite

Added by John Spray 3 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Category:
Testing
Target version:
-
Start date:
11/14/2016
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
qa-suite
Needs Doc:
No

Description

http://pulpito.ceph.com/teuthology-2016-11-12_17:15:01-fs-master---basic-smithi/543466/

I suspect this is a bug in the implementation of MDSCluster.delete_all_filesystems: it reads the mdsmap before setting "cluster down", so another MDS could become active in the interim, and that MDS would never be failed. The code should take a fresh copy of the mdsmap after setting cluster down.
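
To illustrate the suspected ordering problem, here is a minimal, self-contained sketch. It does not use the real qa-suite code: FakeCluster and its methods (get_mds_map, set_cluster_down, fail_mds) are hypothetical stand-ins, and the standby promotion is simulated inside set_cluster_down purely to model "another MDS became active in the interim".

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster; NOT the real MDSCluster API."""

    def __init__(self):
        self.active = {"a"}      # daemons currently active
        self.standby = ["b"]     # daemons that may be promoted
        self.cluster_down = False

    def get_mds_map(self):
        # Return a snapshot (copy) of the currently active daemons.
        return {"active": set(self.active)}

    def set_cluster_down(self):
        # Simulate the race: a standby is promoted just before the
        # flag takes effect, i.e. after any earlier snapshot was taken.
        if self.standby:
            self.active.add(self.standby.pop())
        self.cluster_down = True

    def fail_mds(self, name):
        self.active.discard(name)


def delete_racy(cluster):
    """Snapshot first, then set cluster down (the suspected bug)."""
    mdsmap = cluster.get_mds_map()      # stale once a standby is promoted
    cluster.set_cluster_down()
    for name in mdsmap["active"]:
        cluster.fail_mds(name)
    return cluster.active               # daemons that were never failed


def delete_fixed(cluster):
    """Set cluster down first, then take a fresh snapshot."""
    cluster.set_cluster_down()
    mdsmap = cluster.get_mds_map()      # sees any daemon promoted meanwhile
    for name in mdsmap["active"]:
        cluster.fail_mds(name)
    return cluster.active


leftover_racy = delete_racy(FakeCluster())
leftover_fixed = delete_fixed(FakeCluster())
```

In this model the racy ordering leaves the promoted daemon active, while the fixed ordering fails every active daemon. Whether the real failure follows exactly this path is the hypothesis of this report, not something the sketch proves.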

History

#1 Updated by John Spray 3 months ago

Hmm, too similar to be a coincidence?

http://qa-proxy.ceph.com/teuthology/jspray-2016-11-15_13:27:33-fs-wip-jcsp-testing-20161115-distro-basic-smithi/550536/teuthology.log

I'm wondering if something subtle changed in the recent mdsthrasher/Filesystem/MDSCluster changes.

#2 Updated by John Spray 3 months ago

  • Subject changed from Failure in TestMultiFilesystems.test_grow_shrink to Filesystem removals intermittently failing in qa-suite

#3 Updated by Patrick Donnelly 3 months ago

  • Assignee set to Patrick Donnelly

I'll look at this.

#4 Updated by Patrick Donnelly 3 months ago

I think your analysis is correct, John. I'll write up a fix for that.

#6 Updated by Patrick Donnelly 3 months ago

  • Status changed from New to Need Review

#7 Updated by Patrick Donnelly 3 months ago

I'll do a run of fs:multifs to see if the bug looks resolved.

#8 Updated by Patrick Donnelly 3 months ago

  • Status changed from Need Review to Resolved
