Bug #17894

closed

Filesystem removals intermittently failing in qa-suite

Added by John Spray over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Category: Testing
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/teuthology-2016-11-12_17:15:01-fs-master---basic-smithi/543466/

I suspect this is a bug in the implementation of MDSCluster.delete_all_filesystems: it takes the mdsmap before setting "cluster down", so another MDS could become active in the interim, and that MDS would not be failed. This code should take a fresh copy of the mdsmap after setting cluster down.
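To illustrate the suspected ordering problem, here is a minimal toy model in Python. It is not the ceph-qa-suite code: `FakeCluster`, `delete_racy`, `delete_fixed`, and the daemon names are all hypothetical, and the model assumes that setting "cluster down" stops any further standby from becoming active.

```python
class FakeCluster:
    """Toy stand-in for a cluster; hypothetical, not the real qa-suite API."""

    def __init__(self):
        self.active = {"mds.a"}
        self.standby = ["mds.b"]
        self.cluster_down = False

    def get_mds_map(self):
        # Return a snapshot (copy) of the currently active daemons.
        return {"active": set(self.active)}

    def set_cluster_down(self):
        self.cluster_down = True

    def promote_standby(self):
        # Assumption: a standby can only take over while the cluster is not down.
        if not self.cluster_down and self.standby:
            self.active.add(self.standby.pop(0))

    def fail(self, name):
        self.active.discard(name)


def delete_racy(cluster, interleave):
    mds_map = cluster.get_mds_map()   # snapshot taken too early
    interleave()                      # another MDS may become active here
    cluster.set_cluster_down()
    for name in mds_map["active"]:    # the stale map misses the new MDS
        cluster.fail(name)
    return set(cluster.active)        # daemons that were never failed


def delete_fixed(cluster, interleave):
    interleave()                      # same interleaving as above
    cluster.set_cluster_down()
    mds_map = cluster.get_mds_map()   # fresh snapshot after cluster down
    for name in mds_map["active"]:
        cluster.fail(name)
    return set(cluster.active)


racy = FakeCluster()
racy_leftover = delete_racy(racy, racy.promote_standby)

fixed = FakeCluster()
fixed_leftover = delete_fixed(fixed, fixed.promote_standby)
```

In this model the racy ordering leaves `mds.b` active, while the ordering that re-reads the map after setting cluster down leaves nothing behind.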

Actions #1

Updated by John Spray over 7 years ago

Hmm, too similar to be a coincidence?

http://qa-proxy.ceph.com/teuthology/jspray-2016-11-15_13:27:33-fs-wip-jcsp-testing-20161115-distro-basic-smithi/550536/teuthology.log

I'm wondering if something subtle changed in the recent mdsthrasher/Filesystem/MDSCluster changes.

Actions #2

Updated by John Spray over 7 years ago

  • Subject changed from Failure in TestMultiFilesystems.test_grow_shrink to Filesystem removals intermittently failing in qa-suite

Actions #3

Updated by Patrick Donnelly over 7 years ago

  • Assignee set to Patrick Donnelly

I'll look at this.

Actions #4

Updated by Patrick Donnelly over 7 years ago

I think your analysis is correct John. I'll write up a fix for that.

Actions #6

Updated by Patrick Donnelly over 7 years ago

  • Status changed from New to Fix Under Review

Actions #7

Updated by Patrick Donnelly over 7 years ago

I'll do a run of fs:multifs to see if the bug looks resolved.

Actions #8

Updated by Patrick Donnelly over 7 years ago

  • Status changed from Fix Under Review to Resolved