Bug #17894: Filesystem removals intermittently failing in qa-suite - CephFS - Ceph

Bug #17894

closed

Filesystem removals intermittently failing in qa-suite

Added by John Spray over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: Patrick Donnelly
Category: Testing
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Component(FS): qa-suite

Description

http://pulpito.ceph.com/teuthology-2016-11-12_17:15:01-fs-master---basic-smithi/543466/

I suspect this is a bug in the implementation of MDSCluster.delete_all_filesystems: it fetches the mdsmap before setting "cluster down", so another MDS could become active in the interim, and that MDS would never be failed. The code should take a fresh copy of the mdsmap after setting cluster down.
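To illustrate the suspected ordering problem, here is a minimal sketch using a toy in-memory model. `FakeCluster`, `delete_filesystem`, and the `refresh_after_down` flag are hypothetical names made up for this illustration; they are not the qa-suite's actual classes or the real `MDSCluster` implementation. The model assumes that once "cluster down" is set, no further standby can be promoted.

```python
class FakeCluster:
    """Hypothetical stand-in for an MDS cluster; not the real qa-suite API."""

    def __init__(self):
        self.cluster_down = False
        self.active = {"a"}    # daemons currently holding a rank
        self.standby = ["b"]   # daemons that may still be promoted

    def get_mds_map(self):
        # Return a snapshot; later cluster changes are not reflected in it.
        return {"active": set(self.active)}

    def promote_standby(self):
        # Assumption: promotion is blocked once "cluster down" is set.
        if not self.cluster_down and self.standby:
            self.active.add(self.standby.pop(0))

    def set_cluster_down(self):
        self.cluster_down = True

    def fail(self, name):
        self.active.discard(name)


def delete_filesystem(cluster, refresh_after_down, interleave=lambda: None):
    """Fail every MDS listed in the map; return the daemons still active."""
    mds_map = cluster.get_mds_map()      # early snapshot (the suspected bug)
    interleave()                         # a standby may become active here
    cluster.set_cluster_down()
    if refresh_after_down:
        mds_map = cluster.get_mds_map()  # fresh snapshot (the proposed fix)
    for name in sorted(mds_map["active"]):
        cluster.fail(name)
    return set(cluster.active)           # non-empty: removal would fail


racy = FakeCluster()
leftover_racy = delete_filesystem(
    racy, refresh_after_down=False, interleave=racy.promote_standby)

fixed = FakeCluster()
leftover_fixed = delete_filesystem(
    fixed, refresh_after_down=True, interleave=fixed.promote_standby)

print(leftover_racy, leftover_fixed)
```

With the stale snapshot, the daemon promoted in the interim is left active; re-reading the map after setting cluster down catches it. Whether the real failure follows exactly this interleaving is still only a suspicion.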

#1

Updated by John Spray over 7 years ago

Hmm, too similar to be a coincidence?

http://qa-proxy.ceph.com/teuthology/jspray-2016-11-15_13:27:33-fs-wip-jcsp-testing-20161115-distro-basic-smithi/550536/teuthology.log

I'm wondering if something subtle changed in the recent mdsthrasher/Filesystem/MDSCluster changes.

#2

Updated by John Spray over 7 years ago

Subject changed from Failure in TestMultiFilesystems.test_grow_shrink to Filesystem removals intermittently failing in qa-suite

#3

Updated by Patrick Donnelly over 7 years ago

Assignee set to Patrick Donnelly

I'll look at this.

#4

Updated by Patrick Donnelly over 7 years ago

I think your analysis is correct, John. I'll write up a fix for that.

#5

Updated by Patrick Donnelly over 7 years ago

PR: https://github.com/ceph/ceph-qa-suite/pull/1262

#6

Updated by Patrick Donnelly over 7 years ago

Status changed from New to Fix Under Review

#7

Updated by Patrick Donnelly over 7 years ago

I'll do a run of fs:multifs to see if the bug looks resolved.

#8

Updated by Patrick Donnelly over 7 years ago

Status changed from Fix Under Review to Resolved
