Feature #10369

qa-suite: detect unexpected MDS failovers and daemon crashes

Added by John Spray almost 4 years ago. Updated 8 months ago.

Status: New
Priority: High
Category: Testing
Target version:
Start date: 12/18/2014
Due date:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS): qa-suite
Labels (FS): qa
Pull request ID:

Description

Currently some of our tests can be run with standby MDSs, and a failover event might occur without our tests noticing.

In the workunit-type tests, this is mainly something we want to ignore, because systems are often unpredictably slow, and as long as the filesystem satisfies the workunit we don't necessarily care if a failover happened.

In the new functional tests, we do want to know if a failover occurred unexpectedly, because some of the tests are poking individual MDSs in quite specific ways, and if a failover happened then we should stop the test rather than proceeding into some arcane unexpected failure mode. These tests also generally expose the system to less load, so the "it was just slow" failovers shouldn't often happen. In any other unexpected failover case we would like to stop the test so that we can see what happened and why.

An out-of-thread ticker might be challenging, as interrupting the main thread to inject a failure is not a normal thing to do, so maybe put the check in the wait_until* helpers and as a pre-check to some Filesystem methods, so that we will always detect reasonably early when something went weird. Once we have that checker method, we should also introduce a check that all the daemons are really running, so that if e.g. an OSD crashes unexpectedly, we detect it immediately rather than waiting a long time for a timeout of some kind to end the test.
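
A rough sketch of what such a checker could look like, to make the proposal concrete. Everything here is hypothetical: `ClusterChecker`, `UnexpectedClusterEvent`, `prechecked` and the two callables passed to the constructor are illustrative names, not existing qa-suite code, and `wait_until_true` is only modelled on the wait_until* helpers mentioned above. The callables stand in for whatever the test framework uses to list the active MDS names and the running daemons.

```python
import functools
import time


class UnexpectedClusterEvent(Exception):
    """Raised when the cluster no longer matches what the test expects."""


class ClusterChecker:
    """Hypothetical checker; both callables are supplied by the caller."""

    def __init__(self, get_active_mds, get_running_daemons):
        self._get_active_mds = get_active_mds
        self._get_running_daemons = get_running_daemons
        # Baseline taken at construction time.
        self.expected_active = frozenset(get_active_mds())
        self.expected_daemons = frozenset(get_running_daemons())

    def expect_failover(self):
        """Re-baseline after a failover the test caused on purpose."""
        self.expected_active = frozenset(self._get_active_mds())

    def check(self):
        """Fail fast on an unexpected failover or a missing daemon."""
        active = frozenset(self._get_active_mds())
        if active != self.expected_active:
            raise UnexpectedClusterEvent(
                "active MDS set changed: expected %s, got %s"
                % (sorted(self.expected_active), sorted(active)))
        running = frozenset(self._get_running_daemons())
        missing = self.expected_daemons - running
        if missing:
            raise UnexpectedClusterEvent(
                "daemons not running: %s" % sorted(missing))


def wait_until_true(condition, timeout, checker, period=1.0, sleep=time.sleep):
    """Poll condition(), running the checker on every iteration."""
    elapsed = 0.0
    while True:
        checker.check()
        if condition():
            return
        if elapsed >= timeout:
            raise RuntimeError("timed out after %.1fs" % elapsed)
        sleep(period)
        elapsed += period


def prechecked(method):
    """Decorator: run self.checker.check() before the wrapped method."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.checker.check()
        return method(self, *args, **kwargs)
    return wrapper
```

Because the check runs in the main thread inside the polling loop, no interruption of the main thread is needed, and a crashed daemon ends the wait on the next poll instead of at the timeout. A test that deliberately fails an MDS would call `expect_failover()` afterwards so the new active set becomes the baseline.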


Related issues

Related to fs - Bug #11314: qa: MDS crashed and the runs hung without ever timing out | In Progress | 04/02/2015

History

#1 Updated by John Spray almost 4 years ago

  • Tracker changed from Fix to Feature

#2 Updated by Greg Farnum over 3 years ago

  • Priority changed from Normal to High

We just keep re-creating this feature: #12821

#3 Updated by John Spray over 3 years ago

  • Category set to Testing

#4 Updated by Patrick Donnelly 8 months ago

  • Assignee set to Patrick Donnelly
  • Target version set to v14.0.0
  • Source changed from other to Development
  • Component(FS) qa-suite added
  • Labels (FS) qa added
