Bug #11314

qa: MDS crashed and the runs hung without ever timing out

Added by Greg Farnum over 3 years ago. Updated 8 months ago.

Status:
In Progress
Priority:
Normal
Category:
Testing
Target version:
Start date:
04/02/2015
Due date:
% Done:
0%

Source:
Q/A
Tags:
qa
Backport:
luminous,mimic
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
qa-suite
Labels (FS):
Pull request ID:

Description

http://pulpito.ceph.com/gregf-2015-04-01_20:53:07-fs-greg-fs-testing---basic-multi/

The MDS crashed in many of these, but for some reason the tasks didn't notice and kept running.

Anyway, we need to make the tasks notice in cases like this so that the failures don't lock up the whole lab overnight and prevent any tests from running. :(


Related issues

Related to fs - Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes New 12/18/2014

History

#1 Updated by John Spray over 3 years ago

This was mentioned in #10369:

Once we have that checker method, we should also introduce a check that all the daemons are really running, so that if e.g. an OSD crashes unexpectedly, we detect it immediately rather than waiting a long time for a timeout of some kind to end the test.
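To illustrate the idea, here is a minimal sketch of a wait loop that checks daemon liveness on every poll, so a crash ends the wait immediately instead of running out the timeout. It is not the qa-suite's code: `DaemonDied`, `dead_daemons`, and `wait_until` are hypothetical names, and it assumes each daemon is represented by a `subprocess.Popen` handle.

```python
import subprocess
import sys
import time


class DaemonDied(Exception):
    """Raised when a watched process exits while the test is still waiting."""


def dead_daemons(daemons):
    """Return the names of daemons whose process has already exited.

    `daemons` maps a name such as "mds.a" to a subprocess.Popen handle
    (an assumption made for this sketch).
    """
    return [name for name, proc in daemons.items() if proc.poll() is not None]


def wait_until(condition, daemons, timeout=60.0, interval=0.1):
    """Poll condition(), but fail fast if any daemon has died."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        dead = dead_daemons(daemons)
        if dead:
            # Report the crash right away rather than waiting for the timeout.
            raise DaemonDied(
                "daemons exited unexpectedly: " + ", ".join(sorted(dead))
            )
        if condition():
            return True
        time.sleep(interval)
    raise TimeoutError("condition not met within %s seconds" % timeout)
```

With this shape, a test that would otherwise block for the full timeout raises `DaemonDied` within one polling interval of the crash.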

#2 Updated by Greg Farnum over 2 years ago

  • Subject changed from teuthology: MDS crashed and the runs hung without ever timing out to qa: MDS crashed and the runs hung without ever timing out
  • Priority changed from High to Normal

We clearly aren't treating this as very important, and I think we've had more trouble with OSDs doing this than MDSes lately, so whatever.

#3 Updated by Patrick Donnelly over 1 year ago

  • Assignee set to Patrick Donnelly

I added a DaemonWatchdog in the mds_thrash.py code that catches this kind of thing. We should pull it out into its own context so it can be used elsewhere.
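As a rough sketch of what "its own context" could look like, the following wraps a background liveness check in a context manager. This is not the DaemonWatchdog from mds_thrash.py: `daemon_watchdog`, `on_failure`, and the `poll()` interface are assumptions made for illustration, and a plain thread is used here for simplicity.

```python
import contextlib
import threading


@contextlib.contextmanager
def daemon_watchdog(daemons, on_failure, interval=0.1):
    """Watch daemons in the background for the duration of a `with` block.

    `daemons` maps a name to an object whose poll() returns None while the
    daemon is running (hypothetical interface, modelled on subprocess.Popen).
    `on_failure` is called once with the sorted list of dead daemon names.
    """
    stop = threading.Event()

    def watch():
        # Event.wait() returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            dead = sorted(n for n, p in daemons.items() if p.poll() is not None)
            if dead:
                on_failure(dead)
                return

    thread = threading.Thread(target=watch, name="daemon-watchdog", daemon=True)
    thread.start()
    try:
        yield
    finally:
        # Always stop the watcher, even if the body raised.
        stop.set()
        thread.join()
```

Any task could then run its body inside `with daemon_watchdog(daemons, on_failure):`, with `on_failure` deciding how to abort the run.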

#4 Updated by Patrick Donnelly over 1 year ago

  • Status changed from New to In Progress

#5 Updated by Patrick Donnelly 8 months ago

  • Target version set to v14.0.0
  • Tags set to qa
  • Backport set to luminous,mimic
  • Component(FS) qa-suite added
