Bug #11314

qa: MDS crashed and the runs hung without ever timing out

Added by Greg Farnum over 4 years ago. Updated 2 months ago.

Status: Duplicate
Priority: High
Assignee: -
Category: Testing
Target version: -
Start date:
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS): qa
Pull request ID:

Description

http://pulpito.ceph.com/gregf-2015-04-01_20:53:07-fs-greg-fs-testing---basic-multi/

The MDS crashed in many of these, but for some reason the tasks didn't notice and kept running.

Anyway, we need the tasks to notice crashes like this so that a failure doesn't lock up the whole lab overnight and prevent any other tests from running. :(


Related issues

Duplicates fs - Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes Need Review

History

#1 Updated by John Spray over 4 years ago

This was mentioned in #10369:

Once we have that checker method, we should also introduce a check that all the daemons are really running, so that if e.g. an OSD crashes unexpectedly, we detect it immediately rather than waiting a long time for a timeout of some kind to end the test.

#2 Updated by Greg Farnum over 3 years ago

  • Subject changed from teuthology: MDS crashed and the runs hung without ever timing out to qa: MDS crashed and the runs hung without ever timing out
  • Priority changed from High to Normal

We clearly aren't treating this as very important, and I think we've had more trouble with OSDs doing this than MDSes lately, so whatever.

#3 Updated by Patrick Donnelly about 2 years ago

  • Assignee set to Patrick Donnelly

I added a DaemonWatchdog in the mds_thrash.py code that catches this kind of thing. We should pull it out into its own context so it can be used elsewhere.
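
For readers unfamiliar with the idea, here is a minimal sketch of a watchdog that polls daemons and bails out on the first unexpected death. It is not the code from mds_thrash.py: the names `FakeDaemon`, `Watchdog`, `running()` and `on_failure` are hypothetical and chosen only for illustration, and a real qa task would poll real daemon handles rather than an in-memory flag.

```python
import threading


class FakeDaemon:
    """Hypothetical stand-in for a daemon handle managed by a test run."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def running(self):
        # A real handle would check the process; this just reads a flag.
        return self.alive


class Watchdog(threading.Thread):
    """Poll daemons; report the dead ones once via on_failure, then exit."""

    def __init__(self, daemons, on_failure, interval=0.05):
        super().__init__(daemon=True)
        self.daemons = list(daemons)
        self.on_failure = on_failure  # callback receiving a list of names
        self.interval = interval      # seconds between polls
        self._stop_event = threading.Event()

    def stop(self):
        """Ask the watchdog to exit without reporting anything."""
        self._stop_event.set()

    def run(self):
        while not self._stop_event.is_set():
            dead = [d.name for d in self.daemons if not d.running()]
            if dead:
                # Fail fast instead of waiting for some later timeout.
                self.on_failure(dead)
                return
            self._stop_event.wait(self.interval)


if __name__ == "__main__":
    failed = []
    done = threading.Event()

    def bail(names):
        failed.extend(names)
        done.set()

    mds, osd = FakeDaemon("mds.a"), FakeDaemon("osd.0")
    watchdog = Watchdog([mds, osd], bail)
    watchdog.start()
    mds.alive = False          # simulate an unexpected MDS crash
    done.wait(timeout=2)
    print("dead daemons:", failed)
```

Keeping the watchdog as a separate thread with a callback is what would let it be reused outside the thrasher: the caller decides what bailing means (for example, aborting the run), and the watchdog only detects the crash.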

#4 Updated by Patrick Donnelly about 2 years ago

  • Status changed from New to In Progress

#5 Updated by Patrick Donnelly over 1 year ago

  • Target version set to v14.0.0
  • Tags set to qa
  • Backport set to luminous,mimic
  • Component(FS) qa-suite added

#6 Updated by Patrick Donnelly 5 months ago

  • Target version changed from v14.0.0 to v15.0.0

#7 Updated by Patrick Donnelly 5 months ago

  • Status changed from In Progress to New
  • Assignee changed from Patrick Donnelly to Jos Collin
  • Priority changed from Normal to High
  • Target version deleted (v15.0.0)
  • Start date deleted (04/02/2015)
  • Tags deleted (qa)
  • Backport deleted (luminous,mimic)
  • Labels (FS) qa added

#8 Updated by Jos Collin 4 months ago

  • Status changed from New to In Progress

#9 Updated by Jos Collin 2 months ago

  • Pull request ID set to 28378

#10 Updated by Patrick Donnelly 2 months ago

  • Status changed from In Progress to Duplicate
  • Assignee deleted (Jos Collin)

#11 Updated by Patrick Donnelly 2 months ago

  • Related to deleted (Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes)

#12 Updated by Patrick Donnelly 2 months ago

  • Duplicates Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes added