Bug #11314

closed

qa: MDS crashed and the runs hung without ever timing out

Added by Greg Farnum about 9 years ago. Updated almost 5 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
Testing
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
qa-suite
Labels (FS):
qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/gregf-2015-04-01_20:53:07-fs-greg-fs-testing---basic-multi/

The MDS crashed in many of these, but for some reason the tasks didn't notice and kept running.

Anyway, we need to make the tasks notice in cases like this so that the failures don't lock up the whole lab overnight and prevent any tests from running. :(


Related issues 1 (0 open, 1 closed)

Is duplicate of CephFS - Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes (Resolved, Jos Collin)

Actions #1

Updated by John Spray about 9 years ago

This was mentioned in #10369:

Once we have that checker method, we should also introduce a check that all the daemons are really running, so that if e.g. an OSD crashes unexpectedly, we detect it immediately rather than waiting a long time for a timeout of some kind to end the test.
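
For illustration only, the liveness check described above could look roughly like the sketch below. It assumes each daemon is represented by a `subprocess.Popen` handle. The function name `find_dead_daemons` and the name-to-handle dictionary are hypothetical and are not part of teuthology or ceph-qa-suite.

```python
import subprocess
import sys


def find_dead_daemons(daemons):
    """Return the sorted names of daemons whose process has exited.

    `daemons` maps a daemon name (e.g. "osd.0") to a subprocess.Popen
    handle. This layout is an assumption made for the example.
    """
    # Popen.poll() returns None while the process is still running,
    # and the exit code once it has terminated.
    return sorted(
        name for name, proc in daemons.items() if proc.poll() is not None
    )
```

A test task could call such a function between steps and fail as soon as the returned list is non-empty, rather than waiting for an unrelated timeout.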

Actions #2

Updated by Greg Farnum about 8 years ago

  • Subject changed from teuthology: MDS crashed and the runs hung without ever timing out to qa: MDS crashed and the runs hung without ever timing out
  • Priority changed from High to Normal

We clearly aren't treating this as very important, and I think we've had more trouble with OSDs doing this than MDSes lately, so whatever.

Actions #3

Updated by Patrick Donnelly over 6 years ago

  • Assignee set to Patrick Donnelly

I added a DaemonWatchdog in the mds_thrash.py code that catches this kind of thing. We should pull it out into its own context so it can be used elsewhere.
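
As a rough sketch of what a standalone watchdog context might look like (this is not the actual `DaemonWatchdog` from `mds_thrash.py`; the name `daemon_watchdog`, its parameters, and the callback shape are all assumptions made for illustration):

```python
import threading
from contextlib import contextmanager


@contextmanager
def daemon_watchdog(is_alive_checks, on_failure, interval=1.0):
    """Poll liveness checks in a background thread while the body runs.

    is_alive_checks: dict of daemon name -> zero-argument callable
                     returning True while that daemon is healthy.
    on_failure:      callable invoked once with the list of dead names.
    All names here are hypothetical, not the qa-suite API.
    """
    stop = threading.Event()

    def watch():
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop ends promptly when the context exits.
        while not stop.wait(interval):
            dead = sorted(n for n, ok in is_alive_checks.items() if not ok())
            if dead:
                on_failure(dead)
                return

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()
```

Wrapping a test body in `with daemon_watchdog(...)` would let any task reuse the same check, with `on_failure` deciding how to abort the run.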

Actions #4

Updated by Patrick Donnelly over 6 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Patrick Donnelly about 6 years ago

  • Target version set to v14.0.0
  • Tags set to qa
  • Backport set to luminous,mimic
  • Component(FS) qa-suite added
Actions #6

Updated by Patrick Donnelly about 5 years ago

  • Target version changed from v14.0.0 to v15.0.0
Actions #7

Updated by Patrick Donnelly about 5 years ago

  • Status changed from In Progress to New
  • Assignee changed from Patrick Donnelly to Jos Collin
  • Priority changed from Normal to High
  • Target version deleted (v15.0.0)
  • Start date deleted (04/02/2015)
  • Tags deleted (qa)
  • Backport deleted (luminous,mimic)
  • Labels (FS) qa added
Actions #8

Updated by Jos Collin about 5 years ago

  • Status changed from New to In Progress
Actions #9

Updated by Jos Collin almost 5 years ago

  • Pull request ID set to 28378
Actions #10

Updated by Patrick Donnelly almost 5 years ago

  • Status changed from In Progress to Duplicate
  • Assignee deleted (Jos Collin)
Actions #11

Updated by Patrick Donnelly almost 5 years ago

  • Related to deleted (Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes)
Actions #12

Updated by Patrick Donnelly almost 5 years ago

  • Is duplicate of Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes added