Bug #11314
closed
This was mentioned in #10369:
Once we have that checker method, we should also introduce a check that all the daemons are really running, so that if e.g. an OSD crashes unexpectedly, we detect it immediately rather than waiting a long time for a timeout of some kind to end the test.
- Subject changed from teuthology: MDS crashed and the runs hung without ever timing out to qa: MDS crashed and the runs hung without ever timing out
- Priority changed from High to Normal
We clearly aren't treating this as very important, and I think we've had more trouble with OSDs doing this than MDSes lately, so whatever.
- Assignee set to Patrick Donnelly
I added a DaemonWatchdog in the mds_thrash.py code that catches this kind of thing. We should pull it out into its own context so it can be used elsewhere.
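A minimal sketch of what a standalone watchdog context could look like, assuming each daemon can be probed through a liveness callable. `DaemonWatchdog`, `checks`, `interval` and `tripped` here are hypothetical names written for illustration; this is not the actual code in mds_thrash.py or a teuthology API. The watchdog polls in a background thread and sets an event as soon as any daemon is found dead, so the test can bail out early instead of waiting for a timeout.

```python
import threading
from typing import Callable, Dict, List, Optional


class DaemonWatchdog:
    """Hypothetical sketch: poll daemon liveness checks in a background
    thread and fail fast instead of waiting for a test timeout."""

    def __init__(self, checks: Dict[str, Callable[[], bool]],
                 interval: float = 1.0) -> None:
        self.checks = checks          # daemon name -> "is it still running?"
        self.interval = interval      # seconds between polls
        self.failed: List[str] = []   # names of daemons found dead
        self.tripped = threading.Event()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _watch(self) -> None:
        # Poll until asked to stop or until some daemon is found dead.
        while not self._stop.is_set():
            dead = [name for name, alive in self.checks.items() if not alive()]
            if dead:
                self.failed = dead
                self.tripped.set()    # lets the test body notice immediately
                return
            self._stop.wait(self.interval)

    def __enter__(self) -> "DaemonWatchdog":
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        # Turn a detected crash into a test failure, but never mask an
        # exception already raised by the test body itself.
        if self.failed and exc_type is None:
            raise RuntimeError(
                f"daemon(s) died unexpectedly: {', '.join(self.failed)}")
        return False


if __name__ == "__main__":
    alive = {"mds.a": True, "osd.0": True}
    checks = {n: (lambda n=n: alive[n]) for n in alive}
    try:
        with DaemonWatchdog(checks, interval=0.05) as wd:
            alive["osd.0"] = False        # simulate an unexpected OSD crash
            wd.tripped.wait(timeout=5)    # returns as soon as it is detected
    except RuntimeError as e:
        print(e)
```

Wrapping the check in a context manager is what would let it be reused outside mds_thrash.py: any task can enter it around its workload, and the crash surfaces as an error on exit rather than as a hung run.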
- Status changed from New to In Progress
- Target version set to v14.0.0
- Tags set to qa
- Backport set to luminous,mimic
- Component(FS) qa-suite added
- Target version changed from v14.0.0 to v15.0.0
- Status changed from In Progress to New
- Assignee changed from Patrick Donnelly to Jos Collin
- Priority changed from Normal to High
- Target version deleted (v15.0.0)
- Start date deleted (04/02/2015)
- Tags deleted (qa)
- Backport deleted (luminous,mimic)
- Labels (FS) qa added
- Status changed from New to In Progress
- Pull request ID set to 28378
- Status changed from In Progress to Duplicate
- Assignee deleted (Jos Collin)
- Related to deleted (Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes)
- Is duplicate of Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes added