Bug #12821

mds_thrasher: handle MDSes failing on startup

Added by Greg Farnum almost 4 years ago. Updated almost 4 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
Testing
Target version:
-
Start date:
08/28/2015
Due date:
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

http://pulpito.ceph.com/teuthology-2015-08-21_23:04:01-fs-master---basic-multi/1026045/

2015-08-23T00:37:44.265 INFO:tasks.mds_thrash.mds_thrasher.failure_group.[a, b-s-a]:waiting till mds map indicates mds.b-s-a is laggy/crashed, in failed state, or mds.b-s-a is removed from mdsmap
2015-08-23T00:37:45.255 INFO:tasks.ceph.mds.a.plana36.stderr:2015-08-23 03:37:45.243502 7f44a3fb2700 -1 mds.0.journaler(ro) try_read_entry: decode error from _is_readable
2015-08-23T00:37:45.255 INFO:tasks.ceph.mds.a.plana36.stderr:2015-08-23 03:37:45.244847 7f44a3fb2700 -1 log_channel(cluster) log [ERR] : Error loading MDS rank 0: (22) Invalid argument
2015-08-23T00:37:45.296 INFO:tasks.ceph.mds.a.plana36.stdout:starting mds.a at :/0

Looks like the thrasher never realized that this MDS wasn't coming back, and the whole job hung as a result.
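One way to avoid this kind of hang is to bound the wait and to check whether the daemon is still alive while polling. The following is only a minimal sketch, not the actual mds_thrasher code: `wait_for_mds_state`, `DaemonExited`, and the `state_reached` / `daemon_alive` callables are hypothetical names chosen for illustration, and the real task would have to supply its own MDS map and process checks.

```python
import time


class DaemonExited(Exception):
    """Raised when the daemon under test exits while we are waiting on it."""


def wait_for_mds_state(state_reached, daemon_alive, timeout=300.0, interval=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until state_reached() is true, failing fast instead of hanging.

    state_reached and daemon_alive are hypothetical callables supplied by the
    caller (for example, wrappers around an MDS map query and a process check).
    """
    deadline = clock() + timeout
    while True:
        if state_reached():
            return
        # Without this check, a daemon that died on startup is waited on forever.
        if not daemon_alive():
            raise DaemonExited("daemon exited before reaching the expected state")
        if clock() >= deadline:
            raise TimeoutError("timed out waiting for the expected MDS state")
        sleep(interval)
```

With a check like this, an MDS that fails on startup would surface as a test failure rather than a job that never finishes.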


Related issues

Duplicates fs - Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes (Resolved)

History

#1 Updated by John Spray almost 4 years ago

This is kind of a special case of http://tracker.ceph.com/issues/10369 -- there is a more general need for something to notice and report crashes that happen in the background while tests are running.
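The more general idea could be sketched as a background monitor that polls daemon exit statuses while a test runs and reports anything that crashed. This is an illustrative sketch only, not what Feature #10369 implemented: `CrashMonitor` and the `exit_status_fns` mapping are hypothetical, and each callable is assumed to return `None` while its daemon is running and an exit code once it has stopped.

```python
import threading


class CrashMonitor:
    """Poll daemons in the background and record non-zero exits (hypothetical)."""

    def __init__(self, exit_status_fns, interval=1.0):
        # Maps a daemon name to a callable returning None (running) or an exit code.
        self._fns = dict(exit_status_fns)
        self._interval = interval
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self.crashes = {}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def poll_once(self):
        for name, fn in self._fns.items():
            code = fn()
            if code is not None and code != 0:
                with self._lock:
                    # Keep the first exit code seen for each daemon.
                    self.crashes.setdefault(name, code)

    def _run(self):
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()
        self.poll_once()  # final check after the test body finishes
        if exc_type is None and self.crashes:
            raise RuntimeError(f"daemons crashed during test: {self.crashes}")
        return False
```

Wrapping a test body in `with CrashMonitor(...)` would turn a background crash into an explicit failure at the end of the test, even if the test itself never looked at the daemon.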

#2 Updated by Greg Farnum almost 4 years ago

  • Status changed from New to Duplicate

#3 Updated by Patrick Donnelly 2 months ago

  • Duplicates Feature #10369: qa-suite: detect unexpected MDS failovers and daemon crashes added
