Feature #17309

qa: mon_thrash test for CephFS

Added by John Spray about 2 years ago. Updated 8 months ago.

Status:
New
Priority:
High
Category:
Testing
Target version:
Start date:
09/20/2016
Due date:
% Done:
0%

Source:
other
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
qa-suite
Labels (FS):
Pull request ID:

Description

We currently have no test that thrashes the mons while CephFS is running. It would be useful to run CephFS workloads alongside a mon thrasher and assert that MDS daemons are not incorrectly marked failed when mon failures and delays occur.

Currently there appear to be cases where glitches in mon availability cause MDSMonitor to trip its timeouts and incorrectly take healthy MDS daemons offline (http://tracker.ceph.com/issues/17308).
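
As a rough illustration of the assertion described above, here is a minimal Python sketch. It is not taken from the qa suite: the snapshot shape (rank mapped to a daemon name and GID) and the function name `incorrectly_failed` are hypothetical, chosen only to show the idea of comparing which daemons hold ranks before and after a thrash run. A real test would need to obtain these snapshots from the cluster.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape (an assumption for illustration):
# rank -> (daemon name, gid) for each MDS holding a rank.
Snapshot = Dict[int, Tuple[str, int]]


def incorrectly_failed(before: Snapshot, after: Snapshot) -> List[str]:
    """Return names of daemons that held a rank before thrashing
    but no longer hold that rank with the same gid afterwards."""
    failed = []
    for rank, holder in sorted(before.items()):
        # A missing rank or a different (name, gid) pair means the
        # original daemon was replaced or removed during the run.
        if after.get(rank) != holder:
            failed.append(holder[0])
    return failed


# Made-up example data: mds.b loses rank 1 to a standby during thrashing.
before = {0: ("mds.a", 4101), 1: ("mds.b", 4102)}
after_ok = {0: ("mds.a", 4101), 1: ("mds.b", 4102)}
after_bad = {0: ("mds.a", 4101), 1: ("mds.c", 4200)}

unchanged = incorrectly_failed(before, after_ok)
replaced = incorrectly_failed(before, after_bad)
```

If the workload itself never kills an MDS, any non-empty result after a mon thrash run would point at the kind of incorrect failover this ticket is concerned with.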


Related issues

Related to fs - Bug #17308: MDSMonitor should tolerate paxos delays without failing daemons (Was: Unexplained delay forwarding message between mons) Resolved 09/20/2016
Related to fs - Bug #19706: Laggy mon daemons causing MDS failover (symptom: failed to set counters on mds daemons: set(['mds.dir_split'])) Can't reproduce 04/20/2017

History

#1 Updated by Patrick Donnelly 10 months ago

  • Subject changed from mon_thrash test for CephFS to qa: mon_thrash test for CephFS
  • Assignee set to Patrick Donnelly
  • Target version set to v13.0.0

#2 Updated by Patrick Donnelly 8 months ago

  • Related to Bug #17308: MDSMonitor should tolerate paxos delays without failing daemons (Was: Unexplained delay forwarding message between mons) added

#3 Updated by Patrick Donnelly 8 months ago

  • Related to Bug #19706: Laggy mon daemons causing MDS failover (symptom: failed to set counters on mds daemons: set(['mds.dir_split'])) added

#4 Updated by Patrick Donnelly 8 months ago

  • Priority changed from Normal to High
  • Target version changed from v13.0.0 to v14.0.0
