Bug #14257

test_reconnect_timeout failed

Added by Greg Farnum over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 01/06/2016
Due date:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

http://pulpito.ceph.com/gregf-2015-12-21_23:08:59-fs-master---basic-smithi/1789/

2015-12-22T22:03:27.635 INFO:tasks.cephfs_test_runner:======================================================================
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:FAIL: test_reconnect_timeout (tasks.cephfs.test_client_recovery.TestClientRecovery)
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_client_recovery.py", line 160, in test_reconnect_timeout
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:    self.mds_reconnect_timeout, in_reconnect_for
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:AssertionError: Should have been in reconnect phase for 45 but only took 14
2015-12-22T22:03:27.637 INFO:tasks.cephfs_test_runner:

Associated revisions

Revision 58c13c1a (diff)
Added by John Spray over 3 years ago

tasks/cephfs: fix wait_for_state timing

Return actual elapsed wallclock time instead of
the number of times we polled.

Fixes: #14257
Signed-off-by: John Spray <>
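
The approach described in the commit message can be sketched roughly as follows. This is an illustrative helper only, with made-up names and arguments, not the actual Filesystem.wait_for_state from ceph-qa-suite: it returns the real elapsed wallclock time rather than the number of polling iterations.

    import time

    def wait_for_state(get_state, goal_state, timeout, poll_interval=1):
        # Poll get_state() until it reports goal_state, then return the
        # real elapsed wallclock time (not the number of polls).
        started_at = time.time()
        while True:
            state = get_state()  # e.g. parsed from a (possibly slow) "mds dump"
            elapsed = time.time() - started_at
            if state == goal_state:
                # Callers treat this value as a duration in seconds, so it
                # must stay correct even when each poll takes several seconds.
                return elapsed
            if elapsed > timeout:
                raise RuntimeError("Timed out after %.1fs waiting for %s (still %s)"
                                   % (elapsed, goal_state, state))
            time.sleep(poll_interval)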

History

#1 Updated by Greg Farnum over 3 years ago

I haven't looked into this at all, but I wonder if it's failing to account for all the clients pinging back early, or for the control client having been timed out before the MDS got paused, or something.

#2 Updated by Greg Farnum over 3 years ago

  • Priority changed from Normal to Urgent

#3 Updated by John Spray over 3 years ago

Test bug: Filesystem.wait_for_state counts elapsed time as the number of times it goes through its polling loop (with a 1s sleep), which is bogus when calls to "mds dump" take a non-trivial amount of time (some of these took multiple seconds for some reason). So this fails on slow clusters. Will update the test.
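
For illustration only (hypothetical helpers, not the actual test code), the two ways of measuring differ exactly as the failure shows: counting iterations reports 14 when the loop ran 14 times, even though slow "mds dump" calls can stretch the real wait well past 45 seconds.

    import time

    def elapsed_by_iterations(poll, done):
        # Buggy pattern: one loop pass is counted as one elapsed second,
        # which undercounts whenever poll() itself is slow.
        count = 0
        while not done(poll()):
            time.sleep(1)
            count += 1
        return count  # e.g. 14 iterations, even if 45+ real seconds passed

    def elapsed_by_wallclock(poll, done):
        # Fixed pattern: measure the real wallclock duration.
        started = time.time()
        while not done(poll()):
            time.sleep(1)
        return time.time() - started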

#4 Updated by John Spray over 3 years ago

  • Status changed from New to In Progress

#5 Updated by John Spray over 3 years ago

  • Status changed from In Progress to Need Review

#6 Updated by John Spray over 3 years ago

  • Status changed from Need Review to Resolved

Fix merged into ceph-qa-suite master.
