Bug #14257

test_reconnect_timeout failed

Added by Greg Farnum over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 01/06/2016
Due date:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

http://pulpito.ceph.com/gregf-2015-12-21_23:08:59-fs-master---basic-smithi/1789/

2015-12-22T22:03:27.635 INFO:tasks.cephfs_test_runner:======================================================================
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:FAIL: test_reconnect_timeout (tasks.cephfs.test_client_recovery.TestClientRecovery)
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:  File "/var/lib/teuthworker/src/ceph-qa-suite_master/tasks/cephfs/test_client_recovery.py", line 160, in test_reconnect_timeout
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:    self.mds_reconnect_timeout, in_reconnect_for
2015-12-22T22:03:27.636 INFO:tasks.cephfs_test_runner:AssertionError: Should have been in reconnect phase for 45 but only took 14
2015-12-22T22:03:27.637 INFO:tasks.cephfs_test_runner:

Associated revisions

Revision 58c13c1a (diff)
Added by John Spray over 3 years ago

tasks/cephfs: fix wait_for_state timing

Return actual elapsed wallclock time instead of
the number of times we polled.

Fixes: #14257
Signed-off-by: John Spray <>
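
The approach described in the commit message can be sketched roughly as follows. This is an illustrative helper only, with made-up names and arguments, not the actual Filesystem.wait_for_state from ceph-qa-suite: it returns the real elapsed wallclock time rather than the number of polling iterations.

    import time

    def wait_for_state(get_state, goal_state, timeout, poll_interval=1):
        # Poll get_state() until it reports goal_state, then return the
        # real elapsed wallclock time (not the number of polls).
        started_at = time.time()
        while True:
            state = get_state()  # e.g. parsed from a (possibly slow) "mds dump"
            elapsed = time.time() - started_at
            if state == goal_state:
                # Callers treat this value as a duration in seconds, so it
                # must stay correct even when each poll takes several seconds.
                return elapsed
            if elapsed > timeout:
                raise RuntimeError("Timed out after %.1fs waiting for %s (still %s)"
                                   % (elapsed, goal_state, state))
            time.sleep(poll_interval)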

History

#1 Updated by Greg Farnum over 3 years ago

I haven't looked into this at all, but I wonder if it's failing to account for all the clients pinging back early, or for the control client having been timed out before the MDS got paused, or something.

#2 Updated by Greg Farnum over 3 years ago

  • Priority changed from Normal to Urgent

#3 Updated by John Spray over 3 years ago

Test bug: Filesystem.wait_for_state counts elapsed time as the number of times it goes through its polling loop (with a 1s sleep), which is bogus when calls to "mds dump" take a non-trivial amount of time (some of these took multiple seconds for some reason). So this fails on slow clusters. Will update the test.
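
For illustration only (hypothetical helpers, not the actual test code), the two ways of measuring differ exactly as the failure shows: counting iterations reports 14 when the loop ran 14 times, even though slow "mds dump" calls can stretch the real wait well past 45 seconds.

    import time

    def elapsed_by_iterations(poll, done):
        # Buggy pattern: one loop pass is counted as one elapsed second,
        # which undercounts whenever poll() itself is slow.
        count = 0
        while not done(poll()):
            time.sleep(1)
            count += 1
        return count  # e.g. 14 iterations, even if 45+ real seconds passed

    def elapsed_by_wallclock(poll, done):
        # Fixed pattern: measure the real wallclock duration.
        started = time.time()
        while not done(poll()):
            time.sleep(1)
        return time.time() - started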

#4 Updated by John Spray over 3 years ago

  • Status changed from New to In Progress

#5 Updated by John Spray over 3 years ago

  • Status changed from In Progress to Need Review

#6 Updated by John Spray over 3 years ago

  • Status changed from Need Review to Resolved

Fix merged into ceph-qa-suite master.
