Bug #20318

closed

Race in TestExports.test_export_pin

Added by John Spray almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Category: Testing
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen failure here:
http://pulpito.ceph.com/jspray-2017-06-15_02:50:24-multimds-wip-jcsp-testing-20170614-testing-basic-smithi/1288073

2017-06-15T03:37:05.175 INFO:tasks.cephfs_test_runner:======================================================================
2017-06-15T03:37:05.175 INFO:tasks.cephfs_test_runner:ERROR: test_export_pin (tasks.cephfs.test_exports.TestExports)
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-jcsp-testing-20170614/qa/tasks/cephfs/test_exports.py", line 43, in test_export_pin
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:    self._wait_subtrees(status, 1, [('/1', 1)])
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-jcsp-testing-20170614/qa/tasks/cephfs/test_exports.py", line 16, in _wait_subtrees
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:    subtrees = self.fs.mds_asok(["get", "subtrees"], mds_id=status.get_rank(self.fs.id, rank)['name'])
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-jcsp-testing-20170614/qa/tasks/cephfs/filesystem.py", line 119, in get_rank
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:    raise RuntimeError("FSCID {0} has no rank {1}".format(fscid, rank))
2017-06-15T03:37:05.176 INFO:tasks.cephfs_test_runner:RuntimeError: FSCID 2 has no rank 1

It looks like mds.b comes up and promptly sends its up:boot beacon to the mon, but the mon takes a while to add it to the map. So when we set max_mds, the new rank is not assigned immediately, and get_rank fails because rank 1 does not exist yet.

The test can either wait for the rank after setting max_mds, or wait for the standby before setting it.

I'm a little worried that the mon slowness is a bug (not a cephfs one) in its own right, but let's make the test non-racy anyway.
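
The first option (wait for the rank after setting max_mds) could look roughly like the sketch below. It is a generic polling helper, not the fix that was merged: `wait_for_rank`, `RankTimeout`, and the `get_ranks` callable are hypothetical names, and `get_ranks` is assumed to fetch a fresh view of the MDS map on every call rather than reuse a cached status object.

```python
import time


class RankTimeout(Exception):
    """Raised when the expected rank does not appear before the deadline."""


def wait_for_rank(get_ranks, rank, timeout=30.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until `rank` is present in the collection returned by get_ranks().

    get_ranks is a hypothetical callable supplied by the test; it must
    re-query the cluster each time so that a late map update is noticed.
    """
    deadline = clock() + timeout
    while True:
        ranks = get_ranks()
        if rank in ranks:
            # The rank has been assigned; hand back the view that contains it.
            return ranks
        if clock() >= deadline:
            raise RankTimeout(
                "rank {0} not present after {1}s (saw {2})".format(
                    rank, timeout, sorted(ranks)))
        sleep(interval)
```

A test would call this right after raising max_mds and only then query subtrees on the new rank, so a slow map update turns into a short wait instead of a RuntimeError.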


Related issues 1 (0 open, 1 closed)

Has duplicate: CephFS - Bug #20328: Test failure: test_export_pin (tasks.cephfs.test_exports.TestExports) (Duplicate, 06/16/2017)

#1

Updated by John Spray almost 7 years ago

  • Has duplicate Bug #20328: Test failure: test_export_pin (tasks.cephfs.test_exports.TestExports) added
#2

Updated by Patrick Donnelly almost 7 years ago

  • Status changed from New to Fix Under Review
#3

Updated by Patrick Donnelly almost 7 years ago

  • Status changed from Fix Under Review to Resolved