Bug #42348
TestClientRecovery.test_dont_mark_unresponsive_client_stale failure
Status: Closed
Description
Saw this in mimic, seems to exist in master too: http://qa-proxy.ceph.com/teuthology/yuriw-2019-10-16_13:28:41-fs-wip-yuri-testing-2019-10-15-1629-mimic-testing-basic-smithi/4416565/teuthology.log
```
2019-10-16T17:13:50.319 INFO:tasks.cephfs_test_runner:======================================================================
2019-10-16T17:13:50.319 INFO:tasks.cephfs_test_runner:FAIL: test_dont_mark_unresponsive_client_stale (tasks.cephfs.test_client_recovery.TestClientRecovery)
2019-10-16T17:13:50.319 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-10-16T17:13:50.319 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2019-10-16T17:13:50.319 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2019-10-15-1629-mimic/qa/tasks/cephfs/test_client_recovery.py", line 558, in test_dont_mark_unresponsive_client_stale
2019-10-16T17:13:50.320 INFO:tasks.cephfs_test_runner:    self.assert_session_count(1, self.fs.mds_asok(['session', 'ls']))
2019-10-16T17:13:50.320 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2019-10-15-1629-mimic/qa/tasks/cephfs/cephfs_test_case.py", line 213, in assert_session_count
2019-10-16T17:13:50.320 INFO:tasks.cephfs_test_runner:    expected, alive_count
2019-10-16T17:13:50.320 INFO:tasks.cephfs_test_runner:AssertionError: Expected 1 sessions, found 2
```
This comes from this part of the test:
```python
# test that other clients have to wait to get the caps from
# unresponsive client until session_autoclose.
self.mount_b.run_shell(['stat', 'dir'])
self.assert_session_count(1, self.fs.mds_asok(['session', 'ls']))
self.assertLess(time.time(), time_at_beg + SESSION_AUTOCLOSE)
```
I think we can just delay fetching the session list until after the session is auto-closed, since `evict_client()` is invoked as:
```cpp
if (g_conf->mds_session_blacklist_on_timeout) {
  std::stringstream ss;
  mds->evict_client(session->get_client().v, false, true, ss, nullptr);
} else {
  kill_session(session, NULL);
}
```
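A minimal sketch of the suggested change: instead of asserting the session count immediately (which races with the MDS auto-closing the stale session), the test could poll until the count settles. The `wait_until` helper and the simulated `session_count` stand-in below are illustrative only, not the actual qa code; in the real test the predicate would re-run `self.fs.mds_asok(['session', 'ls'])`.

```python
import time

def wait_until(predicate, timeout, interval=0.5):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.time() + timeout
    while True:
        if predicate():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)

# Simulated stand-in for `session ls`: the stale session disappears
# shortly after the (hypothetical) autoclose deadline elapses.
autoclose_at = time.time() + 1.0

def session_count():
    return 1 if time.time() >= autoclose_at else 2

# Poll for the expected count instead of asserting it right away.
assert wait_until(lambda: session_count() == 1, timeout=5, interval=0.1)
```

This mirrors the usual pattern in cephfs qa tests of waiting with a timeout for an asynchronous MDS state change rather than sampling it once.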
@Patrick, @Rishabh
Updated by Venky Shankar over 4 years ago
This was not as straightforward as I suggested. However, it's due to PR https://github.com/ceph/ceph/pull/28585 not being included in the list of (mimic) backports to test, while the test case for that PR was included. The test case was a separate PR in master: https://github.com/ceph/ceph/pull/22645.
The interesting part is that the test passed once out of 20-odd runs locally with vstart_runner.
Would marking the tracker tickets as dependent lessen the chances of running into this problem in the future?
Updated by Patrick Donnelly over 4 years ago
Venky Shankar wrote:
This was not as straightforward as I suggested. However, it's due to PR https://github.com/ceph/ceph/pull/28585 not being included in the list of (mimic) backports to test, while the test case for that PR was included. The test case was a separate PR in master: https://github.com/ceph/ceph/pull/22645.
The interesting part is that the test passed once out of 20-odd runs locally with vstart_runner.
Would marking the tracker tickets as dependent lessen the chances of running into this problem in the future?
I think normally we'd just merge the backports into one PR but I forgot to write a note about that. Sorry!
Updated by Patrick Donnelly over 4 years ago
- Status changed from New to Rejected
Just needs another backport.