Bug #43336

qa: test_unmount_for_evicted_client hangs

Added by Patrick Donnelly 9 months ago. Updated 8 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): kceph, qa-suite
Labels (FS):
Pull request ID:
Crash signature:

Description

2019-12-15T05:14:47.548 INFO:teuthology.orchestra.run.smithi071:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 900 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-mds.c.asok session evict 12880
2019-12-15T05:14:47.920 INFO:tasks.cephfs.filesystem:_json_asok output:
2019-12-15T05:14:47.920 DEBUG:tasks.cephfs.kernel_mount:Unmounting client client.0...
2019-12-15T05:14:47.920 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-12-15T05:14:47.920 INFO:teuthology.orchestra.run.smithi158:> sudo umount /home/ubuntu/cephtest/mnt.0
...
2019-12-15T05:29:25.195 INFO:teuthology.orchestra.run.smithi193:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2019-12-15T05:29:47.926 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 766, in gevent._greenlet.Greenlet.run
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 303, in copy_file_to
    copy_to_log(src, logger, capture=stream)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 272, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 108, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 291, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1376, in _read
    return self.channel.recv_stderr(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 749, in recv_stderr
    raise socket.timeout()
timeout
2019-12-15T05:29:47.928 ERROR:teuthology:Uncaught exception (Hub)
Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 766, in gevent._greenlet.Greenlet.run
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 303, in copy_file_to
    copy_to_log(src, logger, capture=stream)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 272, in copy_to_log
    for line in f:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 108, in next
    line = self.readline()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/file.py", line 291, in readline
    new_data = self._read(n)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 1361, in _read
    return self.channel.recv(size)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/paramiko/channel.py", line 701, in recv
    raise socket.timeout()
timeout
...
2019-12-15T15:08:30.439 DEBUG:teuthology.exit:Got signal 15; running 2 handlers...
2019-12-15T15:08:30.440 DEBUG:teuthology.task.console_log:Killing console logger for smithi158
2019-12-15T15:08:30.440 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-12-15T15:08:30.440 INFO:teuthology.orchestra.run.smithi158:> sudo PATH=/usr/sbin:$PATH lsof ; ps auxf
2019-12-15T15:08:35.310 DEBUG:teuthology.orchestra.run:got remote process result: None
2019-12-15T15:08:35.371 INFO:tasks.cephfs_test_runner:Test if client hangs on unmount after evicting the client. ... ERROR
2019-12-15T15:08:35.371 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2019-12-15T15:08:35.371 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'smithi184.front.sepia.ceph.com', 'timeout': 60}
2019-12-15T15:08:35.372 DEBUG:teuthology.orchestra.remote:[Errno None] Unable to connect to port 22 on 172.21.15.184
2019-12-15T15:08:35.395 INFO:tasks.cephfs_test_runner:ERROR
2019-12-15T15:08:35.396 INFO:tasks.cephfs_test_runner:
2019-12-15T15:08:35.396 INFO:tasks.cephfs_test_runner:======================================================================
2019-12-15T15:08:35.396 INFO:tasks.cephfs_test_runner:ERROR: test_unmount_for_evicted_client (tasks.cephfs.test_client_recovery.TestClientRecovery)
2019-12-15T15:08:35.396 INFO:tasks.cephfs_test_runner:Test if client hangs on unmount after evicting the client.
2019-12-15T15:08:35.396 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-12-15T15:08:35.397 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2019-12-15T15:08:35.397 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20191212.003246/qa/tasks/cephfs/test_client_recovery.py", line 516, in test_unmount_for_evicted_client
2019-12-15T15:08:35.397 INFO:tasks.cephfs_test_runner:    self.mount_a.umount_wait(require_clean=True, timeout=30)
2019-12-15T15:08:35.397 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20191212.003246/qa/tasks/cephfs/kernel_mount.py", line 114, in umount_wait
2019-12-15T15:08:35.397 INFO:tasks.cephfs_test_runner:    self.umount(force)
2019-12-15T15:08:35.397 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20191212.003246/qa/tasks/cephfs/kernel_mount.py", line 89, in umount
2019-12-15T15:08:35.398 INFO:tasks.cephfs_test_runner:    ], timeout=(15*60))
2019-12-15T15:08:35.398 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
2019-12-15T15:08:35.398 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2019-12-15T15:08:35.398 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
2019-12-15T15:08:35.398 INFO:tasks.cephfs_test_runner:    r.wait()
2019-12-15T15:08:35.398 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
2019-12-15T15:08:35.398 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2019-12-15T15:08:35.399 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 172, in _raise_for_status
2019-12-15T15:08:35.399 INFO:tasks.cephfs_test_runner:    node=self.hostname)
2019-12-15T15:08:35.399 INFO:tasks.cephfs_test_runner:ConnectionLostError: SSH connection to smithi158 was lost: 'sudo PATH=/usr/sbin:$PATH lsof ; ps auxf'
2019-12-15T15:08:35.399 INFO:tasks.cephfs_test_runner:
2019-12-15T15:08:35.400 INFO:tasks.cephfs_test_runner:======================================================================
2019-12-15T15:08:35.401 INFO:tasks.cephfs_test_runner:ERROR: test_unmount_for_evicted_client (tasks.cephfs.test_client_recovery.TestClientRecovery)
2019-12-15T15:08:35.401 INFO:tasks.cephfs_test_runner:Test if client hangs on unmount after evicting the client.
2019-12-15T15:08:35.401 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-12-15T15:08:35.401 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2019-12-15T15:08:35.403 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20191212.003246/qa/tasks/cephfs/cephfs_test_case.py", line 174, in tearDown
2019-12-15T15:08:35.403 INFO:tasks.cephfs_test_runner:    super(CephFSTestCase, self).tearDown()
2019-12-15T15:08:35.403 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20191212.003246/qa/tasks/ceph_test_case.py", line 49, in tearDown
2019-12-15T15:08:35.403 INFO:tasks.cephfs_test_runner:    "Ended test {0}".format(self.id()))
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20191212.003246/qa/tasks/ceph_manager.py", line 1278, in raw_cluster_cmd
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:    stdout=StringIO(),
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 416, in run
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:    raise ConnectionLostError(command=quote(args), node=name)
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:ConnectionLostError: SSH connection to smithi184 was lost: "sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph log 'Ended test tasks.cephfs.test_client_recovery.TestClientRecovery.test_unmount_for_evicted_client'" 
2019-12-15T15:08:35.404 INFO:tasks.cephfs_test_runner:
2019-12-15T15:08:35.405 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2019-12-15T15:08:35.411 INFO:tasks.cephfs_test_runner:Ran 18 tests in 37656.369s

From: /ceph/teuthology-archive/pdonnell-2019-12-15_02:15:20-kcephfs-wip-pdonnell-testing-20191212.003246-distro-basic-smithi/4603741/teuthology.log

See also job 4603720.

The "testing" kernel, on at least Ubuntu, does succeed (CentOS runs fail due to a different issue, #43335): /ceph/teuthology-archive/pdonnell-2019-12-15_02:15:20-kcephfs-wip-pdonnell-testing-20191212.003246-distro-basic-smithi/4603752/teuthology.log
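
One way to locate a hang in a log like the excerpt above is to look for the widest gap between consecutive timestamped lines. The following is a minimal sketch, assuming the timestamp prefix format seen in the excerpt; `largest_gap` is a hypothetical helper for illustration and is not part of teuthology.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt,
# e.g. "2019-12-15T05:14:47.548 INFO:..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) ")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the widest gap.

    Lines without a timestamp prefix (such as traceback bodies) are skipped.
    Returns None if fewer than two timestamped lines are found.
    """
    prev_ts = None
    prev_line = None
    best = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = TS_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

Fed an open log file (for example `largest_gap(open("teuthology.log"))`), the line returned as `line_before` would be the last thing logged before the longest silence, which in a hang like this one is a reasonable place to start reading.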


Related issues

Related to fs - Bug #41329: mds: reject sessionless messages Resolved
Blocks fs - Backport #41854: mimic: mds: reject sessionless messages Rejected

History

#1 Updated by Zheng Yan 9 months ago

I think it was caused by

[ 150.326253] ceph: mdsc_handle_session corrupt message mds0 len 75
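
To check whether other runs show the same kernel message, a console log can be scanned for it. This is a rough sketch only: the pattern is modeled on the single line quoted above, other kernel versions may word the message differently, and `find_corrupt_session_msgs` is a hypothetical helper name.

```python
import re

# Pattern modeled on the one kernel line quoted in this comment (assumption:
# the "[ uptime] ceph: mdsc_handle_session corrupt message mdsN len M" shape).
CORRUPT_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+ceph: mdsc_handle_session corrupt message "
    r"(?P<mds>mds\d+) len (?P<length>\d+)"
)


def find_corrupt_session_msgs(lines):
    """Return a list of (uptime_seconds, mds_name, message_length) tuples."""
    hits = []
    for line in lines:
        match = CORRUPT_RE.search(line)
        if match:
            hits.append(
                (
                    float(match.group("uptime")),
                    match.group("mds"),
                    int(match.group("length")),
                )
            )
    return hits
```

The search is unanchored, so trailing carriage returns or console prefixes on a line do not prevent a match.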

#2 Updated by Zheng Yan 9 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 32318

#3 Updated by Patrick Donnelly 9 months ago

  • Related to Bug #41329: mds: reject sessionless messages added

#4 Updated by Patrick Donnelly 9 months ago

#5 Updated by Patrick Donnelly 9 months ago

#6 Updated by Patrick Donnelly 9 months ago

  • Status changed from Fix Under Review to Resolved

#7 Updated by Nathan Cutler 8 months ago

  • Status changed from Resolved to Pending Backport

#8 Updated by Nathan Cutler 8 months ago

  • Status changed from Pending Backport to Resolved

#9 Updated by Nathan Cutler 8 months ago

  • Blocks deleted (Backport #41853: nautilus: mds: reject sessionless messages)
