Bug #53288

Failed jobs hanging for 12 hours

Added by David Galloway over 2 years ago. Updated over 2 years ago.

Status: In Progress
Priority: Normal
Category: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Example job log: http://qa-proxy.ceph.com/teuthology/yuriw-2021-11-13_15:31:06-rados-wip-yuriw-master-11.12.21-distro-basic-smithi/6501803/teuthology.log

2021-11-14T07:49:49.697 INFO:teuthology.orchestra.run.smithi137.stderr:mount error 110 = Connection timed out
2021-11-14T07:49:49.698 INFO:teuthology.orchestra.run.smithi137.stdout:parsing options: rw,norequire_active_mds,name=1,conf=/etc/ceph/ceph.conf,norbytes
2021-11-14T07:49:49.698 INFO:teuthology.orchestra.run.smithi137.stdout:mount.ceph: options "norequire_active_mds,name=1,norbytes" will pass to kernel.
2021-11-14T07:49:49.701 DEBUG:teuthology.orchestra.run:got remote process result: 32
2021-11-14T07:49:49.701 INFO:tasks.cephfs.kernel_mount:mount command failed
2021-11-14T07:49:49.702 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/run_tasks.py", line 94, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_bed1599fa788bf76a9a9c97632799d018a249f4e/qa/tasks/kclient.py", line 112, in task
    kernel_mount.mount(mntopts=client_config.get('mntopts', []))
  File "/home/teuthworker/src/github.com_ceph_ceph-c_bed1599fa788bf76a9a9c97632799d018a249f4e/qa/tasks/cephfs/kernel_mount.py", line 49, in mount
    retval = self._run_mount_cmd(mntopts, check_status)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_bed1599fa788bf76a9a9c97632799d018a249f4e/qa/tasks/cephfs/kernel_mount.py", line 70, in _run_mount_cmd
    stderr=mountcmd_stderr, omit_sudo=False)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi137 with status 32: 'sudo nsenter --net=/var/run/netns/ceph-ns--home-ubuntu-cephtest-mnt.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage /bin/mount -t ceph :/ /home/ubuntu/cephtest/mnt.1 -v -o norequire_active_mds,name=1,conf=/etc/ceph/ceph.conf,norbytes'
2021-11-14T07:49:49.833 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/organizations/ceph/?query=e6eebe4b2f4a4c80bdd9a238b8404bb6
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/run_tasks.py", line 94, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_bed1599fa788bf76a9a9c97632799d018a249f4e/qa/tasks/kclient.py", line 112, in task
    kernel_mount.mount(mntopts=client_config.get('mntopts', []))
  File "/home/teuthworker/src/github.com_ceph_ceph-c_bed1599fa788bf76a9a9c97632799d018a249f4e/qa/tasks/cephfs/kernel_mount.py", line 49, in mount
    retval = self._run_mount_cmd(mntopts, check_status)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_bed1599fa788bf76a9a9c97632799d018a249f4e/qa/tasks/cephfs/kernel_mount.py", line 70, in _run_mount_cmd
    stderr=mountcmd_stderr, omit_sudo=False)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_d4737010a85099043cf081dc05b4069d301b23fb/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi137 with status 32: 'sudo nsenter --net=/var/run/netns/ceph-ns--home-ubuntu-cephtest-mnt.1 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage /bin/mount -t ceph :/ /home/ubuntu/cephtest/mnt.1 -v -o norequire_active_mds,name=1,conf=/etc/ceph/ceph.conf,norbytes'
2021-11-14T07:49:49.836 DEBUG:teuthology.run_tasks:Unwinding manager kclient
2021-11-14T07:49:49.849 DEBUG:teuthology.run_tasks:Unwinding manager cephadm
2021-11-14T07:49:49.877 INFO:tasks.cephadm:Teardown begin

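Later, cleanup fails on smithi137 and the log goes silent for almost 12 hours, until the job receives signal 15: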

2021-11-14T07:50:19.435 INFO:teuthology.task.internal:Tidying up after the test...
2021-11-14T07:50:19.435 DEBUG:teuthology.orchestra.run.smithi073:> find /home/ubuntu/cephtest -ls ; rmdir -- /home/ubuntu/cephtest
2021-11-14T07:50:19.438 DEBUG:teuthology.orchestra.run.smithi137:> find /home/ubuntu/cephtest -ls ; rmdir -- /home/ubuntu/cephtest
2021-11-14T07:50:19.453 INFO:teuthology.orchestra.run.smithi137.stdout:   262155      4 drwxr-xr-x   3  ubuntu   ubuntu       4096 Nov 14 07:50 /home/ubuntu/cephtest
2021-11-14T07:50:19.454 INFO:teuthology.orchestra.run.smithi137.stdout:   397627      4 d---------   2  ubuntu   ubuntu       4096 Nov 14 07:48 /home/ubuntu/cephtest/mnt.1
2021-11-14T07:50:19.455 INFO:teuthology.orchestra.run.smithi137.stderr:find: ‘/home/ubuntu/cephtest/mnt.1’: Permission denied
2021-11-14T07:50:19.455 INFO:teuthology.orchestra.run.smithi137.stderr:rmdir: failed to remove '/home/ubuntu/cephtest': Directory not empty
2021-11-14T19:30:07.385 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2021-11-14T19:30:07.444 DEBUG:teuthology.task.console_log:Killing console logger for smithi073
2021-11-14T19:30:07.448 DEBUG:teuthology.task.console_log:Killing console logger for smithi137
2021-11-14T19:30:07.449 DEBUG:teuthology.exit:Finished running handlers

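This does not seem to be a supervisor issue. The supervisor log shows the watchdog killing the job once it exceeded the 43200 s (12 h) limit: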

2021-11-14T07:29:13.039 INFO:teuthology.dispatcher.supervisor:Job archive: /home/teuthworker/archive/yuriw-2021-11-13_15:31:06-rados-wip-yuriw-master-11.12.21-distro-basic-smithi/6501803
2021-11-14T07:29:13.040 INFO:teuthology.dispatcher.supervisor:Job PID: 20051
2021-11-14T07:29:13.040 INFO:teuthology.dispatcher.supervisor:Running with watchdog
2021-11-14T19:30:07.168 WARNING:teuthology.dispatcher.supervisor:Job ran longer than 43200s. Killing...
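
The timings can be checked directly from the timestamps quoted in the excerpts above. The following is only a small sketch for that arithmetic; the helper name `seconds_between` is made up for illustration and is not part of teuthology.

```python
from datetime import datetime

# Timestamp layout used in the teuthology log lines quoted in this report.
FMT = "%Y-%m-%dT%H:%M:%S.%f"


def seconds_between(start, end):
    """Return the elapsed seconds between two log timestamps (hypothetical helper)."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Supervisor "Job PID" line -> supervisor "Killing..." line.
job_runtime = seconds_between("2021-11-14T07:29:13.040", "2021-11-14T19:30:07.168")

# Last cleanup output on smithi137 -> "Got signal 15" in the job log.
silent_gap = seconds_between("2021-11-14T07:50:19.455", "2021-11-14T19:30:07.385")

print(f"job runtime: {job_runtime:.0f}s (watchdog limit 43200s)")
print(f"silent gap:  {silent_gap / 3600:.2f}h")
```

The job ran for about 43254 s before being killed, and roughly 11.66 h of that passed with no log output after the failed `rmdir`.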

Related issues: 1 (0 open, 1 closed)

Related to CephFS - Bug #53293: qa: v16.2.4 mds crash caused by centos stream kernel (Resolved, Patrick Donnelly)

#1

Updated by Patrick Donnelly over 2 years ago

  • Related to Bug #53293: qa: v16.2.4 mds crash caused by centos stream kernel added
#2

Updated by Zack Cerza over 2 years ago

I think we need to have the kclient task reboot all nodes during teardown if a job failure is detected.
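
A rough illustration of that idea, not the actual kclient code: the traceback in the description shows tasks being entered as context managers, so a task could catch any failure during setup or during the job and reboot its nodes instead of attempting a normal unmount. Everything here (`kclient_like_task` and the `mount`, `unmount`, and `reboot` callables) is hypothetical and stands in for whatever teuthology would really use.

```python
from contextlib import contextmanager


@contextmanager
def kclient_like_task(nodes, mount, unmount, reboot):
    """Hypothetical task: mount on each node; reboot every node if anything fails."""
    mounted = []
    try:
        for node in nodes:
            mount(node)  # may raise, as the mount command did in this report
            mounted.append(node)
        yield mounted
    except Exception:
        # Assumption: a failed kernel mount can leave a node in a state that
        # blocks later cleanup, so reboot rather than unmounting normally.
        for node in nodes:
            reboot(node)
        raise
    else:
        # Normal teardown when both setup and the job body succeeded.
        for node in mounted:
            unmount(node)
```

In the log above the mount failed while the task was being entered, so the `try` has to cover setup as well as the `yield`; wrapping only the `yield` would miss this case.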

#3

Updated by Aishwarya Mathuria over 2 years ago

  • Assignee set to Aishwarya Mathuria
#4

Updated by Aishwarya Mathuria over 2 years ago

  • Status changed from New to In Progress