Bug #44381
Status: Closed
kclient: crash/hang during qa/workunits/fs/snaps/snaptest-capwb.sh
Description
2020-02-29T09:35:22.472 INFO:tasks.workunit:Running workunit fs/snaps/snaptest-capwb.sh...
2020-02-29T09:35:22.473 INFO:teuthology.orchestra.run.smithi105:workunit test fs/snaps/snaptest-capwb.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=1b30588872aa57834eb528ae5a31abd968ddcfed TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/fs/snaps/snaptest-capwb.sh
2020-02-29T09:35:22.495 INFO:tasks.workunit.client.0.smithi105.stderr:+ set -e
2020-02-29T09:35:22.496 INFO:tasks.workunit.client.0.smithi105.stderr:+ mkdir foo
2020-02-29T09:35:22.501 INFO:tasks.workunit.client.0.smithi105.stderr:+ ceph fs set cephfs allow_new_snaps true --yes-i-really-mean-it
...
2020-02-29T09:35:24.393 INFO:tasks.workunit.client.0.smithi105.stderr:enabled new snapshots
2020-02-29T09:35:52.133 INFO:teuthology.orchestra.run.smithi012:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:35:52.136 INFO:teuthology.orchestra.run.smithi105:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:35:52.140 INFO:teuthology.orchestra.run.smithi167:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:36:22.299 INFO:teuthology.orchestra.run.smithi012:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:36:22.302 INFO:teuthology.orchestra.run.smithi105:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:36:22.307 INFO:teuthology.orchestra.run.smithi167:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:36:52.347 INFO:teuthology.orchestra.run.smithi012:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:36:52.349 INFO:teuthology.orchestra.run.smithi105:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:36:52.353 INFO:teuthology.orchestra.run.smithi167:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:37:22.482 INFO:teuthology.orchestra.run.smithi012:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:37:22.485 INFO:teuthology.orchestra.run.smithi105:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T09:52:31.974 ERROR:paramiko.transport:Socket exception: No route to host (113)
2020-02-29T09:52:32.002 DEBUG:teuthology.orchestra.run:got remote process result: None
2020-02-29T09:52:32.002 INFO:tasks.workunit:Stopping ['fs/snaps'] on client.0...
2020-02-29T09:52:32.002 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2020-02-29T09:52:32.003 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'smithi105.front.sepia.ceph.com', 'timeout': 60}
2020-02-29T09:52:32.004 DEBUG:tasks.ceph:Missed logrotate, host unreachable
2020-02-29T09:52:35.078 DEBUG:teuthology.orchestra.remote:[Errno None] Unable to connect to port 22 on 172.21.15.105
2020-02-29T09:52:35.078 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 86, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 65, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200229.001503/qa/tasks/workunit.py", line 140, in task
    cleanup=cleanup)
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200229.001503/qa/tasks/workunit.py", line 290, in _spawn_on_all_clients
    timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
    reraise(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200229.001503/qa/tasks/workunit.py", line 426, in _run_tests
    args=args,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 416, in run
    raise ConnectionLostError(command=quote(args), node=name)
ConnectionLostError: SSH connection to smithi105 was lost: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
From: /ceph/teuthology-archive/pdonnell-2020-02-29_02:56:38-kcephfs-wip-pdonnell-testing-20200229.001503-distro-basic-smithi/4811017/teuthology.log
See also:
Failure: SSH connection to smithi105 was lost: 'sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0'
5 jobs: ['4811017', '4810943', '4810906', '4811165', '4811128']
suites intersection: ['clusters/1-mds-1-client.yaml', 'conf/{client.yaml', 'k-testing.yaml}', 'kcephfs/cephfs/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/kclient_workunit_snaps.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1-mds-1-client.yaml', 'conf/{client.yaml', 'k-testing.yaml}', 'kcephfs/cephfs/{begin.yaml', 'kclient/{mount.yaml', 'log-config.yaml', 'mds.yaml', 'mon.yaml', 'ms-die-on-skipped.yaml}}', 'objectstore-ec/bluestore-bitmap.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd-asserts.yaml', 'osd.yaml}', 'overrides/{distro/testing/{flavor/centos_latest.yaml', 'overrides/{distro/testing/{flavor/ubuntu_latest.yaml', 'overrides/{frag_enable.yaml', 'tasks/kclient_workunit_snaps.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
I think the final error message is misleading. We didn't yet get to the point of cleaning up the workunit directory.
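To illustrate why the reported command is misleading: the SSH transport died while the workunit itself was running (the host went unreachable), but the error only surfaces when teuthology issues its next command, the cleanup `rm -rf`, so that command ends up named in the failure. The following is a hypothetical sketch of that failure mode, not teuthology's actual code; class and method names here are made up for illustration.

```python
# Toy model (NOT teuthology source) of how a dead SSH connection blames
# the wrong command: the connection drops during one command, but the
# error is only raised when the *next* command is attempted.

class ConnectionLostError(Exception):
    def __init__(self, command, node):
        super().__init__(f"SSH connection to {node} was lost: {command!r}")
        self.command = command
        self.node = node

class FakeRemote:
    """Stand-in for a remote host; `connected` models the SSH transport."""
    def __init__(self, node):
        self.node = node
        self.connected = True

    def run(self, command):
        # Any command issued after the transport dropped fails here,
        # naming *this* command rather than the one that actually hung.
        if not self.connected:
            raise ConnectionLostError(command=command, node=self.node)
        return 0

remote = FakeRemote("smithi105")
remote.run("bash snaptest-capwb.sh")  # workunit starts fine
remote.connected = False              # host hangs/crashes mid-test
try:
    remote.run("sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0")
except ConnectionLostError as e:
    print(e)  # blames the cleanup command, not the workunit
```

So the `rm -rf` in the failure summary is just the first command attempted after the host became unreachable, consistent with the cleanup never having started.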
Updated by Patrick Donnelly about 4 years ago
Note: this appears to only happen with the testing kernel. Must be a regression!
Updated by Patrick Donnelly about 4 years ago
Another workunit failed same way: /ceph/teuthology-archive/pdonnell-2020-02-29_02:56:38-kcephfs-wip-pdonnell-testing-20200229.001503-distro-basic-smithi/4811054/teuthology.log
2020-02-29T10:03:24.280 INFO:tasks.workunit.client.0.smithi205.stderr:enabled new snapshots
2020-02-29T10:03:24.288 INFO:tasks.workunit.client.0.smithi205.stderr:+ echo x
2020-02-29T10:03:30.431 INFO:teuthology.orchestra.run.smithi159:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T10:03:30.435 INFO:teuthology.orchestra.run.smithi200:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T10:03:30.441 INFO:teuthology.orchestra.run.smithi205:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-29T21:45:57.858 DEBUG:teuthology.exit:Got signal 15; running 2 handlers...
2020-02-29T21:45:57.877 DEBUG:teuthology.task.console_log:Killing console logger for smithi159
2020-02-29T21:45:57.878 DEBUG:teuthology.task.console_log:Killing console logger for smithi205
2020-02-29T21:45:57.878 DEBUG:teuthology.task.console_log:Killing console logger for smithi200
2020-02-29T21:45:57.878 DEBUG:teuthology.task.console_log:Killing console logger for smithi159
2020-02-29T21:45:57.879 DEBUG:teuthology.task.console_log:Killing console logger for smithi205
2020-02-29T21:45:57.879 DEBUG:teuthology.task.console_log:Killing console logger for smithi200
2020-02-29T21:45:57.879 DEBUG:teuthology.exit:Finished running handlers
Updated by Jeff Layton about 4 years ago
I suspect this is related to the merging of:
[PATCH v3 0/6] ceph: don't request caps for idle open files
I've backed that series out of the testing branch for now, so we can see whether this problem goes away.
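For reference, one way to back a multi-patch series out of a branch without rewriting its history is a ranged `git revert`. The commit subjects and repo below are placeholders in a throwaway toy repo, not the real ceph-client testing branch:

```shell
# Toy demonstration (placeholder commits, not ceph-client history) of
# backing a two-commit series out of a branch with a ranged revert.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email test@example.com
git config user.name test

echo base > caps.c
git add caps.c
git commit -qm "base"
echo patch1 >> caps.c
git commit -qam "ceph: idle-caps series 1/2 (placeholder)"
echo patch2 >> caps.c
git commit -qam "ceph: idle-caps series 2/2 (placeholder)"

# Revert the series newest-first in one command; history is preserved,
# so the series can be re-applied later (e.g. once v5 lands).
git revert --no-edit HEAD~2..HEAD

cat caps.c   # back to just "base"
```

Reverting (rather than resetting) keeps the branch fast-forwardable for anyone already tracking it, which matters for a shared testing branch.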
Updated by Zheng Yan about 4 years ago
- Status changed from New to Closed
It's a bug in the v3 patches. The patches in the testing branch are v5, which should have fixed the bug.