Bug #48485: osd thrasher timeout - RADOS - Ceph

Bug #48485

open

osd thrasher timeout

Added by Jeff Layton over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee:
Category: Tests
Target version:
% Done:
Source:
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One of my test runs failed when the OSD thrasher's "ceph osd out 2" command, run under "timeout 120", exited with status 124:

2020-12-07T16:18:00.235 INFO:teuthology.orchestra.run.gibba018:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-12-07T16:18:00.649 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-12-07T16:18:00.650 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 115, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 1201, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 503, in out_osd
    self.ceph_manager.mark_out_osd(osd)
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 2700, in mark_out_osd
    self.raw_cluster_cmd('osd', 'out', str(osd))
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 1354, in raw_cluster_cmd
    'stdout': StringIO()}).stdout.getvalue()
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 1347, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 215, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on gibba018 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd out 2'

2020-12-07T16:18:00.651 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 1069, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 115, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 1201, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 503, in out_osd
    self.ceph_manager.mark_out_osd(osd)
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 2700, in mark_out_osd
    self.raw_cluster_cmd('osd', 'out', str(osd))
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 1354, in raw_cluster_cmd
    'stdout': StringIO()}).stdout.getvalue()
  File "/home/teuthworker/src/github.com_jtlayton_ceph_k-stock/qa/tasks/ceph_manager.py", line 1347, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 215, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on gibba018 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd out 2'
2020-12-07T16:18:00.651 INFO:tasks.thrashosds.thrasher:joining the do_sighup greenlet
2020-12-07T16:18:00.652 INFO:tasks.thrashosds.thrasher:joining the do_optrack_toggle greenlet
2020-12-07T16:18:00.652 INFO:tasks.thrashosds.thrasher:joining the do_dump_ops greenlet
2020-12-07T16:18:00.652 INFO:tasks.thrashosds.thrasher:joining the do_noscrub_toggle greenlet
2020-12-07T16:18:00.652 INFO:tasks.ceph.ceph_manager.ceph:waiting for all up

See: https://pulpito.ceph.com/jlayton-2020-12-07_15:46:26-fs-master-wip-fscache-iter-basic-gibba/5689758/

This run tested recent master (as of a few days ago), with some patches on top of the qa suite to allow testing with cephfs + fscache.
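
For triage: status 124 is the value GNU coreutils timeout returns when the wrapped command runs past its limit, so the CommandFailedError above most likely means "ceph --cluster ceph osd out 2" did not return within 120 seconds, rather than the ceph command itself reporting an error. A minimal Python sketch of that reading of the log line follows. The function classify_failure and its regular expressions are hypothetical helpers written for illustration; they are not part of teuthology or the qa suite.

```python
import re

# Exit status GNU coreutils `timeout` returns when the wrapped command
# runs past its time limit (and --preserve-status is not given).
TIMEOUT_EXIT_STATUS = 124

# Hypothetical patterns for illustration: they match the text of the
# CommandFailedError message as it appears in the log above.
_FAILURE_RE = re.compile(
    r"Command failed on (?P<host>\S+) with status (?P<status>\d+): '(?P<cmd>.*)'"
)
_TIMEOUT_RE = re.compile(r"\btimeout (?P<seconds>\d+) (?P<wrapped>.+)$")


def classify_failure(message):
    """Return a dict describing the failure, or None if the line does not match."""
    failure = _FAILURE_RE.search(message)
    if failure is None:
        return None
    status = int(failure.group("status"))
    wrapper = _TIMEOUT_RE.search(failure.group("cmd"))
    return {
        "host": failure.group("host"),
        "status": status,
        # Only call it a timeout when the command was wrapped in `timeout`
        # and the exit status is the one `timeout` uses for expiry.
        "timed_out": wrapper is not None and status == TIMEOUT_EXIT_STATUS,
        "limit_seconds": int(wrapper.group("seconds")) if wrapper else None,
        "wrapped_command": wrapper.group("wrapped") if wrapper else None,
    }


if __name__ == "__main__":
    line = (
        "teuthology.exceptions.CommandFailedError: Command failed on gibba018 "
        "with status 124: 'sudo adjust-ulimits ceph-coverage "
        "/home/ubuntu/cephtest/archive/coverage timeout 120 "
        "ceph --cluster ceph osd out 2'"
    )
    print(classify_failure(line))
```

This only classifies the log line. It says nothing about why the monitor did not answer the "osd out" request in time, which is what the cluster logs from the linked run would have to show.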