Bug #55806 (open): qa failure: workload dbench failure

Added by Rishabh Dave almost 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): qa-suite
Labels (FS): qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Bug discovered on a QA run for PR https://github.com/ceph/ceph/pull/45556.

Teuthology job: https://pulpito.ceph.com/vshankar-2022-04-26_06:23:29-fs:workload-wip-45556-20220418-102656-testing-default-smithi/6806486/

Traceback #1:

    2022-04-26T07:48:39.629 DEBUG:teuthology.orchestra.run.smithi047:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status
    2022-04-26T07:48:40.063 INFO:tasks.workunit.client.0.smithi047.stdout:  10    152463    28.57 MB/sec  execute 155 sec  latency 260.154 ms
    2022-04-26T07:48:40.089 INFO:journalctl@ceph.mon.a.smithi047.stdout:Apr 26 07:48:39 smithi047 ceph-mon[27550]: Health check cleared: CEPHADM_REFRESH_FAILED (was: failed to probe daemons or devices)
    2022-04-26T07:48:40.089 INFO:journalctl@ceph.mon.a.smithi047.stdout:Apr 26 07:48:39 smithi047 ceph-mon[27550]: Cluster is now healthy
    2022-04-26T07:48:40.220 INFO:journalctl@ceph.mon.c.smithi119.stdout:Apr 26 07:48:39 smithi119 ceph-mon[33953]: Health check cleared: CEPHADM_REFRESH_FAILED (was: failed to probe daemons or devices)
    2022-04-26T07:48:40.221 INFO:journalctl@ceph.mon.c.smithi119.stdout:Apr 26 07:48:39 smithi119 ceph-mon[33953]: Cluster is now healthy
    2022-04-26T07:48:40.301 INFO:journalctl@ceph.mon.b.smithi074.stdout:Apr 26 07:48:39 smithi074 ceph-mon[32986]: Health check cleared: CEPHADM_REFRESH_FAILED (was: failed to probe daemons or devices)
    2022-04-26T07:48:40.301 INFO:journalctl@ceph.mon.b.smithi074.stdout:Apr 26 07:48:39 smithi074 ceph-mon[32986]: Cluster is now healthy
    2022-04-26T07:48:40.695 INFO:teuthology.orchestra.run.smithi047.stderr:Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
    2022-04-26T07:48:40.712 DEBUG:teuthology.orchestra.run:got remote process result: 1
    2022-04-26T07:48:40.713 ERROR:tasks.fwd_scrub.fs.[cephfs]:exception:
    Traceback (most recent call last):
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/fwd_scrub.py", line 38, in _run
        self.do_scrub()
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/fwd_scrub.py", line 55, in do_scrub
        self._scrub()
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/fwd_scrub.py", line 77, in _scrub
        timeout=self.scrub_timeout)
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/cephfs/filesystem.py", line 1618, in wait_until_scrub_complete
        out_json = self.rank_tell(["scrub", "status"], rank=rank)
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/cephfs/filesystem.py", line 1196, in rank_tell
        out = self.mon_manager.raw_cluster_cmd("tell", f"mds.{self.id}:{rank}", *command)
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/ceph_manager.py", line 1597, in raw_cluster_cmd
        return self.run_cluster_cmd(**kwargs).stdout.getvalue()
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/ceph_manager.py", line 1588, in run_cluster_cmd
        return self.controller.run(**kwargs)
      File "/home/teuthworker/src/git.ceph.com_git_teuthology_788cfdd8098ad222aa448289edcfa4436091c32c/teuthology/orchestra/remote.py", line 509, in run
        r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
      File "/home/teuthworker/src/git.ceph.com_git_teuthology_788cfdd8098ad222aa448289edcfa4436091c32c/teuthology/orchestra/run.py", line 455, in run
        r.wait()
      File "/home/teuthworker/src/git.ceph.com_git_teuthology_788cfdd8098ad222aa448289edcfa4436091c32c/teuthology/orchestra/run.py", line 161, in wait
        self._raise_for_status()
      File "/home/teuthworker/src/git.ceph.com_git_teuthology_788cfdd8098ad222aa448289edcfa4436091c32c/teuthology/orchestra/run.py", line 183, in _raise_for_status
        node=self.hostname, label=self.label
    teuthology.exceptions.CommandFailedError: Command failed on smithi047 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph tell mds.1:0 scrub status'
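
Traceback #1 boils down to the `ceph tell mds.1:0 scrub status` call failing because the cluster client could not be initialized ("error calling conf_read_file"), i.e. ceph.conf could not be read at that moment, not because the scrub itself was stuck. Below is a minimal standalone sketch (not qa-suite code; the retry count and delay are illustrative, and the adjust-ulimits/ceph-coverage/timeout wrappers from the log are dropped) that retries only this client-init failure, to help separate a transient conf read problem from a genuinely failing scrub:

    #!/usr/bin/env python3
    # Hypothetical repro/retry helper for the failure in traceback #1.
    # Assumption: the ObjectNotFound from conf_read_file is transient, so a
    # bounded retry should eventually get a real `scrub status` answer.
    import subprocess
    import time

    # Command taken from the teuthology log above (wrapper tools omitted).
    CMD = ["ceph", "--cluster", "ceph", "tell", "mds.1:0", "scrub", "status"]

    def tell_with_retry(cmd, attempts=3, delay=5):
        for i in range(1, attempts + 1):
            proc = subprocess.run(cmd, capture_output=True, text=True)
            if proc.returncode == 0:
                return proc.stdout
            # Retry only the client-init failure seen in the log; any other
            # error is surfaced immediately.
            if "error calling conf_read_file" not in proc.stderr:
                raise RuntimeError(f"ceph tell failed: {proc.stderr.strip()}")
            if i < attempts:
                time.sleep(delay)
        raise RuntimeError(f"ceph tell still failing after {attempts} attempts")

    if __name__ == "__main__":
        print(tell_with_retry(CMD))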

Traceback #2:

    Traceback (most recent call last):
      File "/home/teuthworker/src/git.ceph.com_git_teuthology_788cfdd8098ad222aa448289edcfa4436091c32c/teuthology/run_tasks.py", line 188, in run_tasks
        suppress = manager.__exit__(*exc_info)
      File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/fwd_scrub.py", line 151, in task
        stop_all_fwd_scrubbers(ctx.ceph[config['cluster']].thrashers)
      File "/home/teuthworker/src/git.ceph.com_ceph-c_1ccbc711b8876e630c0358e1d8d923daa34dca1e/qa/tasks/fwd_scrub.py", line 86, in stop_all_fwd_scrubbers
        raise RuntimeError(f"error during scrub thrashing: {thrasher.exception}")

Command failing in traceback #2:

    2022-04-26T07:56:57.401 DEBUG:teuthology.orchestra.run.smithi119:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:1ccbc711b8876e630c0358e1d8d923daa34dca1e shell --fsid 6ac69888-c531-11ec-8c39-001a4aab830c -- ceph daemon mds.l perf dump
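
Traceback #2 is not an independent failure: `stop_all_fwd_scrubbers` runs at task teardown and re-raises whatever exception the background thrasher recorded earlier (fwd_scrub.py line 86 in the traceback). A minimal sketch of that stash-and-reraise pattern, with simplified stand-ins for the real qa/tasks/fwd_scrub.py classes:

    # Illustrative simplification, not the actual qa-suite implementation.
    import threading

    class FwdScrubber(threading.Thread):
        """Runs work in a background thread; stashes exceptions instead of
        raising them, like the qa-suite thrasher."""

        def __init__(self, work):
            super().__init__()
            self.exception = None
            self._work = work

        def run(self):
            try:
                self._work()           # e.g. the scrub-status polling loop
            except Exception as e:
                self.exception = e     # recorded here, surfaced at teardown

    def stop_all_fwd_scrubbers(thrashers):
        for thrasher in thrashers:
            thrasher.join()
            if thrasher.exception is not None:
                # Same shape as the raise in traceback #2.
                raise RuntimeError(
                    f"error during scrub thrashing: {thrasher.exception}")

    def failing_scrub():
        raise RuntimeError("Command failed on smithi047 with status 1")

    if __name__ == "__main__":
        scrubber = FwdScrubber(failing_scrub)
        scrubber.start()
        try:
            stop_all_fwd_scrubbers([scrubber])
        except RuntimeError as e:
            print(e)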

Updated by Rishabh Dave almost 2 years ago

  • Description updated

Updated by Rishabh Dave almost 2 years ago

  • ceph-qa-suite fs added
  • Component(FS) qa-suite added
  • Labels (FS) qa, qa-failure added