Bug #43902

open

qa: mon_thrash: timeout "ceph quorum_status"

Added by Patrick Donnelly about 4 years ago. Updated almost 2 years ago.

Status: Triaged
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: pacific,octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-01-25T23:57:09.282 INFO:teuthology.orchestra.run.smithi035:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status
2020-01-25T23:57:33.932 INFO:teuthology.orchestra.run.smithi035:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:57:33.935 INFO:teuthology.orchestra.run.smithi174:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:58:03.990 INFO:teuthology.orchestra.run.smithi035:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:58:03.993 INFO:teuthology.orchestra.run.smithi174:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:58:34.111 INFO:teuthology.orchestra.run.smithi035:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:58:34.114 INFO:teuthology.orchestra.run.smithi174:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:59:04.215 INFO:teuthology.orchestra.run.smithi035:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:59:04.217 INFO:teuthology.orchestra.run.smithi174:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-01-25T23:59:09.303 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-01-25T23:59:09.303 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200124.211519/qa/tasks/mon_thrash.py", line 232, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200124.211519/qa/tasks/mon_thrash.py", line 323, in _do_thrash
    self.manager.wait_for_mon_quorum_size(len(mons))
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200124.211519/qa/tasks/ceph_manager.py", line 2850, in wait_for_mon_quorum_size
    while not len(self.get_mon_quorum()) == size:
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200124.211519/qa/tasks/ceph_manager.py", line 2839, in get_mon_quorum
    out = self.raw_cluster_cmd('quorum_status')
  File "/home/teuthworker/src/github.com_batrick_ceph_wip-pdonnell-testing-20200124.211519/qa/tasks/ceph_manager.py", line 1342, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi035 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status'

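This happens during snaptests with the mon thrasher running. The "ceph quorum_status" call is wrapped in "timeout 120" and exits with status 124, meaning it hit the 120-second limit before returning. Might need to extend the timeout?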
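
As an alternative to simply extending the timeout, the polling loop could treat a timed-out "ceph quorum_status" as "no answer yet" and retry until an overall deadline. The following is a minimal sketch of that idea, not the code in qa/tasks/ceph_manager.py: the function signatures, the run_cmd hook, and the overall_timeout and poll_interval values are assumptions made for illustration. It relies only on two facts: timeout(1) exits with status 124 when the command times out, and the quorum_status JSON output carries a "quorum" list.

```python
import json
import subprocess
import time

# GNU timeout(1) exits with this status when the wrapped command times out.
TIMEOUT_EXIT_STATUS = 124


def run_local(argv):
    """Run argv locally and return (exit_status, stdout). Hypothetical helper."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def get_mon_quorum(run_cmd=run_local, per_call_timeout=120):
    """Return the quorum list, or None if quorum_status timed out."""
    argv = ["timeout", str(per_call_timeout),
            "ceph", "--cluster", "ceph", "quorum_status"]
    status, out = run_cmd(argv)
    if status == TIMEOUT_EXIT_STATUS:
        # The monitors may be mid-election while being thrashed; report
        # "unknown" instead of failing the whole thrash run.
        return None
    if status != 0:
        raise RuntimeError("quorum_status failed with status %d" % status)
    return json.loads(out)["quorum"]


def wait_for_mon_quorum_size(size, run_cmd=run_local, overall_timeout=600,
                             poll_interval=3, sleep=time.sleep,
                             clock=time.monotonic):
    """Poll until the quorum has `size` members or the deadline passes."""
    deadline = clock() + overall_timeout
    while True:
        quorum = get_mon_quorum(run_cmd)
        if quorum is not None and len(quorum) == size:
            return quorum
        if clock() >= deadline:
            raise TimeoutError("quorum did not reach size %d" % size)
        sleep(poll_interval)
```

With this shape, a single slow quorum_status during an election no longer raises CommandFailedError, while a cluster that never regains quorum still fails once the overall deadline passes. Whether masking individual timeouts is acceptable here, or would hide a real monitor problem, is a separate question.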


Related issues 1 (1 open, 0 closed)

Related to RADOS - Bug #46318: mon_recovery: quorum_status times out | Need More Info | Sage Weil

Actions #1

Updated by Patrick Donnelly about 4 years ago

  • Assignee set to Ramana Raja
Actions #2

Updated by Patrick Donnelly about 4 years ago

  • Status changed from New to Triaged
  • Target version changed from v15.0.0 to v16.0.0
  • Backport set to octopus
Actions #3

Updated by Patrick Donnelly almost 4 years ago

  • Related to Bug #46318: mon_recovery: quorum_status times out added
Actions #4

Updated by Patrick Donnelly almost 4 years ago

/ceph/teuthology-archive/pdonnell-2020-07-11_02:43:08-fs-wip-pdonnell-testing-20200711.001802-distro-basic-smithi/5214057/teuthology.log

Actions #5

Updated by Patrick Donnelly over 3 years ago

  • Assignee deleted (Ramana Raja)
  • Priority changed from Urgent to High

/ceph/teuthology-archive/pdonnell-2020-09-29_05:23:34-fs-wip-pdonnell-testing-20200929.022151-distro-basic-smithi/5479921/teuthology.log

Actions #6

Updated by Patrick Donnelly over 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport changed from octopus to pacific,octopus,nautilus
Actions #7

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)