
Bug #43885

closed

failed to reach quorum size 9 before timeout expired

Added by Sage Weil over 4 years ago. Updated about 4 years ago.

Status: Can't reproduce
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This failure shows up intermittently in the rados/monthrash suite. Here is a recent occurrence:

2020-01-29T01:28:50.875 INFO:tasks.mon_thrash.ceph_manager:waiting for quorum size 5
2020-01-29T01:28:50.875 INFO:teuthology.orchestra.run.smithi097:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status
2020-01-29T01:29:05.877 INFO:tasks.ceph.mon.h.smithi072.stderr:2020-01-29T01:29:05.875+0000 7f7a098d0700 -1 mon.h@4(electing) e1 get_health_metrics reporting 2 slow ops, oldest is log(1 entries from seq 24 at 2020-01-29T01:28:35.447431+0000)
2020-01-29T01:29:06.037 INFO:tasks.ceph.osd.4.smithi072.stderr:2020-01-29T01:29:06.035+0000 7fbbc4a71700 -1 osd.4 167 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.44653.0:1 164.6s0 164.7fc1f406 (undecoded) ondisk+write+known_if_redirected e167)
...
2020-01-29T01:30:50.913 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-01-29T01:30:50.913 ERROR:tasks.mon_thrash.mon_thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-28-1413/qa/tasks/mon_thrash.py", line 232, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-28-1413/qa/tasks/mon_thrash.py", line 299, in _do_thrash
    self.manager.wait_for_mon_quorum_size(len(mons)-len(mons_to_kill))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-28-1413/qa/tasks/ceph_manager.py", line 2861, in wait_for_mon_quorum_size
    while not len(self.get_mon_quorum()) == size:
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-28-1413/qa/tasks/ceph_manager.py", line 2850, in get_mon_quorum
    out = self.raw_cluster_cmd('quorum_status')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-sage-testing-2020-01-28-1413/qa/tasks/ceph_manager.py", line 1353, in raw_cluster_cmd
    stdout=StringIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 198, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 433, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 158, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 180, in _raise_for_status
    node=self.hostname, label=self.label
CommandFailedError: Command failed on smithi097 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status'
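In this run the exception comes out of `get_mon_quorum()` rather than the quorum-size loop: `quorum_status` was wrapped in `timeout 120`, and 124 is the exit status GNU coreutils `timeout` uses when it has to stop a command that outran its limit. So the CLI call itself hung (the log shows mon.h still `electing`). The sketch below is a hypothetical helper, not the ceph_manager or teuthology code. It assumes `ceph quorum_status` prints JSON with a `quorum` list of ranks. It treats a timed-out call as "not in quorum yet" and keeps polling until an overall deadline. The names `get_quorum_ranks`, `wait_for_quorum_size`, `QuorumWaitError` and the deadline/interval defaults are illustrative assumptions.

```python
import json
import subprocess
import time

# GNU coreutils `timeout` exits with 124 when it had to stop the command.
TIMEOUT_EXIT_STATUS = 124


class QuorumWaitError(Exception):
    """Hypothetical error: the expected quorum size was never observed."""


def get_quorum_ranks(run_cmd, per_call_timeout=120):
    """Return the list of monitor ranks in quorum, or None if the call timed out.

    `run_cmd` takes an argv list and returns a subprocess.CompletedProcess.
    Assumes `ceph quorum_status` prints JSON containing a "quorum" list.
    """
    argv = ["timeout", str(per_call_timeout),
            "ceph", "--cluster", "ceph", "quorum_status"]
    result = run_cmd(argv)
    if result.returncode == TIMEOUT_EXIT_STATUS:
        # The CLI itself hung (e.g. while monitors are electing).
        return None
    if result.returncode != 0:
        raise RuntimeError(f"quorum_status failed with status {result.returncode}")
    return json.loads(result.stdout)["quorum"]


def wait_for_quorum_size(run_cmd, size, deadline_s=300.0, interval_s=3.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll until exactly `size` monitors are in quorum or the deadline passes."""
    start = clock()
    while True:
        ranks = get_quorum_ranks(run_cmd)
        if ranks is not None and len(ranks) == size:
            return ranks
        if clock() - start > deadline_s:
            # Message wording mirrors the bug title, for readability only.
            raise QuorumWaitError(
                f"failed to reach quorum size {size} before timeout expired")
        sleep(interval_s)


def subprocess_runner(argv):
    """Real runner: executes argv locally and captures text output."""
    return subprocess.run(argv, capture_output=True, text=True)
```

For example, `wait_for_quorum_size(subprocess_runner, 5)` would poll a locally reachable cluster. In unit tests, a stub that returns `subprocess.CompletedProcess` objects can stand in for the CLI. Injecting `clock` and `sleep` keeps the deadline logic testable without real waiting.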

/a/sage-2020-01-28_23:42:06-rados-wip-sage-testing-2020-01-28-1413-distro-basic-smithi/4715297
description: rados/monthrash/{ceph.yaml clusters/9-mons.yaml msgr-failures/mon-delay.yaml msgr/async-v2only.yaml objectstore/bluestore-avl.yaml rados.yaml supported-random-distro$/{rhel_8.yaml} thrashers/one.yaml workloads/rados_api_tests.yaml}
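The traceback shows the thrasher waiting for `len(mons)-len(mons_to_kill)` monitors. This job presumably runs 9 monitors (`clusters/9-mons.yaml`), and the log waits for quorum size 5, so 4 monitors were killed. That leaves exactly the smallest strict majority of 9 (9 // 2 + 1 = 5) with no spare, so every surviving monitor must join before the wait can succeed. The snippet below only checks that arithmetic. `majority` and `expected_quorum_size` are illustrative helpers, not ceph_manager functions, and the strict-majority rule is the standard Paxos quorum assumption.

```python
def majority(num_mons):
    """Smallest number of monitors that constitutes a strict majority."""
    return num_mons // 2 + 1


def expected_quorum_size(num_mons, num_killed):
    """Quorum size to wait for after killing `num_killed` monitors.

    Mirrors the expression len(mons) - len(mons_to_kill) from the traceback.
    """
    remaining = num_mons - num_killed
    if remaining < majority(num_mons):
        raise ValueError("surviving monitors cannot form a majority")
    return remaining


# 9 monitors (clusters/9-mons.yaml) and a target of 5 implies 4 were killed.
num_mons, target = 9, 5
num_killed = num_mons - target
slack = expected_quorum_size(num_mons, num_killed) - majority(num_mons)
print(num_killed, majority(num_mons), slack)  # 4 5 0
```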
