Bug #46318 (open): mon_recovery: quorum_status times out

Added by Neha Ojha almost 4 years ago. Updated 3 months ago.

Status: Need More Info
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-06-30T22:11:16.279 INFO:tasks.mon_recovery.ceph_manager:quorum is size 3
2020-06-30T22:11:16.279 INFO:tasks.mon_recovery:stopping all monitors
2020-06-30T22:11:16.280 DEBUG:tasks.ceph.mon.b:waiting for process to exit
2020-06-30T22:11:16.280 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.290 INFO:tasks.ceph.mon.b:Stopped
2020-06-30T22:11:16.291 DEBUG:tasks.ceph.mon.c:waiting for process to exit
2020-06-30T22:11:16.291 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.302 INFO:tasks.ceph.mon.c:Stopped
2020-06-30T22:11:16.302 DEBUG:tasks.ceph.mon.a:waiting for process to exit
2020-06-30T22:11:16.303 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.331 INFO:teuthology.orchestra.run.smithi188.stdout:ERROR: (22) Invalid argument
2020-06-30T22:11:16.331 INFO:teuthology.orchestra.run.smithi188.stdout:op_tracker tracking is not enabled now, so no ops are tracked currently, even those get stuck. Please enable "osd_enable_op_tracker", and the tracker will start to track new ops received afterwards.
2020-06-30T22:11:16.341 INFO:tasks.ceph.mon.a:Stopped
2020-06-30T22:11:16.342 INFO:tasks.mon_recovery:forming a minimal quorum for ['b', 'c', 'a'], then adding monitors
2020-06-30T22:11:16.342 INFO:tasks.ceph.mon.b:Restarting daemon
2020-06-30T22:11:16.342 INFO:teuthology.orchestra.run.smithi074:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i b
2020-06-30T22:11:16.344 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
2020-06-30T22:11:16.346 INFO:tasks.ceph.mon.b:Started
2020-06-30T22:11:16.346 INFO:tasks.ceph.mon.c:Restarting daemon
2020-06-30T22:11:16.347 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i c
2020-06-30T22:11:16.385 INFO:tasks.ceph.osd.3.smithi188.stderr:2020-06-30T22:11:16.383+0000 7fd37fcec700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 3  (PID: 26486) UID: 0
2020-06-30T22:11:16.386 INFO:tasks.ceph.mon.c:Started
2020-06-30T22:11:16.387 INFO:tasks.mon_recovery.ceph_manager:waiting for quorum size 2
2020-06-30T22:11:16.387 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status
.
.
2020-06-30T22:13:16.552 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-06-30T22:13:16.553 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 90, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 69, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/mon_recovery.py", line 64, in task
    manager.wait_for_mon_quorum_size(num)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 2896, in wait_for_mon_quorum_size
    while not len(self.get_mon_quorum()) == size:
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 2885, in get_mon_quorum
    out = self.raw_cluster_cmd('quorum_status')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 1363, in raw_cluster_cmd
    stdout=BytesIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 204, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi188 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status'

/a/yuriw-2020-06-30_21:52:40-rados-wip-yuri-testing-2020-06-30-1711-octopus-distro-basic-smithi/5191923
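
Exit status 124 is the coreutils "timeout" convention for a command killed after exceeding its limit: the quorum_status call above ran for the full 120 seconds (22:11:16 to 22:13:16) without the restarted monitors ever answering. Per the traceback, the check lives in qa/tasks/ceph_manager.py: wait_for_mon_quorum_size() polls get_mon_quorum(), which shells out to "ceph quorum_status". Below is a minimal sketch of that polling loop, paraphrased from the traceback rather than copied from the upstream code; the poll interval and the placement of the timeout wrapper inside get_mon_quorum() are simplifying assumptions (in the real harness, teuthology prepends the timeout to the remote command).

import json
import subprocess
import time

def get_mon_quorum(cluster='ceph'):
    # Rough equivalent of raw_cluster_cmd('quorum_status'). The
    # "timeout 120" wrapper kills the call and exits with status 124
    # if the monitors never reply (as seen in this failure).
    out = subprocess.check_output(
        ['timeout', '120', 'ceph', '--cluster', cluster,
         '--log-early', 'quorum_status'])
    # quorum_status reports the monitor ranks currently in quorum
    # under the "quorum" key.
    return json.loads(out)['quorum']

def wait_for_mon_quorum_size(size, poll_interval=3):  # interval assumed
    # Loop until the quorum reaches the requested size. If a single
    # quorum_status call hangs, check_output raises
    # CalledProcessError(returncode=124), which teuthology surfaces
    # as the CommandFailedError in the traceback above.
    while len(get_mon_quorum()) != size:
        time.sleep(poll_interval)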


Related issues 2 (1 open, 1 closed)

Related to CephFS - Bug #43902: qa: mon_thrash: timeout "ceph quorum_status" (Triaged)

Related to RADOS - Bug #43887: ceph_test_rados_delete_pools_parallel failure (Resolved, assigned to Nitzan Mordechai)
