Bug #46318 (open): mon_recovery: quorum_status times out

Added by Neha Ojha almost 4 years ago. Updated 3 months ago.

Status: Need More Info
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-06-30T22:11:16.279 INFO:tasks.mon_recovery.ceph_manager:quorum is size 3
2020-06-30T22:11:16.279 INFO:tasks.mon_recovery:stopping all monitors
2020-06-30T22:11:16.280 DEBUG:tasks.ceph.mon.b:waiting for process to exit
2020-06-30T22:11:16.280 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.290 INFO:tasks.ceph.mon.b:Stopped
2020-06-30T22:11:16.291 DEBUG:tasks.ceph.mon.c:waiting for process to exit
2020-06-30T22:11:16.291 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.302 INFO:tasks.ceph.mon.c:Stopped
2020-06-30T22:11:16.302 DEBUG:tasks.ceph.mon.a:waiting for process to exit
2020-06-30T22:11:16.303 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.331 INFO:teuthology.orchestra.run.smithi188.stdout:ERROR: (22) Invalid argument
2020-06-30T22:11:16.331 INFO:teuthology.orchestra.run.smithi188.stdout:op_tracker tracking is not enabled now, so no ops are tracked currently, even those get stuck. Please enable "osd_enable_op_tracker", and the tracker will start to track new ops received afterwards.
2020-06-30T22:11:16.341 INFO:tasks.ceph.mon.a:Stopped
2020-06-30T22:11:16.342 INFO:tasks.mon_recovery:forming a minimal quorum for ['b', 'c', 'a'], then adding monitors
2020-06-30T22:11:16.342 INFO:tasks.ceph.mon.b:Restarting daemon
2020-06-30T22:11:16.342 INFO:teuthology.orchestra.run.smithi074:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i b
2020-06-30T22:11:16.344 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
2020-06-30T22:11:16.346 INFO:tasks.ceph.mon.b:Started
2020-06-30T22:11:16.346 INFO:tasks.ceph.mon.c:Restarting daemon
2020-06-30T22:11:16.347 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i c
2020-06-30T22:11:16.385 INFO:tasks.ceph.osd.3.smithi188.stderr:2020-06-30T22:11:16.383+0000 7fd37fcec700 -1 received  signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 3  (PID: 26486) UID: 0
2020-06-30T22:11:16.386 INFO:tasks.ceph.mon.c:Started
2020-06-30T22:11:16.387 INFO:tasks.mon_recovery.ceph_manager:waiting for quorum size 2
2020-06-30T22:11:16.387 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status
.
.
2020-06-30T22:13:16.552 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-06-30T22:13:16.553 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 90, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 69, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/mon_recovery.py", line 64, in task
    manager.wait_for_mon_quorum_size(num)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 2896, in wait_for_mon_quorum_size
    while not len(self.get_mon_quorum()) == size:
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 2885, in get_mon_quorum
    out = self.raw_cluster_cmd('quorum_status')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 1363, in raw_cluster_cmd
    stdout=BytesIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 204, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi188 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status'

/a/yuriw-2020-06-30_21:52:40-rados-wip-yuri-testing-2020-06-30-1711-octopus-distro-basic-smithi/5191923
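
Exit status 124 is the coreutils "timeout" convention for a command killed after exceeding its limit: the quorum_status call above ran for the full 120 seconds (22:11:16 to 22:13:16) without the restarted monitors ever answering. Per the traceback, the check lives in qa/tasks/ceph_manager.py: wait_for_mon_quorum_size() polls get_mon_quorum(), which shells out to "ceph quorum_status". Below is a minimal sketch of that polling loop, paraphrased from the traceback rather than copied from the upstream code; the poll interval and the placement of the timeout wrapper inside get_mon_quorum() are simplifying assumptions (in the real harness, teuthology prepends the timeout to the remote command).

import json
import subprocess
import time

def get_mon_quorum(cluster='ceph'):
    # Rough equivalent of raw_cluster_cmd('quorum_status'). The
    # "timeout 120" wrapper kills the call and exits with status 124
    # if the monitors never reply (as seen in this failure).
    out = subprocess.check_output(
        ['timeout', '120', 'ceph', '--cluster', cluster,
         '--log-early', 'quorum_status'])
    # quorum_status reports the monitor ranks currently in quorum
    # under the "quorum" key.
    return json.loads(out)['quorum']

def wait_for_mon_quorum_size(size, poll_interval=3):  # interval assumed
    # Loop until the quorum reaches the requested size. If a single
    # quorum_status call hangs, check_output raises
    # CalledProcessError(returncode=124), which teuthology surfaces
    # as the CommandFailedError in the traceback above.
    while len(get_mon_quorum()) != size:
        time.sleep(poll_interval)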


Related issues 2 (1 open, 1 closed)

Related to CephFS - Bug #43902: qa: mon_thrash: timeout "ceph quorum_status" (Triaged)

Related to RADOS - Bug #43887: ceph_test_rados_delete_pools_parallel failure (Resolved, assigned to Nitzan Mordechai)
