Bug #46318
mon_recovery: quorum_status times out
Description
2020-06-30T22:11:16.279 INFO:tasks.mon_recovery.ceph_manager:quorum is size 3
2020-06-30T22:11:16.279 INFO:tasks.mon_recovery:stopping all monitors
2020-06-30T22:11:16.280 DEBUG:tasks.ceph.mon.b:waiting for process to exit
2020-06-30T22:11:16.280 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.290 INFO:tasks.ceph.mon.b:Stopped
2020-06-30T22:11:16.291 DEBUG:tasks.ceph.mon.c:waiting for process to exit
2020-06-30T22:11:16.291 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.302 INFO:tasks.ceph.mon.c:Stopped
2020-06-30T22:11:16.302 DEBUG:tasks.ceph.mon.a:waiting for process to exit
2020-06-30T22:11:16.303 INFO:teuthology.orchestra.run:waiting for 300
2020-06-30T22:11:16.331 INFO:teuthology.orchestra.run.smithi188.stdout:ERROR: (22) Invalid argument
2020-06-30T22:11:16.331 INFO:teuthology.orchestra.run.smithi188.stdout:op_tracker tracking is not enabled now, so no ops are tracked currently, even those get stuck. Please enable "osd_enable_op_tracker", and the tracker will start to track new ops received afterwards.
2020-06-30T22:11:16.341 INFO:tasks.ceph.mon.a:Stopped
2020-06-30T22:11:16.342 INFO:tasks.mon_recovery:forming a minimal quorum for ['b', 'c', 'a'], then adding monitors
2020-06-30T22:11:16.342 INFO:tasks.ceph.mon.b:Restarting daemon
2020-06-30T22:11:16.342 INFO:teuthology.orchestra.run.smithi074:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i b
2020-06-30T22:11:16.344 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
2020-06-30T22:11:16.346 INFO:tasks.ceph.mon.b:Started
2020-06-30T22:11:16.346 INFO:tasks.ceph.mon.c:Restarting daemon
2020-06-30T22:11:16.347 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i c
2020-06-30T22:11:16.385 INFO:tasks.ceph.osd.3.smithi188.stderr:2020-06-30T22:11:16.383+0000 7fd37fcec700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper kill ceph-osd -f --cluster ceph -i 3 (PID: 26486) UID: 0
2020-06-30T22:11:16.386 INFO:tasks.ceph.mon.c:Started
2020-06-30T22:11:16.387 INFO:tasks.mon_recovery.ceph_manager:waiting for quorum size 2
2020-06-30T22:11:16.387 INFO:teuthology.orchestra.run.smithi188:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status
...
2020-06-30T22:13:16.552 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-06-30T22:13:16.553 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 90, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 69, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/mon_recovery.py", line 64, in task
    manager.wait_for_mon_quorum_size(num)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 2896, in wait_for_mon_quorum_size
    while not len(self.get_mon_quorum()) == size:
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 2885, in get_mon_quorum
    out = self.raw_cluster_cmd('quorum_status')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-06-30-1711-octopus/qa/tasks/ceph_manager.py", line 1363, in raw_cluster_cmd
    stdout=BytesIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 204, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi188 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status'
/a/yuriw-2020-06-30_21:52:40-rados-wip-yuri-testing-2020-06-30-1711-octopus-distro-basic-smithi/5191923
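For context, the call chain in the traceback reduces to a simple poll loop: get_mon_quorum() shells out to "timeout 120 ceph ... quorum_status" and wait_for_mon_quorum_size() retries until the quorum list has the expected length, so exit status 124 means the CLI itself hung for the full two minutes. A minimal sketch of that loop, run locally rather than over SSH (simplified, not the verbatim qa/tasks/ceph_manager.py source; the poll interval is made up):

    # Sketch of the polling the traceback above goes through; illustrative only.
    import json
    import subprocess
    import time

    def get_mon_quorum():
        # The 'timeout 120' wrapper is what yields exit status 124 when
        # quorum_status hangs; check_output then raises CalledProcessError.
        out = subprocess.check_output(
            ['timeout', '120', 'ceph', '--cluster', 'ceph', '--log-early',
             'quorum_status'])
        return json.loads(out)['quorum']  # list of mon ranks in quorum

    def wait_for_mon_quorum_size(size, poll_interval=3):
        # Keep polling until the quorum reaches the expected size.
        while len(get_mon_quorum()) != size:
            time.sleep(poll_interval)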
Updated by Patrick Donnelly almost 4 years ago
- Related to Bug #43902: qa: mon_thrash: timeout "ceph quorum_status" added
Updated by Joao Eduardo Luis over 3 years ago
- Category set to Correctness/Safety
- Status changed from New to Triaged
- Assignee set to Joao Eduardo Luis
Updated by Joao Eduardo Luis over 3 years ago
- Priority changed from Normal to Urgent
Updated by Deepika Upadhyay over 3 years ago
/a/yuriw-2020-08-26_18:16:40-rados-wip-yuri-testing-2020-08-26-1631-octopus-distro-basic-smithi/5378436
2020-08-27T01:12:07.707 INFO:tasks.mon_recovery.ceph_manager:quorum is size 2
2020-08-27T01:12:07.707 INFO:tasks.mon_recovery:causing some monitor log activity
2020-08-27T01:12:07.707 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '1 of 30'
2020-08-27T01:12:08.485 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '2 of 30'
2020-08-27T01:12:09.489 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '3 of 30'
2020-08-27T01:12:10.495 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '4 of 30'
2020-08-27T01:12:11.495 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '5 of 30'
2020-08-27T01:12:12.499 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '6 of 30'
2020-08-27T01:12:13.214 INFO:teuthology.orchestra.run.smithi125.stderr:2020-08-27T01:12:13.213+0000 7f637b7fe700 0 --2- 172.21.15.125:0/1708960995 >> v2:172.21.15.43:3300/0 conn(0x7f637c10b260 0x7f637c10b640 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).send_auth_request get_initial_auth_request returned -2
2020-08-27T01:12:13.499 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '7 of 30'
2020-08-27T01:12:14.505 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '8 of 30'
2020-08-27T01:12:15.507 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '9 of 30'
2020-08-27T01:12:16.508 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '10 of 30'
2020-08-27T01:12:17.511 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '11 of 30'
2020-08-27T01:12:21.664 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '12 of 30'
2020-08-27T01:12:22.668 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '13 of 30'
2020-08-27T01:12:23.667 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '14 of 30'
2020-08-27T01:12:24.671 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '15 of 30'
2020-08-27T01:12:25.672 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '16 of 30'
2020-08-27T01:12:26.674 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '17 of 30'
2020-08-27T01:12:27.677 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '18 of 30'
2020-08-27T01:12:28.679 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '19 of 30'
2020-08-27T01:12:29.681 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '20 of 30'
2020-08-27T01:12:30.684 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '21 of 30'
2020-08-27T01:12:31.687 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '22 of 30'
2020-08-27T01:12:32.689 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '23 of 30'
2020-08-27T01:12:33.691 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '24 of 30'
2020-08-27T01:12:34.693 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '25 of 30'
2020-08-27T01:12:35.696 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '26 of 30'
2020-08-27T01:12:36.703 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '27 of 30'
2020-08-27T01:12:37.704 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '28 of 30'
2020-08-27T01:12:38.711 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log '29 of 30'
2020-08-27T01:12:39.707 INFO:tasks.mon_recovery:adding mon c back in
2020-08-27T01:12:39.708 INFO:tasks.ceph.mon.c:Restarting daemon
2020-08-27T01:12:39.708 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f --cluster ceph -i c
2020-08-27T01:12:39.711 INFO:tasks.ceph.mon.c:Started
2020-08-27T01:12:39.712 INFO:tasks.mon_recovery.ceph_manager:waiting for quorum size 3
2020-08-27T01:12:39.712 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status
2020-08-27T01:12:40.033 INFO:teuthology.orchestra.run.smithi125.stdout:{"election_epoch":66,"quorum":[0,1],"quorum_names":["b","a"],"quorum_leader_name":"b","quorum_age":32,"features":{"quorum_con":"4540138292836696063","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"]},"monmap":{"epoch":1,"fsid":"af484dec-9dda-4a14-8664-c564084077c7","modified":"2020-08-27T01:09:05.934223Z","created":"2020-08-27T01:09:05.934223Z","min_mon_release":15,"min_mon_release_name":"octopus","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"],"optional":[]},"mons":[{"rank":0,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.43:3300","nonce":0},{"type":"v1","addr":"172.21.15.43:6789","nonce":0}]},"addr":"172.21.15.43:6789/0","public_addr":"172.21.15.43:6789/0","priority":0,"weight":0},{"rank":1,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3300","nonce":0},{"type":"v1","addr":"172.21.15.125:6789","nonce":0}]},"addr":"172.21.15.125:6789/0","public_addr":"172.21.15.125:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3301","nonce":0},{"type":"v1","addr":"172.21.15.125:6790","nonce":0}]},"addr":"172.21.15.125:6790/0","public_addr":"172.21.15.125:6790/0","priority":0,"weight":0}]}}
2020-08-27T01:12:40.045 INFO:tasks.mon_recovery.ceph_manager:quorum_status is {"election_epoch":66,"quorum":[0,1],"quorum_names":["b","a"],"quorum_leader_name":"b","quorum_age":32,"features":{"quorum_con":"4540138292836696063","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"]},"monmap":{"epoch":1,"fsid":"af484dec-9dda-4a14-8664-c564084077c7","modified":"2020-08-27T01:09:05.934223Z","created":"2020-08-27T01:09:05.934223Z","min_mon_release":15,"min_mon_release_name":"octopus","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"],"optional":[]},"mons":[{"rank":0,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.43:3300","nonce":0},{"type":"v1","addr":"172.21.15.43:6789","nonce":0}]},"addr":"172.21.15.43:6789/0","public_addr":"172.21.15.43:6789/0","priority":0,"weight":0},{"rank":1,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3300","nonce":0},{"type":"v1","addr":"172.21.15.125:6789","nonce":0}]},"addr":"172.21.15.125:6789/0","public_addr":"172.21.15.125:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3301","nonce":0},{"type":"v1","addr":"172.21.15.125:6790","nonce":0}]},"addr":"172.21.15.125:6790/0","public_addr":"172.21.15.125:6790/0","priority":0,"weight":0}]}}
2020-08-27T01:12:43.053 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status
2020-08-27T01:12:48.101 INFO:teuthology.orchestra.run.smithi125.stdout:{"election_epoch":70,"quorum":[0,1],"quorum_names":["b","a"],"quorum_leader_name":"b","quorum_age":0,"features":{"quorum_con":"4540138292836696063","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"]},"monmap":{"epoch":1,"fsid":"af484dec-9dda-4a14-8664-c564084077c7","modified":"2020-08-27T01:09:05.934223Z","created":"2020-08-27T01:09:05.934223Z","min_mon_release":15,"min_mon_release_name":"octopus","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"],"optional":[]},"mons":[{"rank":0,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.43:3300","nonce":0},{"type":"v1","addr":"172.21.15.43:6789","nonce":0}]},"addr":"172.21.15.43:6789/0","public_addr":"172.21.15.43:6789/0","priority":0,"weight":0},{"rank":1,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3300","nonce":0},{"type":"v1","addr":"172.21.15.125:6789","nonce":0}]},"addr":"172.21.15.125:6789/0","public_addr":"172.21.15.125:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3301","nonce":0},{"type":"v1","addr":"172.21.15.125:6790","nonce":0}]},"addr":"172.21.15.125:6790/0","public_addr":"172.21.15.125:6790/0","priority":0,"weight":0}]}}
2020-08-27T01:12:48.112 INFO:tasks.mon_recovery.ceph_manager:quorum_status is {"election_epoch":70,"quorum":[0,1],"quorum_names":["b","a"],"quorum_leader_name":"b","quorum_age":0,"features":{"quorum_con":"4540138292836696063","quorum_mon":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"]},"monmap":{"epoch":1,"fsid":"af484dec-9dda-4a14-8664-c564084077c7","modified":"2020-08-27T01:09:05.934223Z","created":"2020-08-27T01:09:05.934223Z","min_mon_release":15,"min_mon_release_name":"octopus","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus"],"optional":[]},"mons":[{"rank":0,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.43:3300","nonce":0},{"type":"v1","addr":"172.21.15.43:6789","nonce":0}]},"addr":"172.21.15.43:6789/0","public_addr":"172.21.15.43:6789/0","priority":0,"weight":0},{"rank":1,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3300","nonce":0},{"type":"v1","addr":"172.21.15.125:6789","nonce":0}]},"addr":"172.21.15.125:6789/0","public_addr":"172.21.15.125:6789/0","priority":0,"weight":0},{"rank":2,"name":"c","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.125:3301","nonce":0},{"type":"v1","addr":"172.21.15.125:6790","nonce":0}]},"addr":"172.21.15.125:6790/0","public_addr":"172.21.15.125:6790/0","priority":0,"weight":0}]}}
2020-08-27T01:12:51.114 INFO:teuthology.orchestra.run.smithi125:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status
2020-08-27T01:14:51.158 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-08-27T01:14:51.161 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 90, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 69, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-08-26-1631-octopus/qa/tasks/mon_recovery.py", line 80, in task
    manager.wait_for_mon_quorum_size(len(mons))
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-08-26-1631-octopus/qa/tasks/ceph_manager.py", line 2896, in wait_for_mon_quorum_size
    while not len(self.get_mon_quorum()) == size:
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-08-26-1631-octopus/qa/tasks/ceph_manager.py", line 2885, in get_mon_quorum
    out = self.raw_cluster_cmd('quorum_status')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-yuri-testing-2020-08-26-1631-octopus/qa/tasks/ceph_manager.py", line 1363, in raw_cluster_cmd
    stdout=BytesIO(),
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 204, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi125 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early quorum_status'
2020-08-27T01:14:51.209 ERROR:teuthology.run_tasks: Sentry event: https://sentry.ceph.com/sepia/teuthology/?q=d715fff7af674694aeb0e6b26a0898f3
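Note that both quorum_status replies captured above report quorum [0,1] (mons b and a) while the task is waiting for size 3, and the third call then hangs outright; mon.c never rejoins. An illustrative helper for pulling the relevant fields out of that JSON (field names are taken from the output above; the helper itself is hypothetical):

    # Summarize a quorum_status blob like the ones logged above.
    import json

    def summarize(quorum_status_text):
        qs = json.loads(quorum_status_text)
        in_quorum = set(qs['quorum_names'])
        all_mons = {m['name'] for m in qs['monmap']['mons']}
        return {
            'election_epoch': qs['election_epoch'],
            'in_quorum': sorted(in_quorum),
            'missing': sorted(all_mons - in_quorum),  # here: ['c']
        }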
Updated by Neha Ojha over 3 years ago
rados/monthrash/{ceph clusters/3-mons mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/bluestore-stupid rados supported-random-distro$/{ubuntu_latest} thrashers/one workloads/rados_mon_workunits}
2020-10-21T03:43:37.094 INFO:tasks.mon_thrash.ceph_manager:waiting for quorum size 2
2020-10-21T03:43:37.094 INFO:teuthology.orchestra.run.smithi114:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph quorum_status
2020-10-21T03:43:41.583 DEBUG:teuthology.orchestra.run:got remote process result: 124
/a/yuriw-2020-10-20_21:33:24-rados-wip-yuri8-testing-master-distro-basic-smithi/5543217
Updated by Kefu Chai over 3 years ago
/a/kchai-2020-11-08_14:53:34-rados-wip-kefu-testing-2020-11-07-2116-distro-basic-smithi/5602229/
Updated by Neha Ojha over 3 years ago
/a/teuthology-2020-12-06_07:01:02-rados-master-distro-basic-smithi/5685125
Updated by Neha Ojha over 3 years ago
We are still seeing these.
/a/teuthology-2021-01-18_07:01:01-rados-master-distro-basic-smithi/5798278
Updated by Sage Weil about 3 years ago
- Status changed from Triaged to In Progress
- Assignee set to Sage Weil
Neha Ojha wrote:
We are still seeing these.
/a/teuthology-2021-01-18_07:01:01-rados-master-distro-basic-smithi/5798278
The quorum_status command that timed out never reached the mon. Either the client can't connect, or the mon is rejecting the connection for some reason. The mon quorum formed fine.
Trying to reproduce with client-side logging.
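One plausible way to capture that client-side view (a sketch only: --debug-ms and --debug-monc are standard Ceph config overrides, but the debug levels and the wrapper below are my guess, not necessarily what the actual debugging change does):

    # Re-run the hanging command with messenger/monclient debugging turned up.
    import subprocess

    cmd = [
        'timeout', '120',
        'ceph', '--cluster', 'ceph',
        '--debug-ms', '1',       # log each message the client sends/receives
        '--debug-monc', '20',    # log MonClient hunting and auth attempts
        'quorum_status',
    ]
    # For a client with no log file configured, debug output typically
    # lands on stderr.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 124:  # timeout(1) exits 124 when it kills the command
        print('quorum_status hung; client-side log follows:')
        print(proc.stderr)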
Updated by Sage Weil about 3 years ago
- Status changed from In Progress to Need More Info
Having trouble reproducing (after about 150 jobs). Adding increased debugging to master with https://github.com/ceph/ceph/pull/39696
Updated by Sage Weil about 3 years ago
Same symptom: the CLI command fails to contact the mon.
/a/sage-2021-02-28_18:35:15-rados-wip-sage-testing-2021-02-28-1217-distro-basic-smithi/5921243
2021-02-28T19:40:42.545 DEBUG:teuthology.orchestra.run.smithi014:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd last-stat-seq osd.2
2021-02-28T19:42:42.575 DEBUG:teuthology.orchestra.run:got remote process result: 124
2021-02-28T19:42:42.577 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
The command message never reaches the mon, and there is no client-side debug log.
Updated by Neha Ojha over 2 years ago
- Priority changed from Urgent to Normal
Haven't seen this in recent rados runs.
Updated by Laura Flores 5 months ago
/a/yuriw-2023-11-27_22:36:50-rados-wip-yuri-testing-2023-11-27-1028-pacific-distro-default-smithi/7469028
Updated by Nitzan Mordechai 5 months ago
Laura Flores wrote:
/a/yuriw-2023-11-27_22:36:50-rados-wip-yuri-testing-2023-11-27-1028-pacific-distro-default-smithi/7469028
Laura, it looks like it is related to tracker https://tracker.ceph.com/issues/43887
The test started at 06:14:55 and was killed at 18:16:01 (probably a 12-hour timeout).
ceph_test_rados_delete_pools_parallel didn't complete; I didn't backport it to pacific.
Updated by Radoslaw Zarzynski 5 months ago
- Related to Bug #43887: ceph_test_rados_delete_pools_parallel failure added
Updated by Laura Flores 2 months ago
/a/yuriw-2024-02-01_21:25:50-rados-wip-yuri2-testing-2024-02-01-0939-pacific-distro-default-smithi/7542501