Bug #45570

open

Make "Failed to power on" fatal

Added by David Galloway almost 4 years ago. Updated over 3 years ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

http://qa-proxy.ceph.com/teuthology/aemerson-2020-05-15_18:43:20-rados-wip-objection-triumphant-distro-basic-smithi/5058394/teuthology.log

2020-05-15T21:29:44.038 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi167.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power status
2020-05-15T21:29:44.151 WARNING:teuthology.contextutil:'wait for power on' reached maximum tries (5) after waiting for 20.0 seconds
2020-05-15T21:29:44.252 ERROR:teuthology.orchestra.console:Failed to power on smithi167
2020-05-15T21:29:44.353 INFO:teuthology.provision.fog.smithi167:Waiting for deploy to finish

...

2020-05-15T21:34:51.787 DEBUG:teuthology.parallel:result is None
2020-05-15T21:44:34.400 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 91, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/internal/lock_machines.py", line 78, in lock_machines
    os_version, arch)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/lock/ops.py", line 145, in lock_many
    reimaged[machine] = machines[machine]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
    reraise(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/__init__.py", line 39, in reimage
    return obj.create()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 86, in create
    self.wait_for_deploy_task(task_id)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 244, in wait_for_deploy_task
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (60) after waiting for 900 seconds

There is no need to wait for the deploy task if powering the system on has already failed. Make "Failed to power on" fatal, and have it reported as the failure cause when the job fails or dies.
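In the log above, the job spent nearly 15 minutes (21:29:44 to 21:44:34) waiting for a deploy that could not finish. The proposal can be sketched as follows. This is a minimal sketch, not teuthology's actual code. `PowerOnError`, `FakeConsole`, `power_on()` and `provision()` are hypothetical names. The defaults of 5 tries with a 4-second sleep are only assumed to match the "5 tries, 20.0 seconds" in the warning.

```python
import time


class PowerOnError(Exception):
    """Hypothetical fatal error raised when a node cannot be powered on."""


class FakeConsole:
    """Hypothetical stand-in for an IPMI console.

    Reports power "on" once `on_after` status polls have happened;
    `on_after=None` means the node never powers on.
    """

    def __init__(self, shortname, on_after=None):
        self.shortname = shortname
        self.on_after = on_after
        self.polls = 0

    def power_on(self):
        pass  # a real console would send an IPMI "power on" command here

    def check_power(self, state):
        self.polls += 1
        return self.on_after is not None and self.polls >= self.on_after


def power_on(console, tries=5, sleep=4.0):
    """Poll until the node reports power on; raise instead of only logging."""
    console.power_on()
    for _ in range(tries):
        if console.check_power("on"):
            return
        time.sleep(sleep)  # 5 tries x 4 s = 20 s total before giving up
    raise PowerOnError(f"Failed to power on {console.shortname}")


def provision(console, wait_for_deploy, sleep=4.0):
    # If power_on() raises, the long deploy wait is never entered.
    power_on(console, sleep=sleep)
    wait_for_deploy()
```

Because `power_on()` raises, `provision()` never reaches the deploy wait, and the exception message names the node that failed.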

#1

Updated by Nathan Cutler over 3 years ago

Just out of curiosity, I took a look. The test is:

  description: rados/dashboard/{clusters/{2-node-mgr.yaml} debug/mgr.yaml objectstore/bluestore-bitmap.yaml
    supported-random-distro$/{ubuntu_latest.yaml} tasks/dashboard.yaml}

and the undesirable behavior to fix is that the test keeps executing after this message:

2020-05-15T21:29:44.038 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi167.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power status
2020-05-15T21:29:44.151 WARNING:teuthology.contextutil:'wait for power on' reached maximum tries (5) after waiting for 20.0 seconds
2020-05-15T21:29:44.252 ERROR:teuthology.orchestra.console:Failed to power on smithi167

The job also ends up in DEAD status instead of FAIL:

2020-05-15T21:44:34.554 INFO:teuthology.run:DEAD
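
A minimal sketch of the desired reporting, under stated assumptions: a hypothetical runner stops at the first raised exception and records it as the failure reason, so a fatal power-on error marks the job FAIL with that cause. `run_job()`, `PowerOnError` and the summary keys `status` and `failure_reason` are illustrative assumptions, not teuthology's actual reporting schema.

```python
class PowerOnError(Exception):
    """Hypothetical fatal provisioning error."""


def run_job(tasks):
    """Run task callables in order; record the first error as the failure reason.

    The `status` and `failure_reason` keys are assumptions for illustration.
    """
    summary = {"status": "pass", "failure_reason": None}
    for task in tasks:
        try:
            task()
        except Exception as exc:
            # Stop at the first failure so later tasks cannot mask its cause.
            summary["status"] = "fail"
            summary["failure_reason"] = f"{type(exc).__name__}: {exc}"
            break
    return summary
```

With this shape, a locking task that raises `PowerOnError("Failed to power on smithi167")` yields status "fail" with that message as the reason, and no later task runs.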

#2

Updated by Nathan Cutler over 3 years ago

  • Status changed from New to Fix Under Review