Bug #45570
Make "Failed to power on" fatal
Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Regression: No
Severity: 3 - minor
Description
2020-05-15T21:29:44.038 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi167.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power status
2020-05-15T21:29:44.151 WARNING:teuthology.contextutil:'wait for power on' reached maximum tries (5) after waiting for 20.0 seconds
2020-05-15T21:29:44.252 ERROR:teuthology.orchestra.console:Failed to power on smithi167
2020-05-15T21:29:44.353 INFO:teuthology.provision.fog.smithi167:Waiting for deploy to finish
...
2020-05-15T21:34:51.787 DEBUG:teuthology.parallel:result is None
2020-05-15T21:44:34.400 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 91, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/internal/lock_machines.py", line 78, in lock_machines
    os_version, arch)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/lock/ops.py", line 145, in lock_many
    reimaged[machine] = machines[machine]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
    reraise(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/__init__.py", line 39, in reimage
    return obj.create()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 86, in create
    self.wait_for_deploy_task(task_id)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 244, in wait_for_deploy_task
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (60) after waiting for 900 seconds
There is no need to wait for the deploy to finish if powering the system on failed. Make "Failed to power on" fatal and report it as the failure cause when the job fails or dies.
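The requested behavior can be illustrated with a minimal, self-contained sketch. It is not teuthology's code: PowerOnError, wait_for_power_on, reimage, and their parameters are hypothetical names chosen for illustration, and the 5 tries with a 4-second delay are only an assumption loosely modeled on the "maximum tries (5)" and "20.0 seconds" in the log above. The idea is to raise when the power-on check is exhausted, so the deploy wait never starts and the exception names the real cause.

```python
import time


class PowerOnError(RuntimeError):
    """Hypothetical exception for a node that never reports power on."""


def wait_for_power_on(is_powered_on, tries=5, delay=4.0, sleep=time.sleep):
    """Poll is_powered_on() up to `tries` times; raise instead of only logging."""
    for attempt in range(1, tries + 1):
        if is_powered_on():
            return attempt
        if attempt < tries:
            sleep(delay)
    raise PowerOnError(f"Failed to power on after {tries} tries")


def reimage(name, is_powered_on, deploy, sleep=time.sleep):
    """Run deploy() only if the node powered on; otherwise fail fast."""
    try:
        wait_for_power_on(is_powered_on, sleep=sleep)
    except PowerOnError as exc:
        # Attach the node name so the job's failure reason points at the cause.
        raise PowerOnError(f"{name}: {exc}") from exc
    return deploy()
```

With this shape, a caller that passes a check which always returns False gets a PowerOnError mentioning the node name right away, and deploy() is never invoked, so the long deploy timeout is skipped.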
Updated by Nathan Cutler over 3 years ago
Just out of curiosity, I took a look. The test is:
description: rados/dashboard/{clusters/{2-node-mgr.yaml} debug/mgr.yaml objectstore/bluestore-bitmap.yaml supported-random-distro$/{ubuntu_latest.yaml} tasks/dashboard.yaml}
and the behavior that needs to be fixed is that the test keeps running after this message:
2020-05-15T21:29:44.038 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi167.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power status
2020-05-15T21:29:44.151 WARNING:teuthology.contextutil:'wait for power on' reached maximum tries (5) after waiting for 20.0 seconds
2020-05-15T21:29:44.252 ERROR:teuthology.orchestra.console:Failed to power on smithi167
and also the job ends up in DEAD status instead of FAIL:
2020-05-15T21:44:34.554 INFO:teuthology.run:DEAD
Updated by Nathan Cutler over 3 years ago
- Status changed from New to Fix Under Review