Bug #54389 (open)

Dead jobs caused by "Error reimaging machines: 500 Server Error: Internal Server Error for url: http://fog.front.sepia.ceph.com/fog/host/137/task"

Added by Laura Flores about 2 years ago. Updated about 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Description:
rados/cephadm/osds/{0-distro/rhel_8.4_container_tools_rhel8 0-nvme-loop 1-start 2-ops/rm-zap-add}

Failure reason:

Error reimaging machines: 500 Server Error: Internal Server Error for url: http://fog.front.sepia.ceph.com/fog/host/137/task 

/a/yuriw-2022-02-22_16:14:07-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6700749/supervisor.6700749.log
/a/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698405/supervisor.6698405.log

2022-02-21T17:04:46.566 INFO:root:teuthology version: 1.1.0-eea25212
2022-02-21T17:04:46.705 INFO:teuthology.lock.ops:Start node 'ubuntu@smithi046.front.sepia.ceph.com' reimaging
2022-02-21T17:04:46.706 INFO:teuthology.lock.ops:Updating [ubuntu@smithi046.front.sepia.ceph.com]: reset os type and version on server
2022-02-21T17:04:46.706 INFO:teuthology.lock.ops:Updating smithi046.front.sepia.ceph.com on lock server
2022-02-21T17:04:46.729 INFO:teuthology.lock.ops:Node 'ubuntu@smithi046.front.sepia.ceph.com' reimaging is complete
2022-02-21T17:04:46.729 INFO:teuthology.lock.ops:Start node 'ubuntu@smithi197.front.sepia.ceph.com' reimaging
2022-02-21T17:04:46.730 INFO:teuthology.lock.ops:Updating [ubuntu@smithi197.front.sepia.ceph.com]: reset os type and version on server
2022-02-21T17:04:46.730 INFO:teuthology.lock.ops:Updating smithi197.front.sepia.ceph.com on lock server
2022-02-21T17:04:46.750 INFO:teuthology.lock.ops:Node 'ubuntu@smithi197.front.sepia.ceph.com' reimaging is complete
2022-02-21T17:04:46.968 INFO:teuthology.provision.fog.smithi197:Scheduling deploy of rhel 8.4
2022-02-21T17:04:46.971 INFO:teuthology.provision.fog.smithi046:Scheduling deploy of rhel 8.4
2022-02-21T17:04:47.213 ERROR:teuthology.provision.fog.smithi197:500: { "error": "Host is already a member of an active task" 
}
2022-02-21T17:04:47.214 ERROR:teuthology.dispatcher.supervisor:Reimaging error. Nuking machines...
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/dispatcher/supervisor.py", line 206, in reimage
    reimaged = reimage_machines(ctx, targets, job_config['machine_type'])
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/lock/ops.py", line 315, in reimage_machines
    log.info("Node '%s' reimaging is complete", machine)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 84, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 98, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 30, in resurrect_traceback
    raise exc.exc_info[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/parallel.py", line 23, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/provision/__init__.py", line 39, in reimage
    return obj.create()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/provision/fog.py", line 79, in create
    task_id = self.schedule_deploy_task(host_id)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/provision/fog.py", line 199, in schedule_deploy_task
    data='{"taskTypeID": %i}' % deploy_id,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/provision/fog.py", line 123, in do_request
    resp.raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/virtualenv/lib/python3.6/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://fog.front.sepia.ceph.com/fog/host/313/task
2022-02-21T17:04:47.279 ERROR:teuthology.provision.fog.smithi046:500: { "error": "Host is already a member of an active task" 
}

There were no other logs collected for this failed job.
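For context, the call that fails (per the traceback, schedule_deploy_task() in teuthology/provision/fog.py) is a POST to the FOG host task endpoint, and FOG answers 500 with the JSON error body shown above when the host already has an active task. A minimal standalone sketch of that request follows; the token header names and the host/task-type IDs are placeholders and assumptions, not teuthology's actual code path:

import requests

FOG_ENDPOINT = "http://fog.front.sepia.ceph.com/fog"  # base URL taken from the failing request above

# Assumption: FOG API token headers; both values are placeholders.
HEADERS = {
    "fog-api-token": "<api-token>",
    "fog-user-token": "<user-token>",
}

def schedule_deploy(host_id: int, deploy_task_type_id: int) -> None:
    """Schedule a deploy task for a FOG host, mirroring the POST seen in the traceback."""
    resp = requests.post(
        f"{FOG_ENDPOINT}/host/{host_id}/task",
        headers=HEADERS,
        data='{"taskTypeID": %i}' % deploy_task_type_id,
    )
    if resp.status_code == 500:
        # FOG reports the refusal in a JSON body, e.g.
        # { "error": "Host is already a member of an active task" }
        print("FOG refused the deploy task:", resp.json().get("error"))
    resp.raise_for_status()

The error therefore indicates a second deploy task was scheduled for a host that already had one pending, which matches the double-lock observation in the comments below.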


Related issues: 1 (1 open, 0 closed)

Related to sepia - Bug #36597: HTTPError: 500 Server Error: Internal Server Error for url: http://fog.front.sepia.ceph.com/fog/host/309/task (New, added 10/25/2018)

Actions #1

Updated by Laura Flores about 2 years ago

  • Project changed from Orchestrator to sepia

Moving this out of Orchestrator as I think it's more of an infrastructure problem.

Actions #2

Updated by Laura Flores about 2 years ago

  • Related to Bug #36597: HTTPError: 500 Server Error: Internal Server Error for url: http://fog.front.sepia.ceph.com/fog/host/309/task added
Actions #3

Updated by Laura Flores about 2 years ago

The related bug has the same error message, but in that case teuthology logs were collected; none were collected for the jobs listed in this tracker.

Actions #5

Updated by David Galloway about 2 years ago

It does look like smithi046 got double-locked.

From https://pulpito.ceph.com/nodes/smithi046.front.sepia.ceph.com/?page=4

https://pulpito.ceph.com/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698404
https://pulpito.ceph.com/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698405

From http://qa-proxy.ceph.com/teuthology/yuriw-2022-02-21_15:40:41-rados-wip-yuri4-testing-2022-02-18-0800-distro-default-smithi/6698404/supervisor.6698404.log

2022-02-21T17:11:51.001 INFO:teuthology.dispatcher.supervisor:Nuking machines...
2022-02-21T17:11:51.001 INFO:teuthology.nuke:Checking targets against current locks
2022-02-21T17:11:51.097 CRITICAL:teuthology:Uncaught exception
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/virtualenv/bin/teuthology-dispatcher", line 33, in <module>
    sys.exit(load_entry_point('teuthology', 'console_scripts', 'teuthology-dispatcher')())
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/scripts/dispatcher.py", line 33, in main
    teuthology.dispatcher.main(args)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/dispatcher/__init__.py", line 61, in main
    return supervisor.main(args)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/dispatcher/supervisor.py", line 61, in main
    verbose
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/dispatcher/supervisor.py", line 154, in run_job
    unlock_targets(job_config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/dispatcher/supervisor.py", line 252, in unlock_targets
    nuke(fake_ctx, True)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_eea2521245e542c7c1a063d296779f572aa0255a/teuthology/nuke/__init__.py", line 250, in nuke
    if ctx.name not in lock['description']:
TypeError: argument of type 'NoneType' is not iterable
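The secondary crash at the bottom of that traceback is in nuke(): it evaluates ctx.name not in lock['description'], and the lock's description can be None once the node has been unlocked or re-locked by another job, so the "in" test raises the TypeError. A hedged sketch of a defensive guard (an illustration only, not the actual teuthology fix) would be:

def lock_belongs_to_job(ctx, lock) -> bool:
    """Return True if the lock's description still references this job.

    The description can be None when the node was unlocked (or grabbed by
    another job) while nuking, which is what raised
    "TypeError: argument of type 'NoneType' is not iterable" above.
    """
    description = lock.get('description') or ''
    return ctx.name in description

Coercing a missing description to an empty string keeps the cleanup path from crashing when the lock state has changed underneath the supervisor.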