Project

General

Profile

Bug #51944

teuthology.dispatcher.supervisor:Could not save logs

Added by Neha Ojha 3 months ago. Updated 9 days ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2021-07-28T07:16:53.639 INFO:teuthology.misc:Compressing logs...
2021-07-28T07:16:53.640 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2021-07-28T07:17:28.139 ERROR:teuthology.dispatcher.supervisor:Could not save logs
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_73aa7e3b960c7ffac669297b6aa86606265edd1b/teuthology/dispatcher/supervisor.py", line 231, in run_with_watchdog
    teuth_config.archive_base, job_config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_73aa7e3b960c7ffac669297b6aa86606265edd1b/teuthology/dispatcher/supervisor.py", line 289, in transfer_archives
    compress_logs(ctx, log_path)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_73aa7e3b960c7ffac669297b6aa86606265edd1b/teuthology/misc.py", line 1378, in compress_logs
    wait=False,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_73aa7e3b960c7ffac669297b6aa86606265edd1b/teuthology/orchestra/cluster.py", line 64, in run
    return [remote.run(**kwargs) for remote in remotes]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_73aa7e3b960c7ffac669297b6aa86606265edd1b/teuthology/orchestra/cluster.py", line 64, in <listcomp>
    return [remote.run(**kwargs) for remote in remotes]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_73aa7e3b960c7ffac669297b6aa86606265edd1b/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_73aa7e3b960c7ffac669297b6aa86606265edd1b/teuthology/orchestra/run.py", line 434, in run
    transport = client.get_transport()
AttributeError: 'NoneType' object has no attribute 'get_transport'

/a/yuriw-2021-07-27_17:19:39-rados-wip-yuri-testing-2021-07-27-0830-pacific-distro-basic-smithi/6297040$

supervisor.6369338.log View (15.6 KB) David Galloway, 09/10/2021 12:55 PM

History

#1 Updated by Neha Ojha 3 months ago

/a/yuriw-2021-07-28_20:56:33-rados-wip-yuri7-testing-2021-07-28-1155-octopus-distro-basic-smithi/6299474

#2 Updated by Josh Durgin 3 months ago

  • Assignee set to Kamoltat Sirivadhna

#3 Updated by Deepika Upadhyay about 2 months ago

/ceph/teuthology-archive/yuriw-2021-09-02_14:43:38-rbd-wip-yuri7-testing-2021-09-01-0934-octopus-distro-basic-smithi/6372639/supervisor.6372639.log

#4 Updated by David Galloway about 2 months ago

I think there's a much larger issue here.

2021-09-09T17:12:08.799 INFO:teuthology.dispatcher.supervisor:Running job 6369338
2021-09-09T17:12:08.799 DEBUG:teuthology.dispatcher.supervisor:Running: /home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/virtualenv/bin/teuthology -v --owner scheduled_teuthology@teuthology --archive /home/teuthworker/archive/teuthology-2021-09-01_03:30:03-rados-octopus-distro-basic-gibba/6369338 --name teuthology-2021-09-01_03:30:03-rados-octopus-distro-basic-gibba --description rados/cephadm/upgrade/{1-start-distro/1-start-centos_8 2-repo_digest/defaut 3-start-upgrade 4-wait fixed-2} -- /home/teuthworker/archive/teuthology-2021-09-01_03:30:03-rados-octopus-distro-basic-gibba/6369338/orig.config.yaml
2021-09-09T17:12:08.803 INFO:teuthology.dispatcher.supervisor:Job archive: /home/teuthworker/archive/teuthology-2021-09-01_03:30:03-rados-octopus-distro-basic-gibba/6369338
2021-09-09T17:12:08.804 INFO:teuthology.dispatcher.supervisor:Job PID: 20266
2021-09-09T17:12:08.804 INFO:teuthology.dispatcher.supervisor:Running with watchdog
2021-09-10T05:12:53.550 WARNING:teuthology.dispatcher.supervisor:Job ran longer than 43200s. Killing...
2021-09-10T05:12:53.761 INFO:teuthology.kill:Killing Pids: {20266}
2021-09-10T05:12:54.855 INFO:teuthology.kill:Nuking machines: ['gibba001', 'gibba041']
2021-09-10T05:12:55.711 INFO:teuthology.kill:2021-09-10 05:12:55,711.711 INFO:teuthology.nuke:targets:
2021-09-10T05:12:55.712 INFO:teuthology.kill:  gibba001.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQD7Dwz5eZsjA4lRqJ5lAnlTHpnVPHYhTsrPLQy6CKb8Eoc5+gX05oyvpeKzNpGBsXw+R/hlp/Ti0e6teKxrTupL+HIDE6GwanV2J2iwcdWzr9iA2tXuJ4egbfrNpLDJbQnkX59D6Ia6uyEceS2N32CDP4vkwkJ4P64+K20+G3a63fiYWcYGxEGWJi3sWILuxS8rOPLy55GOY6GOntGV3sWluDsxxrT1JrZPG1tKM1JtT7vZT9ofe7N1Jro/FyUdW8vp9coRShM4RHIea/mj0QCmnohhoO+pS2ZnmOJNTAdMudsV/z3Ru8d/n+0sS5VjHwiC/hutTNZMaTkOFYSsIR8j+rSbeX3AICU22wSTFlyVVDaKByOt2qhMLxY9vRJVaxSKyFg0OWniK7igo/TxBtvMvVhn2Vpizb6w9H5Z74XExwxBtQx+TKEJncfGai7nkyR9Yhf0xOz+4489Tx/Ad8XzHKrY7B32KkaIu9e7XrJBW9grjwsWGrvuL5b7OrE5lss=
2021-09-10T05:12:55.713 INFO:teuthology.kill:  gibba041.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEcwXnlx05+pkQzUNij/F8XjjFK2w7q0rbnVzlOXaDW9fUbFHz+uWDlL+vR3LAZrqFLW7XXGazyyacjchn4+v9B8WlLT8pJysCnfukpQYNViL8tPU1+IK+mGG8pIzhKXVSEhsDHk9xvmwYBsq1JONz7Nw3hJrTCbEZD77XtA/JnOK1ksWwP1ZEOjZnQGmq4s7VPlUDb8HlH0dcd7IBXshWsTTDLk52y+CrJd/6VcgIeNDVsXXkzXwBwsKcQSvM20J3x0+ne8ETP6jFVa46eNXZUHcIwOyZ/6g1Sl0hasb6vOeiIwZafVte5H7an2GCrMbsGUkQ+nRizzH6fOZXRym3rlIDxUd8uKwCZpof9hrjkkhdmkb+cfwcdtTuPDEfZCaTKM8cBn7Wi46Enic0mdvR0AmnGGtwP47cCcLU4bPTqh54V3eSaFApNwnyWKY9iN/JSQ164GpbHCs26ixabDyNRUfGkdz44Qq9bOs0ZfFhghZ1YZD4gGTC7NshKsxChiE=
2021-09-10T05:12:55.713 INFO:teuthology.kill:2021-09-10 05:12:55,711.711 INFO:teuthology.task.internal.check_lock:Checking locks...
2021-09-10T05:12:55.715 INFO:teuthology.kill:2021-09-10 05:12:55,715.715 INFO:teuthology.task.internal.check_lock:Checking locks...
2021-09-10T05:12:55.775 INFO:teuthology.kill:2021-09-10 05:12:55,775.775 INFO:teuthology.orchestra.console:Power off gibba041
2021-09-10T05:12:55.788 INFO:teuthology.kill:2021-09-10 05:12:55,787.787 INFO:teuthology.orchestra.console:Power off gibba001
2021-09-10T05:13:08.280 INFO:teuthology.kill:2021-09-10 05:13:08,280.280 INFO:teuthology.orchestra.console:Power off for gibba001 completed
2021-09-10T05:13:08.285 INFO:teuthology.kill:2021-09-10 05:13:08,285.285 INFO:teuthology.orchestra.console:Power off for gibba041 completed
2021-09-10T05:13:08.474 INFO:teuthology.task.internal:roles: ubuntu@gibba001.front.sepia.ceph.com - ['mon.a', 'mon.c', 'mgr.y', 'osd.0', 'osd.1', 'osd.2', 'osd.3', 'client.0', 'node-exporter.a', 'alertmanager.a']
2021-09-10T05:13:08.475 INFO:teuthology.task.internal:roles: ubuntu@gibba041.front.sepia.ceph.com - ['mon.b', 'mgr.x', 'osd.4', 'osd.5', 'osd.6', 'osd.7', 'client.1', 'prometheus.a', 'grafana.a', 'node-exporter.b']
2021-09-10T05:13:08.475 INFO:teuthology.misc:Compressing logs...
2021-09-10T05:13:08.476 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2021-09-10T05:14:08.524 ERROR:teuthology.dispatcher.supervisor:Could not save logs
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/teuthology/dispatcher/supervisor.py", line 231, in run_with_watchdog
    teuth_config.archive_base, job_config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/teuthology/dispatcher/supervisor.py", line 289, in transfer_archives
    compress_logs(ctx, log_path)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/teuthology/misc.py", line 1378, in compress_logs
    wait=False,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/teuthology/orchestra/cluster.py", line 64, in run
    return [remote.run(**kwargs) for remote in remotes]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/teuthology/orchestra/cluster.py", line 64, in <listcomp>
    return [remote.run(**kwargs) for remote in remotes]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/teuthology/orchestra/remote.py", line 509, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_df69d6db9d68f5492f47c1dee57bca3b0a385796/teuthology/orchestra/run.py", line 434, in run
    transport = client.get_transport()
AttributeError: 'NoneType' object has no attribute 'get_transport'
2021-09-10T05:14:08.562 INFO:teuthology.kill:No teuthology processes running
2021-09-10T05:14:09.539 INFO:teuthology.kill:Nuking machines: ['gibba001', 'gibba041']
2021-09-10T05:14:10.376 INFO:teuthology.kill:2021-09-10 05:14:10,375.375 INFO:teuthology.nuke:targets:
2021-09-10T05:14:10.376 INFO:teuthology.kill:  gibba001.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQD7Dwz5eZsjA4lRqJ5lAnlTHpnVPHYhTsrPLQy6CKb8Eoc5+gX05oyvpeKzNpGBsXw+R/hlp/Ti0e6teKxrTupL+HIDE6GwanV2J2iwcdWzr9iA2tXuJ4egbfrNpLDJbQnkX59D6Ia6uyEceS2N32CDP4vkwkJ4P64+K20+G3a63fiYWcYGxEGWJi3sWILuxS8rOPLy55GOY6GOntGV3sWluDsxxrT1JrZPG1tKM1JtT7vZT9ofe7N1Jro/FyUdW8vp9coRShM4RHIea/mj0QCmnohhoO+pS2ZnmOJNTAdMudsV/z3Ru8d/n+0sS5VjHwiC/hutTNZMaTkOFYSsIR8j+rSbeX3AICU22wSTFlyVVDaKByOt2qhMLxY9vRJVaxSKyFg0OWniK7igo/TxBtvMvVhn2Vpizb6w9H5Z74XExwxBtQx+TKEJncfGai7nkyR9Yhf0xOz+4489Tx/Ad8XzHKrY7B32KkaIu9e7XrJBW9grjwsWGrvuL5b7OrE5lss=
2021-09-10T05:14:10.377 INFO:teuthology.kill:  gibba041.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDEcwXnlx05+pkQzUNij/F8XjjFK2w7q0rbnVzlOXaDW9fUbFHz+uWDlL+vR3LAZrqFLW7XXGazyyacjchn4+v9B8WlLT8pJysCnfukpQYNViL8tPU1+IK+mGG8pIzhKXVSEhsDHk9xvmwYBsq1JONz7Nw3hJrTCbEZD77XtA/JnOK1ksWwP1ZEOjZnQGmq4s7VPlUDb8HlH0dcd7IBXshWsTTDLk52y+CrJd/6VcgIeNDVsXXkzXwBwsKcQSvM20J3x0+ne8ETP6jFVa46eNXZUHcIwOyZ/6g1Sl0hasb6vOeiIwZafVte5H7an2GCrMbsGUkQ+nRizzH6fOZXRym3rlIDxUd8uKwCZpof9hrjkkhdmkb+cfwcdtTuPDEfZCaTKM8cBn7Wi46Enic0mdvR0AmnGGtwP47cCcLU4bPTqh54V3eSaFApNwnyWKY9iN/JSQ164GpbHCs26ixabDyNRUfGkdz44Qq9bOs0ZfFhghZ1YZD4gGTC7NshKsxChiE=
2021-09-10T05:14:10.396 INFO:teuthology.kill:2021-09-10 05:14:10,396.396 INFO:teuthology.task.internal.check_lock:Checking locks...
2021-09-10T05:14:10.400 INFO:teuthology.kill:2021-09-10 05:14:10,399.399 INFO:teuthology.task.internal.check_lock:Checking locks...
2021-09-10T05:14:10.456 INFO:teuthology.kill:2021-09-10 05:14:10,455.455 INFO:teuthology.orchestra.console:Power off gibba041
2021-09-10T05:14:10.466 INFO:teuthology.kill:2021-09-10 05:14:10,466.466 INFO:teuthology.orchestra.console:Power off gibba001
2021-09-10T05:14:10.602 INFO:teuthology.kill:2021-09-10 05:14:10,602.602 INFO:teuthology.orchestra.console:Power off for gibba041 completed
2021-09-10T05:14:10.611 INFO:teuthology.kill:2021-09-10 05:14:10,611.611 INFO:teuthology.orchestra.console:Power off for gibba001 completed
2021-09-10T05:14:10.741 INFO:teuthology.kill:2021-09-10 05:14:10,740.740 INFO:teuthology.lock.ops:unlocked gibba041.front.sepia.ceph.com
2021-09-10T05:14:10.747 INFO:teuthology.kill:2021-09-10 05:14:10,747.747 INFO:teuthology.lock.ops:unlocked gibba001.front.sepia.ceph.com
2021-09-10T05:16:10.954 ERROR:teuthology.dispatcher.supervisor:Child exited with code -15
2021-09-10T05:16:11.015 INFO:teuthology.dispatcher.supervisor:Nuking machines...
2021-09-10T05:16:11.016 INFO:teuthology.nuke:Checking targets against current locks
2021-09-10T05:16:11.093 INFO:teuthology.nuke:Not nuking gibba041.front.sepia.ceph.com because description doesn't match
2021-09-10T05:16:11.093 INFO:teuthology.nuke:Not nuking gibba001.front.sepia.ceph.com because description doesn't match
  1. The job starts nuking the machines (powers them off) before trying to collect logs.
  2. It says it's nuking the machines three times. Notice the last time, the supervisor is trying to nuke the machine after another job already picked it up.

#5 Updated by Zack Cerza about 1 month ago

This commit has good intentions, but since kill_job calls nuke_targets, it looks like this will happen any time a job is killed for going overtime.

I'll get a patch together today.

#7 Updated by Kamoltat Sirivadhna 9 days ago

  • Status changed from New to Resolved
  • Assignee changed from Kamoltat Sirivadhna to Zack Cerza

Also available in: Atom PDF