Bug #9093

Hung jobs with worker exceptions

Added by John Spray over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Several dead jobs in this suite:
http://pulpito.ceph.com/teuthology-2014-08-11_23:04:01-fs-master-testing-basic-multi/

The log.exception() one is a typo from https://github.com/ceph/teuthology/commit/b25b095ff39caa1cab8af796b5a129973f02784d#diff-d41d8cd98f00b204e9800998ecf8427e

Could the others have been caused by something being done to the machine at the time?

2014-08-12T09:43:02.553 INFO:teuthology.worker:Running job 418351
2014-08-12T09:43:02.567 INFO:teuthology.worker:Job archive: /var/lib/teuthworker/archive/teuthology-2014-08-11_23:08:01-kcephfs-master-testing-basic-multi/418351
2014-08-12T09:43:02.567 INFO:teuthology.worker:Job PID: 7518
2014-08-12T09:43:02.568 INFO:teuthology.worker:Running with watchdog
2014-08-12T09:45:02.668 DEBUG:teuthology.worker:Worker log: /var/lib/teuthworker/archive/worker_logs/worker.multi.16217
2014-08-12T09:45:02.721 ERROR:teuthology.worker:Failed to symlink worker log
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 292, in symlink_worker_log
    os.symlink(worker_log_path, os.path.join(archive_dir, 'worker.log'))
OSError: [Errno 2] No such file or directory
2014-08-12T09:45:02.875 ERROR:teuthology.worker:Child exited with code 1
2014-08-12T09:45:02.876 INFO:teuthology.worker:Restarting...
2014-08-12T09:45:18.709 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/teuthology_master/virtualenv/bin/teuthology-worker", line 9, in <module>
    load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-worker')()
  File "/home/teuthworker/src/teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 83, in main
    fetch_qa_suite('master')
  File "/home/teuthworker/src/teuthology_master/teuthology/repo_utils.py", line 174, in fetch_qa_suite
    log.exception()
TypeError: exception() takes at least 2 arguments (1 given)
2014-08-12T12:33:32.547 INFO:teuthology.worker:Running job 418473
2014-08-12T12:33:32.563 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/teuthology_master/virtualenv/bin/teuthology-worker", line 9, in <module>
  File "/home/teuthworker/src/teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 165, in main
    run_job(job_config, teuth_bin_path)
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 263, in run_job
    p = subprocess.Popen(args=arg, env=env)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1245, in _execute_child
    child_exception = pickle.loads(data)
  File "/usr/lib/python2.7/pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 971, in load_string
    self.append(rep.decode("string-escape"))
LookupError: unknown encoding: string-escape
2014-08-12T09:13:16.776 INFO:teuthology.worker:Running job 418321
2014-08-12T09:13:16.791 INFO:teuthology.worker:Job archive: /var/lib/teuthworker/archive/teuthology-2014-08-11_23:06:02-krbd-master-testing-basic-multi/418321
2014-08-12T09:13:16.791 INFO:teuthology.worker:Job PID: 27695
2014-08-12T09:13:16.791 INFO:teuthology.worker:Running with watchdog
2014-08-12T09:15:16.891 DEBUG:teuthology.worker:Worker log: /var/lib/teuthworker/archive/worker_logs/worker.multi.2214
2014-08-12T14:51:46.128 ERROR:teuthology.worker:Child exited with code 1
2014-08-12T14:51:46.255 INFO:teuthology.worker:Restarting...
2014-08-12T14:51:46.255 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/var/lib/teuthworker/src/teuthology_master/virtualenv/bin/teuthology-worker", line 9, in <module>
  File "/home/teuthworker/src/teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 94, in main
    restart()
  File "/home/teuthworker/src/teuthology_master/teuthology/worker.py", line 40, in restart
    os.execv(sys.executable, args)
OSError: [Errno 2] No such file or directory
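
Both Errno 2 tracebacks above are consistent with a path disappearing underneath the worker (for example, a job archive directory or the virtualenv interpreter being removed or rebuilt). This is not a confirmed diagnosis. A minimal sketch of how each call produces that error follows; the helper names and the temporary paths are hypothetical and only illustrate the behaviour of os.symlink and os.execv.

```python
import errno
import os
import tempfile


def symlink_errno(src, dst):
    """Return the errno raised by os.symlink(src, dst), or None on success."""
    try:
        os.symlink(src, dst)
    except OSError as e:
        return e.errno
    return None


def execv_errno(path):
    """Return the errno raised by os.execv for a path that does not exist."""
    # Guard: a successful execv replaces the current process and never
    # returns, so refuse to run against a path that is actually present.
    if os.path.exists(path):
        raise ValueError("path exists; refusing to exec it")
    try:
        os.execv(path, [path])
    except OSError as e:
        return e.errno
    return None


base = tempfile.mkdtemp()
# Hypothetical stand-ins for the archive dir and the interpreter; neither
# is ever created, which mimics them vanishing while the worker runs.
missing_dir = os.path.join(base, "archive", "418351")
missing_exe = os.path.join(base, "virtualenv", "bin", "python")

# The link target does not need to exist, but the directory that should
# contain the new link does.
symlink_result = symlink_errno("/tmp/worker.log",
                               os.path.join(missing_dir, "worker.log"))
execv_result = execv_errno(missing_exe)
```

In both cases the reported errno is ENOENT, which matches the "No such file or directory" message in the logs.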

Related issues

Related to teuthology - Bug #43866: luminous: qa: LookupError: unknown encoding: string-escape (New)

History

#1 Updated by John Spray over 9 years ago

I notice the log.exception() one is already fixed in 663060dd5adcbacd8fdbdffadedd088d45847535
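
For reference, a minimal sketch of why a bare log.exception() fails and what the usual form looks like, assuming a standard `logging` logger. The logger name, function names, and message text are hypothetical and are not taken from the teuthology commits.

```python
import logging

log = logging.getLogger("repo_utils_example")  # hypothetical logger name


def fetch_broken():
    try:
        raise RuntimeError("fetch failed")
    except RuntimeError:
        # Bug: exception() requires a message argument, so this raises
        # TypeError and hides the original error.
        log.exception()


def fetch_fixed():
    try:
        raise RuntimeError("fetch failed")
    except RuntimeError:
        # With a message, the call logs at ERROR level and appends the
        # traceback of the exception currently being handled.
        log.exception("Failed to fetch repo")  # hypothetical message
        return False
```

Because the TypeError is raised inside the except block, it propagates instead of the intended log entry, which is how it ended up as an uncaught exception in the worker.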

#2 Updated by Zack Cerza over 9 years ago

The execv one is #9086.

The pickle one is pretty odd. I saw it once before, but am crossing my fingers it was a side-effect of something else. Are we still seeing it?

#3 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved

don't think we've seen this

#4 Updated by Patrick Donnelly about 4 years ago

  • Related to Bug #43866: luminous: qa: LookupError: unknown encoding: string-escape added
