Bug #18575
Queued job fails to start then kills worker
Status: Resolved
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Regression: No
Severity: 3 - minor
Description
I noticed the number of workers drop on the Grafana dashboard. I correlated the time the worker count decreased to the mtime of this worker log.
http://pulpito.ceph.com/jdillaman-2017-01-16_15:34:13-rbd-wip-jd-testing-distro-basic-smithi/722896/
root@teuthology:/ceph/teuthology-archive/worker_logs# cat worker.smithi.11960
2017-01-17T00:31:44.649 INFO:root:teuthology version: 1.0.0-ed1a1e9
2017-01-17T00:31:44.714 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_git_teuthology_master was just updated; assuming it is current
2017-01-17T00:31:44.714 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_git_teuthology_master to branch master
2017-01-17T00:31:44.746 INFO:teuthology.repo_utils:Skipping bootstrap as it was already done in the last 60s
2017-01-17T00:31:49.541 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_ceph_master was just updated; assuming it is current
2017-01-17T00:31:49.541 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_ceph_master to branch master
2017-01-17T00:31:49.694 INFO:teuthology.worker:Reserved job 722896
2017-01-17T00:31:49.694 INFO:teuthology.worker:Config is: branch: wip-jd-testing
description: rbd/nbd/{base/install.yaml cluster/{fixed-3.yaml openstack.yaml} fs/xfs.yaml msgr-failures/few.yaml objectstore/bluestore.yaml thrashers/default.yaml workloads/rbd_nbd.yaml}
email: dillaman@redhat.com
kernel: {kdb: true, sha1: distro}
last_in_suite: false
machine_type: smithi
name: jdillaman-2017-01-16_15:34:13-rbd-wip-jd-testing-distro-basic-smithi
nuke-on-error: true
openstack:
- machine: {cpus: 1, disk: 40, ram: 8000}
  volumes: {count: 3, size: 30}
os_type: ubuntu
overrides:
  admin_socket: {branch: wip-jd-testing}
  ceph:
    conf:
      global: {ms inject socket failures: 5000}
      mon: {debug mon: 20, debug ms: 1, debug paxos: 20}
      osd: {bluestore block size: 96636764160, debug bdev: 20, debug bluefs: 20, debug bluestore: 30, debug filestore: 20, debug journal: 20, debug ms: 1, debug osd: 25, debug rocksdb: 10, enable experimental unrecoverable data corrupting features: '*', osd debug randomize hobject sort order: false, osd objectstore: bluestore, osd sloppy crc: true}
    fs: xfs
    log-whitelist: [slow request, wrongly marked me down, objects unfound and apparently lost]
    sha1: 7c4bd464baf7dbd4f1f1c6bdfee0bae664479727
  ceph-deploy:
    conf:
      client: {log file: /var/log/ceph/ceph-$name.$pid.log}
      mon: {debug mon: 1, debug ms: 20, debug paxos: 20, osd default pool size: 2}
  install:
    ceph:
      extra_packages: [rbd-nbd]
      sha1: 7c4bd464baf7dbd4f1f1c6bdfee0bae664479727
  thrashosds: {bdev_inject_crash: 2, bdev_inject_crash_probability: 0.5}
  workunit: {sha1: 7c4bd464baf7dbd4f1f1c6bdfee0bae664479727}
owner: jdillaman
priority: 100
repo: git://git.ceph.com/ceph-ci.git
roles:
- [mon.a, mon.c, osd.0, osd.1, osd.2]
- [mon.b, osd.3, osd.4, osd.5]
- [client.0]
sha1: 7c4bd464baf7dbd4f1f1c6bdfee0bae664479727
suite: rbd
suite_branch: wip-jd-testing
suite_relpath: qa
suite_repo: git://git.ceph.com/ceph-ci.git
suite_sha1: 7c4bd464baf7dbd4f1f1c6bdfee0bae664479727
tasks:
- {install: null}
- {ceph: null}
- thrashosds: {timeout: 1200}
- workunit:
    clients:
      client.0: [rbd/rbd-nbd.sh]
teuthology_branch: master
tube: smithi
verbose: false
2017-01-17T00:31:49.724 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_git_teuthology_master was just updated; assuming it is current
2017-01-17T00:31:49.724 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_git_teuthology_master to branch master
2017-01-17T00:31:49.747 INFO:teuthology.repo_utils:Skipping bootstrap as it was already done in the last 60s
2017-01-17T00:31:55.440 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_ceph-c_wip-jd-testing was just updated; assuming it is current
2017-01-17T00:31:55.440 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_ceph-c_wip-jd-testing to branch wip-jd-testing
2017-01-17T00:31:55.469 INFO:teuthology.worker:Creating archive dir /home/teuthworker/archive/jdillaman-2017-01-16_15:34:13-rbd-wip-jd-testing-distro-basic-smithi/722896
2017-01-17T00:31:56.438 INFO:teuthology.worker:Running job 722896
2017-01-17T00:31:56.461 DEBUG:teuthology.worker:Running: /home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/bin/teuthology -v --lock --block --owner jdillaman --archive /home/teuthworker/archive/jdillaman-2017-01-16_15:34:13-rbd-wip-jd-testing-distro-basic-smithi/722896 --name jdillaman-2017-01-16_15:34:13-rbd-wip-jd-testing-distro-basic-smithi --description rbd/nbd/{base/install.yaml cluster/{fixed-3.yaml openstack.yaml} fs/xfs.yaml msgr-failures/few.yaml objectstore/bluestore.yaml thrashers/default.yaml workloads/rbd_nbd.yaml} -- /tmp/teuthology-worker.gJQo_L.tmp
2017-01-17T00:31:56.510 INFO:teuthology.worker:Job archive: /home/teuthworker/archive/jdillaman-2017-01-16_15:34:13-rbd-wip-jd-testing-distro-basic-smithi/722896
2017-01-17T00:31:56.510 INFO:teuthology.worker:Job PID: 14637
2017-01-17T00:31:56.510 INFO:teuthology.worker:Running with watchdog
2017-01-17T00:33:56.519 DEBUG:teuthology.worker:Worker log: /home/teuthworker/archive/worker_logs/worker.smithi.11960
2017-01-17T12:32:18.964 WARNING:teuthology.worker:Job ran longer than 43200s. Killing...
2017-01-17T12:32:19.022 ERROR:teuthology.worker:run_with_watchdog had an unhandled exception
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 279, in run_job
    run_with_watchdog(p, job_config)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 319, in run_with_watchdog
    teuth_config.archive_base)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/kill.py", line 71, in kill_job
    "I could not figure out the owner of the requested job. "
RuntimeError: I could not figure out the owner of the requested job. Please pass --owner <owner>.
2017-01-17T12:32:19.082 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/virtualenv/bin/teuthology-worker", line 11, in <module>
    load_entry_point('teuthology', 'console_scripts', 'teuthology-worker')()
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 139, in main
    ctx.verbose,
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 279, in run_job
    run_with_watchdog(p, job_config)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 319, in run_with_watchdog
    teuth_config.archive_base)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/kill.py", line 71, in kill_job
    "I could not figure out the owner of the requested job. "
RuntimeError: I could not figure out the owner of the requested job. Please pass --owner <owner>.
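In this log the watchdog fires just past the 43200 s (12 h) limit (00:31:56.510 to 12:32:18.964 is about 43222 s), and the RuntimeError raised inside kill_job propagates out of run_with_watchdog and run_job into the worker's main, which ends the worker process. The sketch below illustrates one way a watchdog loop could contain such a failure so that a single stuck job does not take the worker down. It is not teuthology's code: kill_job here is a hypothetical stand-in that reproduces the error, the "name", "job_id" and "owner" keys of job_config are assumptions, and passing the owner explicitly is only one possible mitigation suggested by the error message.

```python
import logging
import subprocess
import sys
import time
from typing import Callable, Optional

log = logging.getLogger("watchdog_sketch")

MAX_JOB_TIME = 43200  # seconds (12 h), the limit reported in the worker log


def kill_job(run_name: str, job_id: str, owner: Optional[str]) -> None:
    """Hypothetical stand-in for the kill helper in the traceback.

    Like the one in this ticket, it raises when no owner is known.
    """
    if not owner:
        raise RuntimeError(
            "I could not figure out the owner of the requested job. "
            "Please pass --owner <owner>."
        )
    log.info("Killing %s/%s on behalf of %s", run_name, job_id, owner)


def run_with_watchdog(proc: subprocess.Popen, job_config: dict,
                      max_time: float = MAX_JOB_TIME, poll: float = 1.0,
                      kill: Callable[[str, str, Optional[str]], None] = kill_job) -> bool:
    """Wait for proc; on timeout, try to kill the job without letting a
    failure in the kill path escape and take the worker down with it.

    Returns True if the job exited on its own, False if it was killed.
    """
    start = time.monotonic()
    while proc.poll() is None:
        if time.monotonic() - start > max_time:
            log.warning("Job ran longer than %ds. Killing...", max_time)
            try:
                # Hand the owner from the job config to the kill helper
                # instead of making it rediscover the owner on its own.
                kill(job_config["name"], str(job_config.get("job_id")),
                     job_config.get("owner"))
            except Exception:
                # Log and carry on: one stuck job should not end the worker.
                log.exception("Killing the job failed; terminating its process directly")
            proc.kill()  # the stand-in helper only logs, so stop the child here
            proc.wait()
            return False
        time.sleep(poll)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # A child that outlives a 1 s limit, with no owner in its config.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    finished = run_with_watchdog(child, {"name": "demo-run", "job_id": 1},
                                 max_time=1, poll=0.1)
    print("finished on its own:", finished)
```

Running it logs the RuntimeError traceback at ERROR level but still returns normally, which is the behavior the worker needed here.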
History
#1 Updated by David Galloway about 7 years ago
I touched /home/teuthworker/archive/jdillaman-2017-01-16_15:34:13-rbd-wip-jd-testing-distro-basic-smithi/722896/.preserve to keep the dir.
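The same step from Python, as a minimal sketch that assumes (as the comment above relies on) that archive cleanup leaves alone any job directory containing a .preserve file. preserve_archive is a hypothetical helper name, not a teuthology function.

```python
from pathlib import Path


def preserve_archive(job_archive_dir: str) -> Path:
    """Hypothetical helper: create an empty .preserve marker in a job's
    archive directory, the Python equivalent of `touch <dir>/.preserve`.
    """
    marker = Path(job_archive_dir) / ".preserve"
    marker.touch(exist_ok=True)  # creates the file, or just updates its mtime
    return marker
```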
#2 Updated by David Galloway about 7 years ago
May be a residual failure from this problem: http://tracker.ceph.com/issues/18482
Will leave this ticket open for a day or two to see if more workers die.
#3 Updated by David Galloway about 7 years ago
Found some more
teuthology-2017-01-16_11:00:03-rbd-kraken-distro-basic-smithi/721193
2017-01-18T19:35:33.385 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_git_teuthology_master was just updated; assuming it is current
2017-01-18T19:35:33.386 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_git_teuthology_master to branch master
2017-01-18T19:35:33.426 INFO:teuthology.repo_utils:Skipping bootstrap as it was already done in the last 60s
2017-01-18T19:35:36.786 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_ceph_kraken was just updated; assuming it is current
2017-01-18T19:35:36.786 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_ceph_kraken to branch kraken
2017-01-18T19:35:36.900 INFO:teuthology.worker:Creating archive dir /home/teuthworker/archive/teuthology-2017-01-16_11:00:03-rbd-kraken-distro-basic-smithi/721193
2017-01-18T19:35:36.901 INFO:teuthology.worker:Running job 721193
2017-01-18T19:35:36.931 DEBUG:teuthology.worker:Running: /home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/bin/teuthology -v --lock --block --owner scheduled_teuthology@teuthology --archive /home/teuthworker/archive/teuthology-2017-01-16_11:00:03-rbd-kraken-distro-basic-smithi/721193 --name teuthology-2017-01-16_11:00:03-rbd-kraken-distro-basic-smithi --description rbd/librbd/{cache/writeback.yaml clusters/{fixed-3.yaml openstack.yaml} config/none.yaml fs/xfs.yaml msgr-failures/few.yaml objectstore/filestore.yaml pool/replicated-data-pool.yaml workloads/c_api_tests.yaml} -- /tmp/teuthology-worker.v8AWpK.tmp
2017-01-18T19:35:36.966 INFO:teuthology.worker:Job archive: /home/teuthworker/archive/teuthology-2017-01-16_11:00:03-rbd-kraken-distro-basic-smithi/721193
2017-01-18T19:35:36.967 INFO:teuthology.worker:Job PID: 29519
2017-01-18T19:35:36.968 INFO:teuthology.worker:Running with watchdog
2017-01-18T19:37:36.969 DEBUG:teuthology.worker:Worker log: /home/teuthworker/archive/worker_logs/worker.smithi.13184
2017-01-19T07:35:53.441 WARNING:teuthology.worker:Job ran longer than 43200s. Killing...
2017-01-19T07:35:53.635 ERROR:teuthology.worker:run_with_watchdog had an unhandled exception
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 279, in run_job
    run_with_watchdog(p, job_config)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 319, in run_with_watchdog
    teuth_config.archive_base)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/kill.py", line 71, in kill_job
    "I could not figure out the owner of the requested job. "
RuntimeError: I could not figure out the owner of the requested job. Please pass --owner <owner>.
2017-01-19T07:35:53.903 CRITICAL:teuthology.worker:Uncaught exception
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/virtualenv/bin/teuthology-worker", line 11, in <module>
    load_entry_point('teuthology', 'console_scripts', 'teuthology-worker')()
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/scripts/worker.py", line 7, in main
    teuthology.worker.main(parse_args())
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 139, in main
    ctx.verbose,
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 279, in run_job
    run_with_watchdog(p, job_config)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/worker.py", line 319, in run_with_watchdog
    teuth_config.archive_base)
  File "/home/teuthworker/src/git.ceph.com_teuthology_master/teuthology/kill.py", line 71, in kill_job
    "I could not figure out the owner of the requested job. "
RuntimeError: I could not figure out the owner of the requested job. Please pass --owner <owner>.
jdillaman-2017-01-18_08:34:20-rbd-wip-jd-testing-distro-basic-smithi/728120
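Dead workers like these can be spotted by scanning the worker log directory. The following is a hypothetical sketch, not an existing teuthology tool: it assumes worker logs are named worker.* and treats any log containing the "CRITICAL:teuthology.worker:Uncaught exception" line seen in these tracebacks as a dead worker. The log's mtime then approximates when the worker died, which is how the drop on the Grafana dashboard was correlated in the description.

```python
import sys
from datetime import datetime
from pathlib import Path
from typing import List, Tuple

# Line emitted when a worker dies, as seen in the logs quoted in this ticket.
DEATH_MARKER = "CRITICAL:teuthology.worker:Uncaught exception"


def find_dead_workers(log_dir: str) -> List[Tuple[str, datetime]]:
    """Return (log file name, mtime) for each worker log that contains
    DEATH_MARKER, newest first.

    Assumes worker logs are named worker.<something>; the mtime only
    approximates when the worker stopped writing, i.e. when it died.
    """
    dead = []
    for path in Path(log_dir).glob("worker.*"):
        if not path.is_file():
            continue
        with path.open(errors="replace") as f:
            if any(DEATH_MARKER in line for line in f):
                dead.append((path.name, datetime.fromtimestamp(path.stat().st_mtime)))
    dead.sort(key=lambda item: item[1], reverse=True)
    return dead


if __name__ == "__main__":
    # Usage: python find_dead_workers.py /path/to/worker_logs
    for name, mtime in find_dead_workers(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{mtime.isoformat(timespec='seconds')}  {name}")
```

Reading each log line by line keeps memory flat even for multi-day worker logs; grepping for the same marker would work equally well from a shell.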
#4 Updated by Zack Cerza about 7 years ago
- Status changed from New to Resolved