Bug #16142

closed

Exception during internal.connect fails to unlock machines

Added by John Spray almost 8 years ago. Updated almost 6 years ago.

Status: Won't Fix
Priority: Immediate
Assignee:
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Symptom: jobs that have failed leave their nodes locked.
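The title points at an exception raised during the connect step, after the machines have already been locked. One common way to guarantee cleanup in that kind of sequence is a `try`/`finally` around everything that follows the lock. The sketch below is generic and is not teuthology's actual code; `lock`, `connect`, `unlock`, and `job` are hypothetical callables used only for illustration.

```python
def run_with_locked_machines(lock, connect, unlock, job):
    """Lock machines, connect to them, run the job, and always unlock.

    All four arguments are hypothetical callables (assumptions for this
    sketch); they do not correspond to real teuthology functions.
    """
    machines = lock()
    try:
        connections = connect(machines)  # may raise, e.g. on an SSH failure
        return job(connections)
    finally:
        # Runs whether connect() or job() succeeded or raised.
        unlock(machines)


# Usage: simulate a connect failure and record what happens.
events = []


def fake_lock():
    events.append("lock")
    return ["smithi055"]


def failing_connect(machines):
    raise RuntimeError("connect failed")


def fake_unlock(machines):
    events.append(("unlock", tuple(machines)))


try:
    run_with_locked_machines(fake_lock, failing_connect, fake_unlock, job=lambda c: None)
except RuntimeError as exc:
    events.append(("error", str(exc)))
```

The exception still propagates to the caller, so the job is reported as failed, but the unlock has already run by the time the caller sees it.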

Here's one:
http://qa-proxy.ceph.com/teuthology/yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi/230564/worker.log

The machine was still locked:

    {
        "is_vm": false, 
        "locked": true, 
        "locked_since": "2016-06-03 00:22:35.009101", 
        "locked_by": "scheduled_yuriw@teuthology", 
        "up": true, 
        "mac_address": null, 
        "name": "smithi055.front.sepia.ceph.com", 
        "os_version": "14.04", 
        "machine_type": "smithi", 
        "vm_host": null, 
        "os_type": "ubuntu", 
        "arch": "x86_64", 
        "ssh_pub_key": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCcS0/jTSprtfXdi+1HQmxsNIkMGuOTkCjfl7ETuuuGGXBc4aO4C9p4fibGrsxQdtdZ4rZF6q4yzrZeoBC+54f9QimIy+amq612yXWZNelXMKQBNM3gcnZaw1YPdk2zBq0OP/rv3o+WP2CjNpSD3Izev9DVIavDv1S4s9nOfIpJvGE/n93f9tA+pAOXhd7MiPvPXns+rByX4UZmtvpXIsDMOimGo/b9La7asXvjx4eikFz2oCd+1s07dAmvRm0NyttjkNduDD3ewXVbBf8046P6cZOCPVe4tihHug96MwvmEfXw5pDd6AKBIx78bhrUEej/871ybYHLXpiZB130HOPB", 
        "description": "/var/lib/teuthworker/archive/yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi/230564" 
    }, 
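A stale lock like this one can be spotted by comparing each locked node's `description` (the job archive path) against jobs known to have finished. The following is a minimal sketch, assuming node records shaped like the one above; `find_stale_locks` and the `finished_archives` set are hypothetical names for illustration and are not part of teuthology.

```python
def find_stale_locks(nodes, finished_archives):
    """Return names of nodes still locked by a job that has already finished.

    nodes: list of dicts shaped like the lock record above.
    finished_archives: set of archive paths for jobs known to have exited
    (a hypothetical input; how it is collected is left open).
    """
    return [
        node["name"]
        for node in nodes
        if node.get("locked") and node.get("description") in finished_archives
    ]


archive = (
    "/var/lib/teuthworker/archive/"
    "yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi/230564"
)
nodes = [
    {"name": "smithi055.front.sepia.ceph.com", "locked": True, "description": archive},
    # Hypothetical second record, included only to show an unlocked node being skipped.
    {"name": "example-unlocked-node", "locked": False, "description": None},
]
stale = find_stale_locks(nodes, {archive})
print(stale)
```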

The worker had gone on to the next job:

2016-06-02T14:29:26.251 INFO:teuthology.worker:Creating archive dir /var/lib/teuthworker/archive/yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi/230564
2016-06-02T14:29:26.252 INFO:teuthology.worker:Running job 230564
2016-06-02T14:29:26.271 DEBUG:teuthology.worker:Running: /var/lib/teuthworker/src/teuthology_master/virtualenv/bin/teuthology -v --lock --block --owner scheduled_yuriw@teuthology --archive /var/lib/teuthworker/archive/yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi/230564 --name yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi --description rados/thrash/{hobj-sort.yaml rados.yaml rocksdb.yaml 0-size-min-size-overrides/2-size-2-min-size.yaml 1-pg-log-overrides/normal_pg_log.yaml clusters/{fixed-2.yaml openstack.yaml} fs/xfs.yaml msgr/random.yaml msgr-failures/fastclose.yaml thrashers/pggrow.yaml workloads/rados_api_tests.yaml} -- /tmp/teuthology-worker.AOpNa2.tmp
2016-06-02T14:29:26.275 INFO:teuthology.worker:Job archive: /var/lib/teuthworker/archive/yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi/230564
2016-06-02T14:29:26.276 INFO:teuthology.worker:Job PID: 23645
2016-06-02T14:29:26.276 INFO:teuthology.worker:Running with watchdog
2016-06-02T14:31:26.277 DEBUG:teuthology.worker:Worker log: /var/lib/teuthworker/archive/worker_logs/worker.smithi.29062
2016-06-02T17:23:32.616 ERROR:teuthology.worker:Child exited with code 1
2016-06-02T17:23:32.620 INFO:teuthology.worker:Reserved job 230618
2016-06-02T17:23:32.620 INFO:teuthology.worker:Config is: branch: wip-yuri-testing
description: rados/thrash/{hobj-sort.yaml rados.yaml rocksdb.yaml 0-size-min-size-overrides/2-size-2-min-size.yaml
  1-pg-log-overrides/short_pg_log.yaml clusters/{fixed-2.yaml openstack.yaml} fs/xfs.yaml
  msgr/random.yaml msgr-failures/fastclose.yaml thrashers/default.yaml workloads/rgw_snaps.yaml}
email: null
kernel: {kdb: true, sha1: distro}
last_in_suite: false
machine_type: smithi
name: yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi
nuke-on-error: true
openstack:
- volumes: {count: 3, size: 30}
overrides:
  admin_socket: {branch: wip-yuri-testing}
  ceph:
    conf:
      client: {debug ms: 1, debug rgw: 20}
      global: {enable experimental unrecoverable data corrupting features: '*', ms inject socket failures: 2500,
        ms tcp read timeout: 5, ms type: random, osd_max_pg_log_entries: 300, osd_min_pg_log_entries: 150,
        osd_pool_default_min_size: 2, osd_pool_default_size: 2}
      mon: {debug mon: 20, debug ms: 1, debug paxos: 20, mon keyvaluedb: rocksdb}
      osd: {debug filestore: 20, debug journal: 20, debug ms: 1, debug osd: 25, osd debug randomize hobject sort order: true,
        osd op queue: debug_random, osd op queue cut off: debug_random, osd sloppy crc: true}
    fs: xfs
    log-whitelist: [slow request]
    sha1: 22ad94cd61ee7714da6b3c851967d1b6e44ae6c1
  ceph-deploy:
    branch: {dev-commit: 22ad94cd61ee7714da6b3c851967d1b6e44ae6c1}
    conf:
      client: {log file: /var/log/ceph/ceph-$name.$pid.log}
      mon: {debug mon: 1, debug ms: 20, debug paxos: 20, osd default pool size: 2}
  install:
    ceph: {sha1: 22ad94cd61ee7714da6b3c851967d1b6e44ae6c1}
  workunit: {sha1: 22ad94cd61ee7714da6b3c851967d1b6e44ae6c1}
owner: scheduled_yuriw@teuthology
priority: 100
roles:
- [mon.a, mon.c, osd.0, osd.1, osd.2, client.0]
- [mon.b, osd.3, osd.4, osd.5, client.1]
sha1: 22ad94cd61ee7714da6b3c851967d1b6e44ae6c1
suite: rados
suite_branch: master
suite_sha1: bd14a8e13b94b7b50ec060e438d8d7096ae78aeb
tasks:
- {install: null}
- ceph:
    conf:
      osd: {osd debug reject backfill probability: 0.3, osd max backfills: 1, osd scrub max interval: 120,
        osd scrub min interval: 60}
    log-whitelist: [wrongly marked me down, objects unfound and apparently lost]
- thrashosds: {chance_pgnum_grow: 1, chance_pgpnum_fix: 1, timeout: 1200}
- rgw: {client.0: null, default_idle_timeout: 3600}
- thrash_pool_snaps:
    pools: [.rgw.buckets, .rgw.root, .rgw.control, .rgw, .users.uid, .users.email,
      .users]
- s3readwrite:
    client.0:
      readwrite:
        bucket: rwtest
        duration: 300
        files: {num: 10, size: 2000, stddev: 500}
        readers: 10
        writers: 3
      rgw_server: client.0
teuthology_branch: master
tube: smithi
verbose: false

2016-06-02T17:23:32.652 INFO:teuthology.repo_utils:Fetching from upstream into /var/lib/teuthworker/src/teuthology_master
2016-06-02T17:23:32.696 INFO:teuthology.repo_utils:Resetting repo at /var/lib/teuthworker/src/teuthology_master to branch master
2016-06-02T17:23:32.706 INFO:teuthology.repo_utils:Bootstrapping /var/lib/teuthworker/src/teuthology_master
2016-06-02T17:23:43.105 INFO:teuthology.repo_utils:Bootstrap exited with status 0
2016-06-02T17:23:43.115 INFO:teuthology.repo_utils:Fetching from upstream into /var/lib/teuthworker/src/ceph-qa-suite_master
2016-06-02T17:23:43.206 INFO:teuthology.repo_utils:Resetting repo at /var/lib/teuthworker/src/ceph-qa-suite_master to branch master
2016-06-02T17:23:43.226 INFO:teuthology.worker:Creating archive dir /var/lib/teuthworker/archive/yuriw-2016-06-02_11:43:49-rados-wip-yuri-testing-distro-basic-smithi/230618
2016-06-02T17:23:43.226 INFO:teuthology.worker:Running job 230618
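To find which jobs to check for leftover locks, the worker log can be scanned for a `Running job` line followed by a non-zero `Child exited with code` line. This is a rough sketch that assumes the log format is exactly as in the excerpt above; the `failed_jobs` helper is a hypothetical name, and how the worker logs a successful exit is not shown here, so only the `ERROR` form is matched.

```python
import re

# Patterns taken from the two message shapes visible in the log excerpt.
RUNNING_JOB = re.compile(r"INFO:teuthology\.worker:Running job (\d+)")
CHILD_EXIT = re.compile(r"ERROR:teuthology\.worker:Child exited with code (\d+)")


def failed_jobs(log_lines):
    """Return (job_id, exit_code) for each job whose child exited non-zero."""
    current = None
    failures = []
    for line in log_lines:
        match = RUNNING_JOB.search(line)
        if match:
            current = match.group(1)
            continue
        match = CHILD_EXIT.search(line)
        if match and current is not None:
            code = int(match.group(1))
            if code != 0:
                failures.append((current, code))
            current = None  # this job is done; wait for the next one
    return failures


sample = [
    "2016-06-02T14:29:26.252 INFO:teuthology.worker:Running job 230564",
    "2016-06-02T17:23:32.616 ERROR:teuthology.worker:Child exited with code 1",
    "2016-06-02T17:23:32.620 INFO:teuthology.worker:Reserved job 230618",
    "2016-06-02T17:23:43.226 INFO:teuthology.worker:Running job 230618",
]
failures = failed_jobs(sample)
print(failures)
```

The archive path of each failed job can then be matched against the `description` field of locked nodes.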
