Bug #9920

admin socket check hang, osd appears fine

Added by Yuri Weinstein over 9 years ago. Updated almost 7 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-10-27_17:18:01-upgrade:firefly-x-giant-distro-basic-vps/573855/

2014-10-27T19:29:34.901 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_giant/tasks/ceph_manager.py", line 285, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.4 restart
archive_path: /var/lib/teuthworker/archive/teuthology-2014-10-27_17:18:01-upgrade:firefly-x-giant-distro-basic-vps/573855
branch: giant
description: upgrade:firefly-x/stress-split/{0-cluster/start.yaml 1-firefly-install/firefly.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/{rbd-cls.yaml
  rbd-import-export.yaml readwrite.yaml snaps-few-objects.yaml} 6-next-mon/monb.yaml
  7-workload/{radosbench.yaml rbd_api.yaml} 8-next-mon/monc.yaml 9-workload/{rbd-python.yaml
  rgw-swift.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
email: ceph-qa@ceph.com
job_id: '573855'
kernel: &id001
  kdb: true
  sha1: distro
last_in_suite: false
machine_type: vps
name: teuthology-2014-10-27_17:18:01-upgrade:firefly-x-giant-distro-basic-vps
nuke-on-error: true
os_type: ubuntu
os_version: '12.04'
overrides:
  admin_socket:
    branch: giant
  ceph:
    conf:
      global:
        osd heartbeat grace: 100
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    - objects unfound and apparently lost
    - log bound mismatch
    sha1: 490ae489aefbd85011673ae503b39a232fbf5183
  ceph-deploy:
    branch:
      dev: giant
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: 490ae489aefbd85011673ae503b39a232fbf5183
  rgw:
    default_idle_timeout: 1200
  s3tests:
    branch: giant
    idle_timeout: 1200
  workunit:
    sha1: 490ae489aefbd85011673ae503b39a232fbf5183
owner: scheduled_teuthology@teuthology
priority: 100
roles:
- - mon.a
  - mon.b
  - mon.c
  - mds.a
  - osd.0
  - osd.1
  - osd.2
  - osd.3
  - osd.4
  - osd.5
  - osd.6
- - osd.7
  - osd.8
  - osd.9
  - osd.10
  - osd.11
  - osd.12
  - osd.13
- - client.0
suite: upgrade:firefly-x
suite_branch: giant
suite_path: /var/lib/teuthworker/src/ceph-qa-suite_giant
targets:
  ubuntu@vpm016.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDpVa95hdqw6mBshVDEQ/itldQsWDF/8a7Hc3ZvA6u4zsvkuVKFErNVBH7NBY/oB2V7L+znDCMsaYogfoo5yDefXURnNtChcWkwSPFsUy98sPTOPIgvsRbPoGgj+H9D3Y0AYXFsHBz989gQ6kcPQ8AhxpoyQqMbkQN8/70b5iY0cF522a3MfaiDuFnLiK6ejmoIgBTAoZhf2aT30pBlFjTDb6uXqPprSvIlkJ40g0YkXOPJxSYIbkyRHn0DLNRix1454smPdXhM82CNuZ4nSAuSkMZUeiC+fr1Qn7HJ5wfY3IL7aj+A1jjeixodZmrxgiXh4DY8bpOIaev80M3nirMz
  ubuntu@vpm054.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDkGCC6uWWwqK2kpyzWU4APfCDFXfod+w0W4SmIM1qjHjdS1nudfNWLCgKGftG5savVrq6UEwINJHJZHZhSOcgxhMvNeWOHUoWQaANN5rWHwmsTEZKoB811/M8B9+XfEvPNa1qzf1mA4wvRtAIMXd1A4FaOdhcyESW189Z6SyI1NMBSjnhphD4qLTqnrbAtQc5PDn3JTafk9Jg3F0n9fsE6Ubl0JG5Q8UnL5xNy2lj0U8xKyboLqzBTiR7siRYHyxbmHnRNBln74g8yC2CHHdmntLVBq1m/8u9fHvvmVVbrnfG7HMipkcbrRwFWEiQU9D/EE2whQCFog0cI6Flw5dn9
  ubuntu@vpm105.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAwAt/Mklt8SvwXPTJSJKON2XnQh9YJsQ17EICGsqmVYOvrv0Z0x3npUNHo9n3tea9WSSKBiisnEF0SFm9ot+T3y3Ke06MFfzuiVNMDM2iqjmR2V6v8IPd0Z19Qd49EsgFN2cMgnKbiPjMW+NHRXPJhOUenABwkP7iuidunM1zkSeaYOVRnGF1VtclTwTNQp+Bbqsxov4PGHNB96nMV7ZmFEwyod6/jY/pKfUM05sPKaubYESNEn2TkiZY5DWN00l8rjrN/61M6ISLNo+d3kNJ8EI6EaklJCZnEAUOLQ5OoefUZMBPuVelzuDFyhyr5Yr4eGjTkrc8dAkzUk5S+TJ/
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.push_inventory: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: firefly
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: null
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
    - osd.3
    - osd.4
    - osd.5
    - osd.6
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: firefly
    clients:
      client.0:
      - cls/test_cls_rbd.sh
- workunit:
    branch: firefly
    clients:
      client.0:
      - rbd/import_export.sh
    env:
      RBD_CREATE_ARGS: --new-format
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
    ops: 4000
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- radosbench:
    clients:
    - client.0
    time: 1800
- workunit:
    branch: firefly
    clients:
      client.0:
      - rbd/test_librbd.sh
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: firefly
    clients:
      client.0:
      - rbd/test_librbd_python.sh
- rgw:
    client.0: null
    default_idle_timeout: 300
- swift:
    client.0:
      rgw_server: client.0
- rados:
    clients:
    - client.0
    objects: 500
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
teuthology_branch: master
tube: vps
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.3044

description: upgrade:firefly-x/stress-split/{0-cluster/start.yaml 1-firefly-install/firefly.yaml
  2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/{rbd-cls.yaml
  rbd-import-export.yaml readwrite.yaml snaps-few-objects.yaml} 6-next-mon/monb.yaml
  7-workload/{radosbench.yaml rbd_api.yaml} 8-next-mon/monc.yaml 9-workload/{rbd-python.yaml
  rgw-swift.yaml snaps-many-objects.yaml} distros/ubuntu_12.04.yaml}
duration: 7539.997278928757
failure_reason: timed out waiting for admin_socket to appear after osd.4 restart
flavor: basic
owner: scheduled_teuthology@teuthology
status: fail
success: false
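
For context, the failure is the thrasher timing out while waiting for the restarted OSD's admin socket to come back. Below is a minimal sketch of that kind of polling loop; the function name, timeout, and socket path are illustrative assumptions, not the actual ceph_manager.py implementation.

import os
import subprocess
import time

def wait_for_admin_socket(asok_path, timeout=300, interval=5):
    """Poll for a Ceph daemon admin socket and confirm it answers.

    Returns once `ceph --admin-daemon <asok> version` succeeds, or raises
    if the socket never becomes usable within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(asok_path):
            try:
                # `version` is a lightweight command every Ceph daemon answers
                # over its admin socket.
                subprocess.check_output(
                    ['ceph', '--admin-daemon', asok_path, 'version'])
                return
            except subprocess.CalledProcessError:
                pass  # socket file exists but the daemon is not answering yet
        time.sleep(interval)
    raise RuntimeError(
        'timed out waiting for admin_socket to appear: %s' % asok_path)

# Example: the default asok path for osd.4 on the test node (assumed layout).
# wait_for_admin_socket('/var/run/ceph/ceph-osd.4.asok', timeout=300)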

History

#1 Updated by Samuel Just over 9 years ago

  • Subject changed from Error "timed out" after "joining thrashosds" in upgrade:firefly-x-giant-distro-basic-vps run to admin socket check hang, osd appears fine
  • Priority changed from Urgent to High

Hmm, osd.4 seems fine, not sure why the admin socket check didn't work.
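
If this reproduces again, it would help to confirm on the node whether the daemon really is up while only the socket check fails. A rough diagnostic sketch along those lines (the pgrep pattern and the default asok path are assumptions about the test VM's layout):

import subprocess

def diagnose_osd(osd_id):
    """Distinguish 'OSD process is down' from 'admin socket not responding'.

    Run on the node hosting the OSD.
    """
    # Look for a running ceph-osd process with this id (pattern is an assumption).
    pattern = 'ceph-osd.*-i %d( |$)' % osd_id
    running = subprocess.call(['pgrep', '-f', pattern]) == 0

    # Check whether the admin socket answers a trivial command.
    asok = '/var/run/ceph/ceph-osd.%d.asok' % osd_id
    try:
        subprocess.check_output(['ceph', '--admin-daemon', asok, 'version'])
        socket_ok = True
    except (OSError, subprocess.CalledProcessError):
        socket_ok = False

    print('osd.%d: process running=%s, admin socket responding=%s'
          % (osd_id, running, socket_ok))

diagnose_osd(4)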

#2 Updated by Yuri Weinstein over 9 years ago

Same issue in run http://qa-proxy.ceph.com/teuthology/teuthology-2014-11-22_17:18:02-upgrade:firefly-x-next-distro-basic-vps/615234/teuthology.log
Job ['615234']

2014-11-23T00:14:46.235 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 119, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/thrashosds.py", line 174, in task
    thrash_proc.do_join()
  File "/var/lib/teuthworker/src/ceph-qa-suite_next/tasks/ceph_manager.py", line 288, in do_join
    self.thread.get()
  File "/usr/lib/python2.7/dist-packages/gevent/greenlet.py", line 308, in get
    raise self._exception
Exception: timed out waiting for admin_socket to appear after osd.4 restart

#3 Updated by Sage Weil almost 7 years ago

  • Status changed from New to Can't reproduce

This was probably the dlopen vs tcmalloc init race (fixed upstream).
