
Bug #8959

osd crashed in upgrade:dumpling-x-firefly---basic-vps suite

Added by Yuri Weinstein over 9 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps/382697/

Coredump from ubuntu@teuthology:/a/teuthology-2014-07-28_11:48:15-upgrade:/log/ceph-osd.1.log.gz

2014-07-29 00:54:10.530978 7fc31b33c700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fc31b33c700 time 2014-07-29 00:54:09.453159
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.67.9-21-g8649cbb (8649cbbc96a4de9de169b0203f35e0ac6c36a2ef)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x12b) [0x82837b]
 2: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, long, long)+0x90) [0x8289a0]
 3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x548) [0x698968]
 4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x16) [0x6d3fb6]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x84da51]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x850a80]
 7: (()+0x79d1) [0x7fc3335279d1]
 8: (clone()+0x6d) [0x7fc331dedb6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

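For context, the suicide-timeout mechanism behind the assert in this backtrace works roughly like this (a minimal Python sketch with hypothetical names, loosely modeled on `ceph::HeartbeatMap`; it is not Ceph's actual C++ implementation): each worker thread resets its heartbeat handle as it picks up work, and a periodic check aborts the whole daemon if any handle goes stale past its suicide grace, on the theory that a wedged OSD is worse than a dead one.

```python
import time


class HeartbeatHandle:
    """One worker thread's heartbeat state (hypothetical sketch)."""

    def __init__(self, name, grace, suicide_grace):
        self.name = name
        self.grace = grace                  # report unhealthy after this many seconds
        self.suicide_grace = suicide_grace  # abort the daemon after this many seconds
        self.last_reset = time.time()


class HeartbeatMap:
    """Tracks heartbeat handles for all worker threads (hypothetical sketch)."""

    def __init__(self):
        self.handles = []

    def add_worker(self, name, grace, suicide_grace):
        h = HeartbeatHandle(name, grace, suicide_grace)
        self.handles.append(h)
        return h

    def reset_timeout(self, handle):
        # Called by the worker at the top of each work item, analogous to
        # frame 2 (HeartbeatMap::reset_timeout) in the backtrace above.
        handle.last_reset = time.time()

    def check(self, now=None):
        # Called periodically. A worker stuck past suicide_grace triggers the
        # equivalent of FAILED assert(0 == "hit suicide timeout"): the daemon
        # kills itself rather than limping along with a wedged thread.
        now = time.time() if now is None else now
        unhealthy = []
        for h in self.handles:
            age = now - h.last_reset
            if age > h.suicide_grace:
                raise AssertionError(
                    "hit suicide timeout: %s stalled %.0fs" % (h.name, age))
            if age > h.grace:
                unhealthy.append(h.name)
        return unhealthy
```

In the crash above, the stall happened inside `OSD::process_peering_events`, i.e. the peering work item itself took longer than the suicide grace before it could reset its handle again.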
2014-07-28T22:01:43.345 DEBUG:teuthology.orchestra.run:Running [10.214.138.95]: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a'
2014-07-28T22:01:43.426 INFO:teuthology.task.ceph.mds.a:Started
2014-07-28T22:01:43.426 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2014-07-28T22:01:43.426 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_firefly/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_firefly/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/run_tasks.py", line 31, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph.py", line 1239, in restart
    ctx.daemons.get_daemon(type_, id_).stop()
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph.py", line 57, in stop
    run.wait([self.proc])
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/run.py", line 356, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.138.95 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1'
2014-07-28T22:01:43.515 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
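The teuthology traceback above shows how the OSD's death surfaces as a suite failure: each parallel task runs in a worker, exceptions are captured together with their traceback, and the coordinator re-raises them (`resurrect_traceback`), so the run fails with the child's `CommandFailedError`. A minimal sketch of that pattern, with hypothetical helper names rather than the actual teuthology code:

```python
import sys
import traceback
from concurrent.futures import ThreadPoolExecutor


def capture(func, *args, **kwargs):
    """Run func, returning either its value or the exception plus its
    formatted traceback (analogous to parallel.py's capture_traceback)."""
    try:
        return ("ok", func(*args, **kwargs))
    except Exception as e:
        return ("error", (e, traceback.format_exc()))


def run_parallel(tasks):
    """Run zero-argument callables in parallel; if any worker failed,
    print its captured traceback and re-raise its exception in the
    coordinator (analogous to resurrect_traceback)."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(capture, t) for t in tasks]
        values = []
        for fut in futures:
            status, payload = fut.result()
            if status == "error":
                exc, tb = payload
                sys.stderr.write(tb)  # "resurrect" the child's traceback
                raise exc             # coordinator fails with the worker's error
            values.append(payload)
    return values
```

This is why the log shows both the worker-side `CommandFailedError` (the dead ceph-osd) and a second coordinator-side traceback from `run_tasks`.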
archive_path: /var/lib/teuthworker/archive/teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps/382697
branch: firefly
description: upgrade:dumpling-x/parallel/{0-cluster/start.yaml 1-dumpling-install/cuttlefish-dumpling.yaml
  2-workload/rados_api.yaml 3-upgrade-sequence/upgrade-all.yaml 4-final-upgrade/client.yaml
  5-final-workload/rgw_s3tests.yaml rhel_6.5.yaml}
email: ceph-qa@ceph.com
job_id: '382697'
last_in_suite: false
machine_type: vps
name: teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps
nuke-on-error: true
os_type: rhel
os_version: '6.5'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: b576d5a242c16bc9e38ba283a9784f838614882a
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: b576d5a242c16bc9e38ba283a9784f838614882a
  s3tests:
    branch: firefly
  workunit:
    sha1: b576d5a242c16bc9e38ba283a9784f838614882a
owner: yuriw
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-x
suite_branch: firefly
targets:
  ubuntu@vpm050.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAp7rxkr0MpCODHX6nw/SqLhwjZB9XY+WUUHd+5rA3FAIHWawW54wyIRO2xzMoL6BfRnBzVEll1cV6jUfHsHJIKAWOS8OSuoWt4AmROwBpSToFlReuW4S8dLyLdQ+EvGhS/YE83VsEHtxtmwfjnfpg+ADOaEzwHODYLHkar+/i/9n8evqmUjy8YGcPBNfNBhJ7c873zYoRiUMhq5VmVnxjiPz0evqp+WhhuLvasrGbtI1hYHtJjVH4DsbCJn7fP6vG/1Yxzk9RK575HYJk0UIuUqxuQGSC0wE+4OeLYcz8Uo7f+jl2/dgvD2HsyUMXlr45iSr3qKNvXgGqGSNj5M4n8Q==
  ubuntu@vpm051.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAxl4J8EQfl0OmYYV0GktZcKwPTbcYaeCdimNdd4kIpCj7HZePrCisFUSGiU1mMHJC+IFRRvml5qCWFpzFSjMJZJp0/KAxhG3ixzJEWZPhxITJhls8yJzJ5ZkoP+KehJMd4teEt3uTV2P3wT2V0PH4H1Y7zyAHEVRoZAKrqAEpMiy8xbDV+TW/ytNxX3j788Nw+7Orf6jwM2ir8YU0cqrTQp9L3KHMMD/qD1YwC1oPwHjq+R2VwgW/h3aJgwuUOw21fu3xdACJ/uLsg7utiJVBqqQ8R6PY2IDOXBjxMDfeAODrQ+hT4QmUnCbJOQexKMR2cmblpoERfaFnZZsgp3ydBQ==
  ubuntu@vpm085.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA3dE8fsW00cQfM0sDOhLaHL5jT9efkLyJdO+qAzg4ocCLEz4Kz95makKn2ind81PIXXD1tDc4M085NE+ksJx9W88iM+zVVjK8i9KpjYcqPxIc8+jXNIHVhYoIZh8hHdVV+O4YXZGRQ3L6vBjElC6ODNo082K4/wLlGfwVa8JGawzYivhUpuaO1OoYKHGOKwl/nuAKGHENxCwiqJKWr0MPPGnPGLtZyNx6yOZ2B7vBl7p/EpjQvxgzg8vs9Vq2F7mVsVJM9GlNqYiw4FIT4WThDmu80pAu++OJ2dU+qODanpCyZVkB8RI11dMDQ6bsz1vxS0VyP7vqvzTeLKdkQDYhHw==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: cuttlefish
- print: '**** done cuttlefish install'
- ceph:
    fs: xfs
- print: '**** done ceph'
- install.upgrade:
    all:
      branch: dumpling
- ceph.restart: null
- parallel:
  - workload
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade'
- rgw:
  - client.1
- s3tests:
    client.1:
      branch: dumpling
      rgw_server: client.1
teuthology_branch: firefly
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
      mon.b: null
  - ceph.restart:
    - mon.a
    - mon.b
    - mon.c
    - mds.a
    - osd.0
    - osd.1
    - osd.2
    - osd.3
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.5105
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test-upgrade-firefly.sh
        - cls
description: upgrade:dumpling-x/parallel/{0-cluster/start.yaml 1-dumpling-install/cuttlefish-dumpling.yaml
  2-workload/rados_api.yaml 3-upgrade-sequence/upgrade-all.yaml 4-final-upgrade/client.yaml
  5-final-workload/rgw_s3tests.yaml rhel_6.5.yaml}
duration: 2374.8802947998047
failure_reason: 'Command failed on 10.214.138.95 with status 1: ''sudo adjust-ulimits
  ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd
  -f -i 1'''
flavor: basic
owner: yuriw
success: false

History

#2 Updated by Sage Weil over 9 years ago

This sounds a bit like a problem we had a while back with hung I/Os from the VMs?

#3 Updated by Sage Weil over 9 years ago

  • Status changed from New to Can't reproduce