Bug #8959
osd crashed in upgrade:dumpling-x-firefly---basic-vps suite (closed)
Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Coredump from ubuntu@teuthology:/a/teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps/382697/remote/ubuntu@vpm051.front.sepia.ceph.com/log/ceph-osd.1.log.gz
2014-07-29 00:54:10.530978 7fc31b33c700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fc31b33c700 time 2014-07-29 00:54:09.453159
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.67.9-21-g8649cbb (8649cbbc96a4de9de169b0203f35e0ac6c36a2ef)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x12b) [0x82837b]
 2: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, long, long)+0x90) [0x8289a0]
 3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x548) [0x698968]
 4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x16) [0x6d3fb6]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x84da51]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x850a80]
 7: (()+0x79d1) [0x7fc3335279d1]
 8: (clone()+0x6d) [0x7fc331dedb6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
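For readers triaging the backtrace above, frames 1 and 2 show the assert firing inside a check that runs when a worker thread resets its heartbeat timeout. The following is a minimal Python sketch of that watchdog pattern, not Ceph's C++ implementation. The class and method names mirror the frames only loosely, `SuicideTimeout`, `is_healthy`, the injectable `clock`, and the grace values in the usage note are all assumptions made for illustration.

```python
import time


class SuicideTimeout(Exception):
    """Stands in for the fatal assert; the real daemon aborts instead."""


class HeartbeatHandle:
    """Per-thread deadlines (hypothetical, simplified layout)."""

    def __init__(self, name):
        self.name = name
        self.timeout = 0.0          # soft deadline; 0 means "not armed"
        self.suicide_timeout = 0.0  # hard deadline; 0 means "not armed"


class HeartbeatMap:
    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the behaviour can be exercised in tests.
        self._clock = clock

    def _check(self, handle, now):
        healthy = True
        # Past the soft deadline: the thread is only reported as unhealthy.
        if handle.timeout and handle.timeout < now:
            healthy = False
        # Past the hard deadline: give up entirely.
        if handle.suicide_timeout and handle.suicide_timeout < now:
            raise SuicideTimeout("%s hit suicide timeout" % handle.name)
        return healthy

    def reset_timeout(self, handle, grace, suicide_grace):
        now = self._clock()
        # The old deadlines are checked first, so a thread that was stuck
        # for too long fails here, at the moment it finally makes progress.
        self._check(handle, now)
        handle.timeout = now + grace
        handle.suicide_timeout = now + suicide_grace if suicide_grace else 0.0

    def is_healthy(self, handle):
        return self._check(handle, self._clock())
```

In this model, a worker that calls `reset_timeout(h, 15, 150)` (arbitrary values) and then blocks for longer than the second interval raises `SuicideTimeout` on its next `reset_timeout` call. That ordering matches the backtrace, where `_check` is reached from `reset_timeout` rather than from a separate monitoring thread.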
2014-07-28T22:01:43.345 DEBUG:teuthology.orchestra.run:Running [10.214.138.95]: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mds -f -i a'
2014-07-28T22:01:43.426 INFO:teuthology.task.ceph.mds.a:Started
2014-07-28T22:01:43.426 DEBUG:teuthology.task.ceph.osd.1:waiting for process to exit
2014-07-28T22:01:43.426 ERROR:teuthology.parallel:Exception in parallel execution
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_firefly/teuthology/parallel.py", line 82, in __exit__
    for result in self:
  File "/home/teuthworker/src/teuthology_firefly/teuthology/parallel.py", line 101, in next
    resurrect_traceback(result)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/parallel.py", line 19, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/parallel.py", line 50, in _run_spawned
    mgr = run_tasks.run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/run_tasks.py", line 31, in run_one_task
    return fn(**kwargs)
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/sequential.py", line 48, in task
    mgr.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph.py", line 1239, in restart
    ctx.daemons.get_daemon(type_, id_).stop()
  File "/home/teuthworker/src/teuthology_firefly/teuthology/task/ceph.py", line 57, in stop
    run.wait([self.proc])
  File "/home/teuthworker/src/teuthology_firefly/teuthology/orchestra/run.py", line 356, in wait
    proc.exitstatus.get()
  File "/usr/lib/python2.7/dist-packages/gevent/event.py", line 207, in get
    raise self._exception
CommandFailedError: Command failed on 10.214.138.95 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1'
2014-07-28T22:01:43.515 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
archive_path: /var/lib/teuthworker/archive/teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps/382697
branch: firefly
description: upgrade:dumpling-x/parallel/{0-cluster/start.yaml 1-dumpling-install/cuttlefish-dumpling.yaml 2-workload/rados_api.yaml 3-upgrade-sequence/upgrade-all.yaml 4-final-upgrade/client.yaml 5-final-workload/rgw_s3tests.yaml rhel_6.5.yaml}
email: ceph-qa@ceph.com
job_id: '382697'
last_in_suite: false
machine_type: vps
name: teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps
nuke-on-error: true
os_type: rhel
os_version: '6.5'
overrides:
  admin_socket:
    branch: firefly
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        mon warn on legacy crush tunables: false
      osd:
        debug filestore: 20
        debug journal: 20
        debug ms: 1
        debug osd: 20
    log-whitelist:
    - slow request
    - scrub mismatch
    - ScrubResult
    sha1: b576d5a242c16bc9e38ba283a9784f838614882a
  ceph-deploy:
    branch:
      dev: firefly
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
        osd default pool size: 2
  install:
    ceph:
      sha1: b576d5a242c16bc9e38ba283a9784f838614882a
  s3tests:
    branch: firefly
  workunit:
    sha1: b576d5a242c16bc9e38ba283a9784f838614882a
owner: yuriw
priority: 1000
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
suite: upgrade:dumpling-x
suite_branch: firefly
targets:
  ubuntu@vpm050.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAp7rxkr0MpCODHX6nw/SqLhwjZB9XY+WUUHd+5rA3FAIHWawW54wyIRO2xzMoL6BfRnBzVEll1cV6jUfHsHJIKAWOS8OSuoWt4AmROwBpSToFlReuW4S8dLyLdQ+EvGhS/YE83VsEHtxtmwfjnfpg+ADOaEzwHODYLHkar+/i/9n8evqmUjy8YGcPBNfNBhJ7c873zYoRiUMhq5VmVnxjiPz0evqp+WhhuLvasrGbtI1hYHtJjVH4DsbCJn7fP6vG/1Yxzk9RK575HYJk0UIuUqxuQGSC0wE+4OeLYcz8Uo7f+jl2/dgvD2HsyUMXlr45iSr3qKNvXgGqGSNj5M4n8Q==
  ubuntu@vpm051.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAxl4J8EQfl0OmYYV0GktZcKwPTbcYaeCdimNdd4kIpCj7HZePrCisFUSGiU1mMHJC+IFRRvml5qCWFpzFSjMJZJp0/KAxhG3ixzJEWZPhxITJhls8yJzJ5ZkoP+KehJMd4teEt3uTV2P3wT2V0PH4H1Y7zyAHEVRoZAKrqAEpMiy8xbDV+TW/ytNxX3j788Nw+7Orf6jwM2ir8YU0cqrTQp9L3KHMMD/qD1YwC1oPwHjq+R2VwgW/h3aJgwuUOw21fu3xdACJ/uLsg7utiJVBqqQ8R6PY2IDOXBjxMDfeAODrQ+hT4QmUnCbJOQexKMR2cmblpoERfaFnZZsgp3ydBQ==
  ubuntu@vpm085.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA3dE8fsW00cQfM0sDOhLaHL5jT9efkLyJdO+qAzg4ocCLEz4Kz95makKn2ind81PIXXD1tDc4M085NE+ksJx9W88iM+zVVjK8i9KpjYcqPxIc8+jXNIHVhYoIZh8hHdVV+O4YXZGRQ3L6vBjElC6ODNo082K4/wLlGfwVa8JGawzYivhUpuaO1OoYKHGOKwl/nuAKGHENxCwiqJKWr0MPPGnPGLtZyNx6yOZ2B7vBl7p/EpjQvxgzg8vs9Vq2F7mVsVJM9GlNqYiw4FIT4WThDmu80pAu++OJ2dU+qODanpCyZVkB8RI11dMDQ6bsz1vxS0VyP7vqvzTeLKdkQDYhHw==
tasks:
- internal.lock_machines:
  - 3
  - vps
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.serialize_remote_roles: null
- internal.check_conflict: null
- internal.check_ceph_data: null
- internal.vm_setup: null
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.sudo: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock.check: null
- install:
    branch: cuttlefish
- print: '**** done cuttlefish install'
- ceph:
    fs: xfs
- print: '**** done ceph'
- install.upgrade:
    all:
      branch: dumpling
- ceph.restart: null
- parallel:
  - workload
  - upgrade-sequence
- print: '**** done parallel'
- install.upgrade:
    client.0: null
- print: '**** done install.upgrade'
- rgw:
  - client.1
- s3tests:
    client.1:
      branch: dumpling
      rgw_server: client.1
teuthology_branch: firefly
tube: vps
upgrade-sequence:
  sequential:
  - install.upgrade:
      mon.a: null
      mon.b: null
  - ceph.restart:
    - mon.a
    - mon.b
    - mon.c
    - mds.a
    - osd.0
    - osd.1
    - osd.2
    - osd.3
verbose: true
worker_log: /var/lib/teuthworker/archive/worker_logs/worker.vps.5105
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test-upgrade-firefly.sh
        - cls
description: upgrade:dumpling-x/parallel/{0-cluster/start.yaml 1-dumpling-install/cuttlefish-dumpling.yaml 2-workload/rados_api.yaml 3-upgrade-sequence/upgrade-all.yaml 4-final-upgrade/client.yaml 5-final-workload/rgw_s3tests.yaml rhel_6.5.yaml}
duration: 2374.8802947998047
failure_reason: 'Command failed on 10.214.138.95 with status 1: ''sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1'''
flavor: basic
owner: yuriw
success: false
Updated by Yuri Weinstein almost 10 years ago
This seems to be the same crash in another test; logs are in http://qa-proxy.ceph.com/teuthology/teuthology-2014-07-28_11:48:15-upgrade:dumpling-x-firefly---basic-vps/382725/
Updated by Sage Weil almost 10 years ago
this sounds a bit like a problem we had a while back with hung IOs from the VMs?
Updated by Sage Weil over 9 years ago
- Status changed from New to Can't reproduce