Bug #2784

closed

osd hit suicide timeout

Added by Tamilarasi muthamizhan almost 12 years ago. Updated over 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A

Description

Log: ubuntu@teuthology:/a/teuthology-2012-07-12_19:00:15-regression-master-testing-gcov/10615

ubuntu@teuthology:/a/teuthology-2012-07-12_19:00:15-regression-master-testing-gcov/10615$ zcat /log/osd.2.log.gz

-1> 2012-07-12 20:09:15.281759 7fc217006700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fc20e7f5700' had suicide timed out after 180
0> 2012-07-12 20:09:15.282794 7fc217006700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fc217006700 time 2012-07-12 20:09:15.281772
common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
ceph version 0.48argonaut-358-gbcfa573 (commit:bcfa573f5f615f3403ff71da0212cd1cee7e7d9c)
1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x433) [0x9d3ae3]
2: (ceph::HeartbeatMap::is_healthy()+0x8f) [0x9d48cf]
3: (ceph::HeartbeatMap::check_touch_file()+0x2b) [0x9d4cbb]
4: (CephContextServiceThread::entry()+0x6d) [0x92461d]
5: (Thread::_entry_func(void*)+0x12) [0x8f95b2]
6: (()+0x7e9a) [0x7fc21955ee9a]
7: (clone()+0x6d) [0x7fc217b134bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---
2012-07-12 20:09:15.285177 7fc217006700 -1 *** Caught signal (Aborted) **
in thread 7fc217006700

ceph version 0.48argonaut-358-gbcfa573 (commit:bcfa573f5f615f3403ff71da0212cd1cee7e7d9c)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x80eb9a]
2: (()+0xfcb0) [0x7fc219566cb0]
3: (gsignal()+0x35) [0x7fc217a57445]
4: (abort()+0x17b) [0x7fc217a5abab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc2183a569d]
6: (()+0xb5846) [0x7fc2183a3846]
7: (()+0xb5873) [0x7fc2183a3873]
8: (()+0xb596e) [0x7fc2183a396e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x385) [0x912275]
10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x433) [0x9d3ae3]
11: (ceph::HeartbeatMap::is_healthy()+0x8f) [0x9d48cf]
12: (ceph::HeartbeatMap::check_touch_file()+0x2b) [0x9d4cbb]
13: (CephContextServiceThread::entry()+0x6d) [0x92461d]
14: (Thread::_entry_func(void*)+0x12) [0x8f95b2]
15: (()+0x7e9a) [0x7fc21955ee9a]
16: (clone()+0x6d) [0x7fc217b134bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2012-07-12 20:09:15.285177 7fc217006700 -1 *** Caught signal (Aborted) **
in thread 7fc217006700

ceph version 0.48argonaut-358-gbcfa573 (commit:bcfa573f5f615f3403ff71da0212cd1cee7e7d9c)
1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x80eb9a]
2: (()+0xfcb0) [0x7fc219566cb0]
3: (gsignal()+0x35) [0x7fc217a57445]
4: (abort()+0x17b) [0x7fc217a5abab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc2183a569d]
6: (()+0xb5846) [0x7fc2183a3846]
7: (()+0xb5873) [0x7fc2183a3873]
8: (()+0xb596e) [0x7fc2183a396e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x385) [0x912275]
10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x433) [0x9d3ae3]
11: (ceph::HeartbeatMap::is_healthy()+0x8f) [0x9d48cf]
12: (ceph::HeartbeatMap::check_touch_file()+0x2b) [0x9d4cbb]
13: (CephContextServiceThread::entry()+0x6d) [0x92461d]
14: (Thread::_entry_func(void*)+0x12) [0x8f95b2]
15: (()+0x7e9a) [0x7fc21955ee9a]
16: (clone()+0x6d) [0x7fc217b134bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---
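
For readers unfamiliar with the assert above, here is a minimal sketch of a heartbeat watchdog with two thresholds. It is a simplified model and not Ceph's HeartbeatMap code. The class and method names, the injectable clock, and the 60 second grace value are assumptions for illustration; only the 180 second suicide timeout and the message wording come from the log.

```python
import time
from dataclasses import dataclass


@dataclass
class Handle:
    """One watched worker thread (hypothetical stand-in for a heartbeat handle)."""
    name: str
    grace: float           # seconds of silence before the worker counts as unhealthy
    suicide_grace: float   # seconds of silence before the whole process gives up
    last_touch: float = 0.0


class SuicideTimeout(Exception):
    """Raised where a real daemon would assert and abort."""


class HeartbeatMap:
    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the sketch can be tested without sleeping
        self._handles = []

    def add_worker(self, name, grace, suicide_grace):
        handle = Handle(name, grace, suicide_grace, self._clock())
        self._handles.append(handle)
        return handle

    def touch(self, handle):
        # Called by the worker each time it makes progress.
        handle.last_touch = self._clock()

    def is_healthy(self):
        now = self._clock()
        healthy = True
        for h in self._handles:
            idle = now - h.last_touch
            if idle > h.suicide_grace:
                raise SuicideTimeout(
                    f"'{h.name}' had suicide timed out after {h.suicide_grace:g}")
            if idle > h.grace:
                healthy = False   # stalled, but not long enough to abort
        return healthy
```

In this model a worker that stops calling touch() first makes is_healthy() return False, and once the silence exceeds suicide_grace the checker raises instead. That matches the shape of the trace: the abort is raised from the service thread that runs the check (7fc217006700), not from the stalled FileStore::op_tp thread (0x7fc20e7f5700).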

ubuntu@teuthology:/a/teuthology-2012-07-12_19:00:15-regression-master-testing-gcov/10615$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: ea18acf27e2f7cee4ac9d01719564414d2cd64b5
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: bcfa573f5f615f3403ff71da0212cd1cee7e7d9c
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDx6LtrVRWV3GjiTYgw3lIpuljK4ObcgjFcitU2ZIkJZzBK3DokJ5AvTNhbYOWpo0bJqaiheFM82UEiiqCs6ChgRbSbd++RSfVT9PejPpmPipLB8Bj24xzdrdCqUaQoMNr6J5+h7xcWiCeW/8sDBJIyWVOSO0AGGQnc88HwVExSIuzsfM9ergnQaQcmrCqf6PTrpVZWeBQPqOnWAy+fCka/vD+omclH9cYyLeK/tVTIYHWBns4nLzL0FeQFe8e3uyoCtFzHvOC4ziIKVRv/WpHt+fZq8M1IGUNaQAR1v3x/eKF0ut2EUYTCfR03MAGEjhv0IMX6XloaNjCFeMXsrHh7
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDo+Kh24vRxeTQ6/n5PIIGuxrPHPRO/xMQlwoLHi7mR01cIXJMG5wet7mp2om3/5SZSDcLBHduDKrdWL142Sg5fC0zZPUggbxS7nz/UCjYBzMsOtHEUAU5Gs0KFopOCHXNEveK95ezsroMAD5+jS/IEpiooYCkrR3H+NSvUU0Ae352PlXqV0vamkYzyQyEMmhFE50ALhUXbKMve3d2mxJee5sqVZSBmQTbze9RKUA96t9iiwiheflXbN1i9WHlbBOIue5pZ5fM3/vqPWgaShfFpa0pT56QKJfjyFcDeCLOislo23E5qKAJOi5vn5BoYVtG3niNQpt/YbYGfDEHVeqt9
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEwyNlwC9Utqf3PCjL2JR4wwDkzpdEJuW93DOW82vYVisYEGod454JwXeNkjqzTUk6tXeRoUM9f/C6sZS3LFgHcMYt6m0sxP8DC4qU+q0YxCw9zLY8bXKe4DDjijM62h/SnyqyOWIh9amGT7wRwZEHBV1BKvZbNxQIJ7ESkuKsk/tJfWKhq7dSw6E/+MZ4yQtXvTyaJ3pK96Hq2uoUkawv+FxXBrzG3FtTTYA8gqA1SIiV3erEIQuBK/WD74i5yK4rwpfGTo7jNc0V6wrwO1BKFj/OGjSC+2LSAkBgf8WLe6UL/dHr3bBEyzm0V4xMf5Iqb8JGvkaXNEfbFqzKC2Wv
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      snap_create: 50
      snap_remove: 50
      snap_rollback: 50
      write: 100
    ops: 4000
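
To gauge the workload mix that was running when the OSD stalled, the op_weights in the rados task can be turned into per-operation probabilities. This is a rough sketch that assumes the weights are relative selection frequencies; the helper names are hypothetical and this is not the test harness's own code.

```python
import random

# Weights copied from the rados task in config.yaml.
OP_WEIGHTS = {
    "delete": 50,
    "read": 100,
    "snap_create": 50,
    "snap_remove": 50,
    "snap_rollback": 50,
    "write": 100,
}


def op_probabilities(weights):
    """Normalize relative weights so they sum to 1."""
    total = sum(weights.values())
    return {op: weight / total for op, weight in weights.items()}


def pick_ops(weights, count, seed=None):
    """Draw `count` operation names in proportion to their weights."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=count)
```

The weights sum to 400, so under that assumption reads and writes are each 25% of the 4000 ops, and delete and the three snapshot operations are 12.5% each. Snapshot operations together make up 37.5% of the workload.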
ubuntu@teuthology:/a/teuthology-2012-07-12_19:00:15-regression-master-testing-gcov/10615$ cat summary.yaml
ceph-sha1: bcfa573f5f615f3403ff71da0212cd1cee7e7d9c
description: collection:thrash clusters:6-osd-3-machine.yaml fs:btrfs.yaml thrashers:default.yaml
  workloads:snaps-few-objects.yaml
duration: 2186.8045082092285
failure_reason: 'Command failed with status 1: ''/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage
  /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-osd
  -f -i 2 -c /tmp/cephtest/ceph.conf'''
flavor: gcov
owner: scheduled_teuthology@teuthology
success: false
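
The failure_reason names the daemon only through its command line (ceph-osd with -i 2), which is why osd.2.log.gz is the log quoted above. A small sketch of pulling that daemon name out of the string follows. The regex and the function name are assumptions for illustration, not part of teuthology.

```python
import re

# failure_reason from summary.yaml, with the YAML quoting and line folding removed.
FAILURE_REASON = (
    "Command failed with status 1: '/tmp/cephtest/enable-coredump "
    "/tmp/cephtest/binary/usr/local/bin/ceph-coverage "
    "/tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term "
    "/tmp/cephtest/binary/usr/local/bin/ceph-osd "
    "-f -i 2 -c /tmp/cephtest/ceph.conf'"
)

# Match the daemon binary and capture its arguments up to the closing quote.
DAEMON_RE = re.compile(r"/ceph-(?P<type>osd|mon|mds)\s+(?P<args>[^']*)")


def failed_daemon(reason):
    """Return a name like 'osd.2' for the daemon in the failed command, or None."""
    match = DAEMON_RE.search(reason)
    if match is None:
        return None
    args = match.group("args").split()
    if "-i" not in args:
        return None
    idx = args.index("-i")
    if idx + 1 >= len(args):
        return None
    return f"{match.group('type')}.{args[idx + 1]}"


print(failed_daemon(FAILURE_REASON))  # prints "osd.2"
```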


Files

ceph-osd.69.log.gz (1.04 MB) Joao Eduardo Luis, 12/18/2012 09:46 AM