Bug #36172

open

osd: hit suicide timeout

Added by Bernd Hennig over 5 years ago. Updated over 5 years ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-disk
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version 0.94.9-9.el7cp

An OSD drive died a few days ago, and after a restart today the OSD failed again with the same error:

= osd.115 10.24.53.152:6807/15666 566 ==== osd_ping(you_died e15168 stamp 2018-09-25 08:55:11.163988) v2 ==== 47+0+0 (923195720 0 0) 0x2ce00f600 con 0x2b95a42c0
-2> 2018-09-25 08:55:12.980462 7fc88a684700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fc7e4492700' had timed out after 15
-1> 2018-09-25 08:55:12.980471 7fc88a684700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fc7e4492700' had suicide timed out after 150
0> 2018-09-25 08:55:12.981683 7fc88a684700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fc88a684700 time 2018-09-25 08:55:12.980485
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

ceph version 0.94.9-9.el7cp (b83334e01379f267fb2f9ce729d74a0a8fa1e92c)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xb12ef5]
2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2d9) [0xa46399]
3: (ceph::HeartbeatMap::is_healthy()+0xde) [0xa46c8e]
4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0xa473ac]
5: (CephContextServiceThread::entry()+0x15b) [0xb2333b]
6: (()+0x7dc5) [0x7fc88de8ddc5]
7: (clone()+0x6d) [0x7fc88c97073d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.18.log
--- end dump of recent events ---
2018-09-25 08:55:13.015129 7fc88a684700 -1 *** Caught signal (Aborted) **
in thread 7fc88a684700

ceph version 0.94.9-9.el7cp (b83334e01379f267fb2f9ce729d74a0a8fa1e92c)
1: /usr/bin/ceph-osd() [0xa0f922]
2: (()+0xf370) [0x7fc88de95370]
3: (gsignal()+0x37) [0x7fc88c8ae1d7]
4: (abort()+0x148) [0x7fc88c8af8c8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc88d1b2ab5]
6: (()+0x5ea26) [0x7fc88d1b0a26]
7: (()+0x5ea53) [0x7fc88d1b0a53]
8: (()+0x5ec73) [0x7fc88d1b0c73]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xb130ea]
10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2d9) [0xa46399]
11: (ceph::HeartbeatMap::is_healthy()+0xde) [0xa46c8e]
12: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0xa473ac]
13: (CephContextServiceThread::entry()+0x15b) [0xb2333b]
14: (()+0x7dc5) [0x7fc88de8ddc5]
15: (clone()+0x6d) [0x7fc88c97073d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2018-09-25 08:55:13.015129 7fc88a684700 -1 *** Caught signal (Aborted) **
in thread 7fc88a684700

ceph version 0.94.9-9.el7cp (b83334e01379f267fb2f9ce729d74a0a8fa1e92c)
1: /usr/bin/ceph-osd() [0xa0f922]
2: (()+0xf370) [0x7fc88de95370]
3: (gsignal()+0x37) [0x7fc88c8ae1d7]
4: (abort()+0x148) [0x7fc88c8af8c8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fc88d1b2ab5]
6: (()+0x5ea26) [0x7fc88d1b0a26]
7: (()+0x5ea53) [0x7fc88d1b0a53]
8: (()+0x5ec73) [0x7fc88d1b0c73]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xb130ea]
10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2d9) [0xa46399]
11: (ceph::HeartbeatMap::is_healthy()+0xde) [0xa46c8e]
12: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0xa473ac]
13: (CephContextServiceThread::entry()+0x15b) [0xb2333b]
14: (()+0x7dc5) [0x7fc88de8ddc5]
15: (clone()+0x6d) [0x7fc88c97073d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.18.log
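
For anyone triaging similar logs: the two heartbeat_map lines above show the same osd_op_tp thread first exceeding a 15 second grace period and then a 150 second suicide limit. Those values look like the defaults of the osd_op_thread_timeout and osd_op_thread_suicide_timeout options, though that is an assumption and not something the log itself states. A minimal sketch for pulling such lines out of an OSD log follows. The function name and the "grace"/"suicide" labels are hypothetical, chosen only for illustration, and the pattern is based solely on the two lines quoted in this report.

```python
import re

# Pattern derived from the two heartbeat_map lines quoted in this report;
# other Ceph versions may word these messages differently.
HEARTBEAT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<worker>[^']+)' had "
    r"(?P<kind>suicide timed out|timed out) after (?P<seconds>\d+)"
)

def find_heartbeat_timeouts(lines):
    """Return (worker, kind, seconds) for every heartbeat timeout line.

    kind is "suicide" for the fatal limit and "grace" for the warning
    threshold (both labels are illustrative, not Ceph terminology).
    """
    hits = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            is_suicide = match.group("kind").startswith("suicide")
            kind = "suicide" if is_suicide else "grace"
            hits.append((match.group("worker"), kind, int(match.group("seconds"))))
    return hits

# The two lines from the description, used here as sample input.
sample = [
    "-2> 2018-09-25 08:55:12.980462 7fc88a684700 1 heartbeat_map is_healthy "
    "'OSD::osd_op_tp thread 0x7fc7e4492700' had timed out after 15",
    "-1> 2018-09-25 08:55:12.980471 7fc88a684700 1 heartbeat_map is_healthy "
    "'OSD::osd_op_tp thread 0x7fc7e4492700' had suicide timed out after 150",
]

for worker, kind, seconds in find_heartbeat_timeouts(sample):
    print(worker, kind, seconds)
```

To run it against a real log, pass an open file handle instead of the sample list, for example the file named in the log_file line above. Lines that do not match are skipped.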
