Bug #1620
rgw suicide due to heartbeat timeout
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Description
Happens around an hour after an OSD went down:
2011-10-14 11:44:03.714491 7fd2d976a700 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fd2cdffb700' had timed out after 600
2011-10-14 11:44:03.714499 7fd2d976a700 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fd2cdffb700' had suicide timed out after 3600
common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)', in thread '0x7fd2d976a700'
common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
ceph version 0.35-389-g52bad62 (commit:52bad62d03a4b30d3c8ebf1426f008cdca287e6f)
1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x4c67be]
2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x4c6aef]
3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x4c6d20]
4: (CephContextServiceThread::entry()+0x5f) [0x4b0d7f]
5: (()+0x68ba) [0x7fd2dd4b98ba]
6: (clone()+0x6d) [0x7fd2dc1ad02d]
ceph version 0.35-389-g52bad62 (commit:52bad62d03a4b30d3c8ebf1426f008cdca287e6f)
1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x4c67be]
2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x4c6aef]
3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x4c6d20]
4: (CephContextServiceThread::entry()+0x5f) [0x4b0d7f]
5: (()+0x68ba) [0x7fd2dd4b98ba]
6: (clone()+0x6d) [0x7fd2dc1ad02d]
*** Caught signal (Aborted) **
in thread 0x7fd2d976a700
ceph version 0.35-389-g52bad62 (commit:52bad62d03a4b30d3c8ebf1426f008cdca287e6f)
1: /usr/bin/radosgw() [0x48cf59]
2: (()+0xef60) [0x7fd2dd4c1f60]
3: (gsignal()+0x35) [0x7fd2dc110165]
4: (abort()+0x180) [0x7fd2dc112f70]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd2dc9a3dc5]
6: (()+0xcb166) [0x7fd2dc9a2166]
7: (()+0xcb193) [0x7fd2dc9a2193]
8: (()+0xcb28e) [0x7fd2dc9a228e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x3a7) [0x49b287]
10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x4c67be]
11: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x4c6aef]
12: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x4c6d20]
13: (CephContextServiceThread::entry()+0x5f) [0x4b0d7f]
14: (()+0x68ba) [0x7fd2dd4b98ba]
15: (clone()+0x6d) [0x7fd2dc1ad02d]
Updated by Yehuda Sadeh over 12 years ago
- Status changed from New to Resolved
Fixed in commit 298dbbe64f8b0738ec58db43782813d0686717c7. Basically, setting the rgw suicide timeout to 0 should do the job.
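For readers unfamiliar with the two-level timeout in the log above, here is a minimal sketch of the idea, not the actual Ceph code. It assumes that a suicide grace of 0 means "never abort", which is how the comment above reads. The names `HeartbeatHandle`, `reset_timeout`, `check` and `Health` are hypothetical; the values 600 and 3600 are taken from the log.

```cpp
#include <ctime>

// Hypothetical, simplified model of a per-thread heartbeat handle.
struct HeartbeatHandle {
    std::time_t timeout = 0;          // deadline for the "timed out" warning; 0 = unset
    std::time_t suicide_timeout = 0;  // deadline for the fatal abort; 0 = disabled
};

enum class Health { Ok, TimedOut, Suicide };

// Called by the worker thread whenever it makes progress.
// Assumption: a suicide_grace of 0 leaves the suicide deadline disabled.
void reset_timeout(HeartbeatHandle& h, std::time_t now,
                   std::time_t grace, std::time_t suicide_grace) {
    h.timeout = now + grace;
    h.suicide_timeout = suicide_grace ? now + suicide_grace : 0;
}

// Called periodically by a monitoring thread.
Health check(const HeartbeatHandle& h, std::time_t now) {
    // The suicide deadline is only consulted when it is non-zero.
    if (h.suicide_timeout && h.suicide_timeout < now) return Health::Suicide;
    if (h.timeout && h.timeout < now) return Health::TimedOut;
    return Health::Ok;
}
```

With a grace of 600 and a suicide grace of 3600, a worker stuck for more than an hour would reach the `Suicide` state, matching the assert in the log. With a suicide grace of 0 the same worker would only ever be reported as `TimedOut`.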