Bug #1620

closed

rgw suicide due to heartbeat timeout

Added by Yehuda Sadeh over 12 years ago. Updated over 12 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

This happens about an hour after an OSD went down, which matches the 3600-second suicide timeout reported in the log:

2011-10-14 11:44:03.714491 7fd2d976a700 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fd2cdffb700' had timed out after 600
2011-10-14 11:44:03.714499 7fd2d976a700 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fd2cdffb700' had suicide timed out after 3600
common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)', in thread '0x7fd2d976a700'
common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
 ceph version 0.35-389-g52bad62 (commit:52bad62d03a4b30d3c8ebf1426f008cdca287e6f)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x4c67be]
 2: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x4c6aef]
 3: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x4c6d20]
 4: (CephContextServiceThread::entry()+0x5f) [0x4b0d7f]
 5: (()+0x68ba) [0x7fd2dd4b98ba]
 6: (clone()+0x6d) [0x7fd2dc1ad02d]
*** Caught signal (Aborted) **
 in thread 0x7fd2d976a700
 ceph version 0.35-389-g52bad62 (commit:52bad62d03a4b30d3c8ebf1426f008cdca287e6f)
 1: /usr/bin/radosgw() [0x48cf59]
 2: (()+0xef60) [0x7fd2dd4c1f60]
 3: (gsignal()+0x35) [0x7fd2dc110165]
 4: (abort()+0x180) [0x7fd2dc112f70]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd2dc9a3dc5]
 6: (()+0xcb166) [0x7fd2dc9a2166]
 7: (()+0xcb193) [0x7fd2dc9a2193]
 8: (()+0xcb28e) [0x7fd2dc9a228e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x3a7) [0x49b287]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x4c67be]
 11: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x4c6aef]
 12: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x4c6d20]
 13: (CephContextServiceThread::entry()+0x5f) [0x4b0d7f]
 14: (()+0x68ba) [0x7fd2dd4b98ba]
 15: (clone()+0x6d) [0x7fd2dc1ad02d]
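The log shows two tiers of heartbeat deadlines for an RGWProcess::m_tp worker thread: after 600 seconds without progress the thread is reported as timed out, and after 3600 seconds the service thread hits the suicide timeout and aborts the process. The following is a minimal sketch of that two-tier check, assuming one deadline pair per thread. The names `HeartbeatHandle`, `touch`, `check` and `Health` are hypothetical and are not Ceph's actual HeartbeatMap API; the real code asserts in the suicide case, while this sketch returns a status so it can be tested.

```cpp
#include <ctime>
#include <string>

// Hypothetical, simplified model of one worker thread's heartbeat entry.
// Deadlines are absolute times in seconds.
struct HeartbeatHandle {
  std::string name;
  std::time_t timeout;          // past this, the thread is reported unhealthy
  std::time_t suicide_timeout;  // past this, the whole process is aborted
};

enum class Health { Healthy, TimedOut, Suicide };

// Worker side (hypothetical): called whenever the thread makes progress,
// pushing both deadlines forward from "now".
void touch(HeartbeatHandle& h, std::time_t now, std::time_t grace,
           std::time_t suicide_grace) {
  h.timeout = now + grace;
  h.suicide_timeout = now + suicide_grace;
}

// Service-thread side (hypothetical): called periodically for each handle.
// The suicide deadline takes precedence; in the log above it corresponds to
// the "hit suicide timeout" assertion, the other case to "had timed out".
Health check(const HeartbeatHandle& h, std::time_t now) {
  if (h.suicide_timeout < now) return Health::Suicide;
  if (h.timeout < now) return Health::TimedOut;
  return Health::Healthy;
}
```

With a grace of 600 and a suicide grace of 3600, a thread that last called `touch` at t = 1000 and then blocks (for example on an OSD that went down) is reported as timed out from t = 1601 onward and triggers the suicide case from t = 4601 onward.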
#1

Updated by Yehuda Sadeh over 12 years ago

  • Status changed from New to Resolved

Fixed in commit 298dbbe64f8b0738ec58db43782813d0686717c7. Essentially, setting the rgw suicide timeout to 0 does the job, since a zero suicide timeout disables the suicide check.
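The "0 disables it" convention can be sketched as follows, assuming that a suicide grace of 0 is stored as a deadline of 0 and that the checker treats a zero deadline as "never". The helper names `suicide_deadline` and `hits_suicide` are hypothetical illustrations, not functions from the fix commit.

```cpp
#include <ctime>

// Hypothetical helper: a suicide grace of 0 yields a deadline of 0,
// which is interpreted as "no suicide deadline at all".
std::time_t suicide_deadline(std::time_t now, std::time_t suicide_grace) {
  return suicide_grace ? now + suicide_grace : 0;
}

// Hypothetical helper: true only when a deadline is set and already passed.
bool hits_suicide(std::time_t deadline, std::time_t now) {
  return deadline != 0 && deadline < now;
}
```

With a suicide grace of 3600 the check fires once more than an hour has passed without progress; with a suicide grace of 0 it never fires, however long the thread stays blocked, while the ordinary 600-second timeout can still report the thread as unhealthy.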
