Bug #1635

closed

osd hit suicide timeout in heartbeat_map thread

Added by Josh Durgin over 12 years ago. Updated over 12 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: OSD
% Done: 0%


Description

This happened while thrashing OSDs under radosbench, during peering, with osds 3 and 6 marked out.
From teuthology:~t/log/osd.0.log.gz:

2011-10-19 13:13:53.070817 7ff4bf2a7700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7ff4b208b700' had suicide timed out after 300
common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)', in thread '0x7ff4bf2a7700'
common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
 ceph version 0.36-327-g3e92aac (commit:3e92aace21ecc766f14ac5a2c6377570988f1a3b)
 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x3ad) [0x7808cd]
 2: (ceph::HeartbeatMap::is_healthy()+0x8f) [0x78202f]
 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x782438]
 4: (CephContextServiceThread::entry()+0x77) [0x672d57]
 5: (Thread::_entry_func(void*)+0x12) [0x615372]
 6: (()+0x7971) [0x7ff4c0d17971]
 7: (clone()+0x6d) [0x7ff4bf5a792d]
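
For readers unfamiliar with the mechanism behind this assert, the following is a minimal sketch of a heartbeat watchdog with a grace period and a suicide timeout. It is not Ceph's implementation: the class name `HeartbeatMapSketch`, the `SuicideTimeout` exception, the injectable clock, and the 30-second grace used in the usage note are assumptions for illustration. Only the 300-second suicide value and the method names `is_healthy` and `_check` come from the log above.

```python
import time
from dataclasses import dataclass


class SuicideTimeout(Exception):
    """Stands in for the failed assert that aborts the daemon (assumption)."""


@dataclass
class Handle:
    name: str
    grace_deadline: float = 0.0    # 0.0 means "not armed"
    suicide_deadline: float = 0.0  # 0.0 means "not armed"


class HeartbeatMapSketch:
    """Hypothetical watchdog: workers arm deadlines, a service thread checks them."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._workers = {}

    def add_worker(self, name):
        handle = Handle(name)
        self._workers[name] = handle
        return handle

    def reset_timeout(self, handle, grace, suicide_grace):
        # Called by a worker when it starts a unit of work.
        now = self._clock()
        handle.grace_deadline = now + grace
        handle.suicide_deadline = now + suicide_grace if suicide_grace else 0.0

    def clear_timeout(self, handle):
        # Called by a worker when the unit of work is finished.
        handle.grace_deadline = 0.0
        handle.suicide_deadline = 0.0

    def _check(self, handle, now):
        # Past the suicide deadline: give up on the whole process.
        if handle.suicide_deadline and now > handle.suicide_deadline:
            raise SuicideTimeout(f"'{handle.name}' had suicide timed out")
        # Past the grace deadline only: report unhealthy but keep running.
        return not (handle.grace_deadline and now > handle.grace_deadline)

    def is_healthy(self):
        # Check every worker so a suicide deadline is never skipped.
        now = self._clock()
        healthy = True
        for handle in self._workers.values():
            if not self._check(handle, now):
                healthy = False
        return healthy
```

With a fake clock, a worker armed via `reset_timeout(h, 30, 300)` reports healthy at first, unhealthy once the clock passes 30 seconds, and raises `SuicideTimeout` once it passes 300 seconds. That last case mirrors the situation in the trace, where the check runs on a service thread and not on the stuck `OSD::op_tp` thread itself.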

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #1624: osd crash in HearbeatMap::_check (Resolved, Sage Weil, 10/18/2011)

#1

Updated by Sage Weil over 12 years ago

  • Status changed from New to Duplicate