Bug #1624

osd crash in HeartbeatMap::_check

Added by Josh Durgin almost 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Logs with debugging are in vit:~joshd/thrash_stuck_active4. This happened on osds 0 and 4:

 ceph version 0.36-324-ge6dbd71 (commit:e6dbd7141bd8b4403f3b931f3a0870ed9e725096)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x57b754]
 2: (()+0xfb40) [0x7fd23d006b40]
 3: (gsignal()+0x35) [0x7fd23b7dbba5]
 4: (abort()+0x180) [0x7fd23b7df6b0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd23c07f6bd]
 6: (()+0xb9906) [0x7fd23c07d906]
 7: (()+0xb9933) [0x7fd23c07d933]
 8: (()+0xb9a3e) [0x7fd23c07da3e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x58428f]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x1fe) [0x58317e]
 11: (ceph::HeartbeatMap::is_healthy()+0x7f) [0x5834af]
 12: (ceph::HeartbeatMap::check_touch_file()+0x20) [0x5836e0]
 13: (CephContextServiceThread::entry()+0x5f) [0x5cb2bf]
 14: (()+0x7971) [0x7fd23cffe971]
 15: (clone()+0x6d) [0x7fd23b88e92d]


Related issues

Duplicated by Ceph - Bug #1635: osd hit suicide timeout in heartbeat_map thread Duplicate 10/19/2011

History

#1 Updated by Sage Weil almost 9 years ago

argh, the tarball is already gone:

    # wget http://ceph.newdream.net/gitbuilder/output/sha1/e6dbd7141bd8b4403f3b931f3a0870ed9e725096/ceph.x86_64.tgz
    --2011-10-18 16:27:27-- http://ceph.newdream.net/gitbuilder/output/sha1/e6dbd7141bd8b4403f3b931f3a0870ed9e725096/ceph.x86_64.tgz
    Resolving ceph.newdream.net... 66.33.208.28
    Connecting to ceph.newdream.net|66.33.208.28|:80... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2011-10-18 16:27:27 ERROR 404: Not Found.

#5 Updated by Sage Weil almost 9 years ago

running this in a loop with logs to try to catch it

#6 Updated by Sage Weil almost 9 years ago

commit 2b3bdea9f7bcf9e9f8d4328f62d82ff43e996b3a fixes at least some of these...

#7 Updated by Sage Weil almost 9 years ago

  • Status changed from New to Need More Info
  • Assignee set to Sage Weil

#8 Updated by Sage Weil almost 9 years ago

  • Status changed from Need More Info to Resolved

going to chalk these up to the infinite loop fixed in that previous patch (2b3bdea9f7bcf9e9f8d4328f62d82ff43e996b3a).
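
For anyone reading this later: the trace above, together with the title of the duplicate (#1635, "hit suicide timeout"), suggests the usual watchdog pattern, where a worker stuck in a loop stops refreshing its deadline and the checking thread eventually aborts the process. Below is a minimal sketch of that pattern, not the actual HeartbeatMap code. `Handle`, `WatchdogMap`, `touch`, `check` and `worst` are hypothetical names, the timeout values are made up, and the sketch returns a state instead of asserting so that it can be tested.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical, simplified stand-in for a per-thread heartbeat handle.
// This is NOT Ceph's heartbeat_handle_d; all names here are illustrative.
struct Handle {
  std::string name;
  long grace_deadline = 0;    // 0 means no deadline is armed
  long suicide_deadline = 0;  // 0 means no deadline is armed
};

enum class Health { Healthy, Unhealthy, Suicide };

class WatchdogMap {
 public:
  // Register a worker; std::list keeps the returned pointer stable.
  Handle* add_worker(const std::string& name) {
    workers_.emplace_back();
    workers_.back().name = name;
    return &workers_.back();
  }

  // A worker calls this whenever it makes progress. Times are in seconds.
  static void touch(Handle& h, long now, long grace, long suicide_grace) {
    assert(grace <= suicide_grace);
    h.grace_deadline = now + grace;
    h.suicide_deadline = now + suicide_grace;
  }

  // Classify one handle. A real watchdog would abort on Suicide; the sketch
  // returns a value instead so the behaviour can be tested.
  static Health check(const Handle& h, long now) {
    if (h.suicide_deadline != 0 && h.suicide_deadline < now)
      return Health::Suicide;
    if (h.grace_deadline != 0 && h.grace_deadline < now)
      return Health::Unhealthy;
    return Health::Healthy;
  }

  // Worst state across all workers, as seen by the checking thread.
  Health worst(long now) const {
    Health result = Health::Healthy;
    for (const Handle& h : workers_) {
      Health s = check(h, now);
      if (s == Health::Suicide) return s;
      if (s == Health::Unhealthy) result = s;
    }
    return result;
  }

 private:
  std::list<Handle> workers_;
};
```

A worker that last called `touch` at t=100 with a grace of 15 and a suicide grace of 150 is reported healthy at t=110, unhealthy at t=120, and past the suicide deadline at t=300. A thread spinning in an infinite loop never calls `touch` again, so it would reach that last state regardless of load.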
