Bug #3378
closed
common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")
Description
This is a cluster of 2 OSDs that is generally unhappy with life. After deleting the cephfs pools, the new pool creation took a long time, during which system load was fairly low on both OSDs. After a while, OSD 0 died with FAILED assert(0 == "hit suicide timeout"). Pool creation eventually succeeded.
This might be a duplicate of #2784.
Files
Updated by Mark Nelson over 11 years ago
- File ceph.conf added
- File osd.4.log.1.gz added
Saw this during parametric sweep testing on EXT4 with 8 concurrent OSD disk threads. The Ceph build is from the gitbuilder next branch for the bobtail release: 0.55.1-344-g5f25f9f-1precise. The compressed OSD log and ceph.conf file are attached.
Updated by Sage Weil over 11 years ago
- Status changed from New to Can't reproduce
The suicide timeout is only a symptom. It usually means the thread is blocked in a hung syscall. In your case, Matthew, it looks like the thread was mostly making progress, just extremely slowly. Either way, we need more verbose filestore logs, a core file, or something similar to see which syscall is blocked.
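For readers unfamiliar with the mechanism, here is a minimal Python sketch of how a heartbeat check with a grace period and a suicide timeout can work. It is an illustration only, not the code in common/HeartbeatMap.cc: the names HeartbeatHandle, SuicideTimeout, and check are hypothetical, and the 15 s and 150 s values are illustrative. A worker thread sets its deadlines before starting an operation. A checker then reports the thread as unhealthy once the grace deadline has passed, and aborts once the suicide deadline has passed.

```python
class SuicideTimeout(Exception):
    """Raised when a worker has been stuck past its suicide deadline."""


class HeartbeatHandle:
    """Per-thread deadline record (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.grace_deadline = None
        self.suicide_deadline = None

    def reset_timeout(self, now, grace, suicide_grace):
        # Called by the worker before it starts a unit of work.
        self.grace_deadline = now + grace
        self.suicide_deadline = now + suicide_grace

    def clear_timeout(self):
        # Called by the worker when it goes idle, so it is not checked.
        self.grace_deadline = None
        self.suicide_deadline = None


def check(handle, now):
    """Return True if healthy, False if past grace; raise if past suicide."""
    if handle.suicide_deadline is not None and now > handle.suicide_deadline:
        # The real daemon asserts here; the sketch raises instead.
        raise SuicideTimeout(f"{handle.name}: hit suicide timeout")
    if handle.grace_deadline is not None and now > handle.grace_deadline:
        return False
    return True


# Illustrative values: 15 s grace, 150 s suicide timeout.
h = HeartbeatHandle("op_thread")
h.reset_timeout(now=0.0, grace=15.0, suicide_grace=150.0)
status_early = check(h, 10.0)   # still within the grace period
status_late = check(h, 20.0)    # past grace, not yet past suicide
```

This shows why the assert is only a symptom: the check fires whenever the worker fails to reset its deadline in time, whether it is blocked in a syscall or simply running very slowly, and it carries no information about what the thread was doing.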