Bug #4357

closed

osd: FAILED assert("join on thread that was never started" == 0)

Added by Wido den Hollander about 11 years ago. Updated almost 11 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Development
Severity: 3 - minor

Description

I found #1650, which seems related, but it is rather old and describes a different use case.

I got a message from my monitoring system that the health of a small cluster was not OK. It turned out that all 12 OSDs had gone down with the same backtrace/message:

    -3> 2013-03-05 19:04:18.679948 7f5f37300780 10 -- [2a00:f10:113:0:d585:1138:64c6:be36]:6806/8564 wait: dispatch queue is stopped
    -2> 2013-03-05 19:04:18.679971 7f5f37300780 20 -- [2a00:f10:113:0:d585:1138:64c6:be36]:6806/8564 wait: stopping accepter thread
    -1> 2013-03-05 19:04:18.679984 7f5f37300780 10 accepter.stop accepter
     0> 2013-03-05 19:04:18.683892 7f5f37300780 -1 common/Thread.cc: In function 'int Thread::join(void**)' thread 7f5f37300780 time 2013-03-05 19:04:18.679999
common/Thread.cc: 117: FAILED assert("join on thread that was never started" == 0)

 ceph version 0.56.3-19-g8c6f522 (8c6f52215240f48b5e4d5bb99a5f2f451e7ce70a)
 1: (Thread::join(void**)+0x41) [0x823ee1]
 2: (Accepter::stop()+0x7b) [0x8af5fb]
 3: (SimpleMessenger::wait()+0xa4a) [0x81d6ba]
 4: (main()+0x2282) [0x5733f2]
 5: (__libc_start_main()+0xed) [0x7f5f34f9c76d]
 6: /usr/bin/ceph-osd() [0x575909]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I attached the logs of two OSDs, but I want to stress again that ALL 12 OSDs went down with the same backtrace within about 2 minutes, leaving the cluster unable to do any I/O.
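
The backtrace shows `Accepter::stop()` calling `Thread::join()` on a thread that, according to the assert message, was never started. As a rough illustration of that failure pattern (this is not Ceph's actual code), the C++ sketch below contrasts a strict stop path, which treats joining a never-started thread as a fatal error, with a tolerant one that checks first. The `Worker` class and its `stop_strict`/`stop_tolerant` methods are hypothetical names made up for this example, and `std::thread` stands in for Ceph's own `Thread` wrapper.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical worker, for illustration only; not taken from the Ceph tree.
class Worker {
 public:
  ~Worker() { stop_tolerant(); }  // never destroy a still-joinable std::thread

  void start() {
    assert(!thread_.joinable());  // starting twice would be a programming error
    stop_requested_ = false;
    thread_ = std::thread([this] { run(); });
  }

  // Strict stop: mirrors the reported behaviour by refusing to join a
  // thread that is not running (never started, or already joined).
  void stop_strict() {
    if (!thread_.joinable())
      throw std::logic_error("join on thread that was never started");
    stop_requested_ = true;
    thread_.join();
  }

  // Tolerant stop: returns false instead of failing when there is
  // nothing to join, so a shutdown path can call it unconditionally.
  bool stop_tolerant() {
    if (!thread_.joinable()) return false;
    stop_requested_ = true;
    thread_.join();
    return true;
  }

 private:
  void run() {
    // Spin until asked to stop; a real accepter would block on a socket.
    while (!stop_requested_) std::this_thread::yield();
  }

  std::atomic<bool> stop_requested_{false};
  std::thread thread_;
};
```

Calling `stop_strict()` on a `Worker` that never had `start()` called throws, which is the sketch's analogue of the failed assert above. Calling `stop_tolerant()` simply reports that there was nothing to join. Whether the OSD shutdown path should be tolerant in this way is a design question that this report does not settle.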


Files

ceph-osd.0.log.1.gz (2.12 MB), Wido den Hollander, 03/06/2013 12:44 AM
ceph-osd.4.log.1.gz (2.07 MB), Wido den Hollander, 03/06/2013 12:44 AM
ceph-osd.10.log.gz (244 KB), osd.10 logs 02-04-2013, Wido den Hollander, 04/02/2013 08:47 AM