Bug #4357
osd: FAILED assert("join on thread that was never started" == 0)
Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I found #1650, which seems related, but it is rather old and covers a different use case.
I got a message from my monitoring system that the health of a small cluster was not OK. It turns out all 12 OSDs went down with the same backtrace/message:
    -3> 2013-03-05 19:04:18.679948 7f5f37300780 10 -- [2a00:f10:113:0:d585:1138:64c6:be36]:6806/8564 wait: dispatch queue is stopped
    -2> 2013-03-05 19:04:18.679971 7f5f37300780 20 -- [2a00:f10:113:0:d585:1138:64c6:be36]:6806/8564 wait: stopping accepter thread
    -1> 2013-03-05 19:04:18.679984 7f5f37300780 10 accepter.stop accepter
     0> 2013-03-05 19:04:18.683892 7f5f37300780 -1 common/Thread.cc: In function 'int Thread::join(void**)' thread 7f5f37300780 time 2013-03-05 19:04:18.679999
    common/Thread.cc: 117: FAILED assert("join on thread that was never started" == 0)
    ceph version 0.56.3-19-g8c6f522 (8c6f52215240f48b5e4d5bb99a5f2f451e7ce70a)
    1: (Thread::join(void**)+0x41) [0x823ee1]
    2: (Accepter::stop()+0x7b) [0x8af5fb]
    3: (SimpleMessenger::wait()+0xa4a) [0x81d6ba]
    4: (main()+0x2282) [0x5733f2]
    5: (__libc_start_main()+0xed) [0x7f5f34f9c76d]
    6: /usr/bin/ceph-osd() [0x575909]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I attached the logs of two OSDs, but I want to mention again that ALL 12 OSDs went down with the same backtrace within about two minutes, rendering the cluster unable to do any I/O.