Bug #10550

closed

Assertion in ceph-mds on failures in ::init

Added by John Spray over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

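MDS::init() can call suicide() from several error-handling paths before it has started the progress thread. suicide() then unconditionally tries to shut down progress_thread, and Thread::join() asserts because the thread was never started. In the log below, the trigger is an authentication failure during init().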

2015-01-15 11:36:56.258493 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 1, waiting
2015-01-15 11:37:01.181026 7f214925e700  1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:06.181258 7f214925e700  1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:06.258726 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 1, waiting
2015-01-15 11:37:11.181460 7f214925e700  1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:16.181686 7f214925e700  1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:16.258958 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 1, waiting
2015-01-15 11:37:21.181892 7f214925e700  1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:23.078404 7f214725a700  1 heartbeat_map reset_timeout 'MDS' had timed out after 15
2015-01-15 11:37:26.259204 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 3, waiting
2015-01-15 13:14:23.361861 7f2142950700 -1 mds.-1.-1 *** got signal Terminated ***
2015-01-15 13:14:23.361901 7f2142950700  1 mds.-1.-1 suicide.  wanted down:dne, now down:dne
2015-01-15 13:14:23.385996 7f48f424d800  0 ceph version 0.91-348-gd4a6447 (d4a64474e53ce7c9472feac530ca94ccf616fbcc), process ceph-mds, pid 6717
2015-01-15 13:14:23.388258 7f48f424d800 -1 mds.-1.0 log_to_monitors {default=true}
2015-01-15 13:14:23.390817 7f48f424d800 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
2015-01-15 13:14:23.390839 7f48f424d800  1 mds.-1.0 suicide.  wanted down:dne, now up:boot
2015-01-15 13:14:23.392899 7f48f424d800 -1 common/Thread.cc: In function 'int Thread::join(void**)' thread 7f48f424d800 time 2015-01-15 13:14:23.391350
common/Thread.cc: 136: FAILED assert("join on thread that was never started" == 0)

 ceph version 0.91-348-gd4a6447 (d4a64474e53ce7c9472feac530ca94ccf616fbcc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x94427b]
 2: (Thread::join(void**)+0x41) [0x9308a1]
 3: (MDS::ProgressThread::shutdown()+0x74) [0x59bb64]
 4: (MDS::suicide()+0x117) [0x59bc97]
 5: (MDS::init(MDSMap::DaemonState)+0x2013) [0x5acc03]
 6: (main()+0x94d) [0x59038d]
 7: (__libc_start_main()+0xf5) [0x7f48f2251ec5]
 8: /usr/bin/ceph-mds() [0x594d87]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

(Copied from http://pastebin.com/dh8kKcfh from CoilDomain in #ceph)
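
To illustrate the failure mode, here is a minimal sketch of one way to make a teardown path safe when the worker thread may never have been started. It is not the patch attached to this ticket, and it uses std::thread rather than Ceph's Thread class. The names ProgressWorker and init_and_teardown are hypothetical and exist only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a worker such as MDS::ProgressThread.
// Assumption: this models the general pattern only, not Ceph's code.
class ProgressWorker {
 public:
  void start() {
    stopping_ = false;
    thread_ = std::thread([this] { run(); });
  }

  // Safe to call whether or not start() ever ran.
  // Returns true only if a running thread was actually joined.
  bool shutdown() {
    if (!thread_.joinable()) {
      return false;  // never started (or already joined): nothing to do
    }
    stopping_ = true;
    thread_.join();
    return true;
  }

  ~ProgressWorker() { shutdown(); }

 private:
  void run() {
    // Placeholder work loop: spin until asked to stop.
    while (!stopping_) {
      std::this_thread::yield();
    }
  }

  std::atomic<bool> stopping_{false};
  std::thread thread_;
};

// Hypothetical init(): when auth_ok is false it bails out before the
// worker is started, then runs the same teardown path as a normal exit.
bool init_and_teardown(bool auth_ok) {
  ProgressWorker worker;
  if (auth_ok) {
    worker.start();
  }
  return worker.shutdown();
}
```

Calling init_and_teardown(false) follows the early-error path from the log above: shutdown() sees that no thread was started and returns without joining, instead of asserting.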


Files

20150121_10550_join_fix_v1.patch (515 Bytes), Radoslaw Zarzynski, 01/22/2015 09:42 AM

Actions #1

Updated by Radoslaw Zarzynski about 9 years ago

I am working on this.

Actions #2

Updated by Radoslaw Zarzynski about 9 years ago

A proposed patch is attached.

Actions #3

Updated by John Spray about 9 years ago

Thanks for the patch. Could you please resend it as either a GitHub pull request or, if that is not possible, a git-formatted patch that I can apply with "git am"?

Please also update the commit message to include a valid "Signed-off-by" line (https://github.com/ceph/ceph#contributing-code)

Actions #4

Updated by Radoslaw Zarzynski about 9 years ago

I made a pull request via GitHub. The commit log has been improved as well. Direct link: https://github.com/ceph/ceph/pull/3514

Actions #5

Updated by Greg Farnum about 9 years ago

  • Status changed from New to Resolved

Merged to master.
