Bug #10550
Assertion in ceph-mds on failures in ::init
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Severity:
3 - minor
Description
init() can call suicide() from several error-handling paths before it has started the progress thread. suicide() then tries to stop progress_thread, and the join asserts because the thread was never started.
2015-01-15 11:36:56.258493 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 1, waiting
2015-01-15 11:37:01.181026 7f214925e700 1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:06.181258 7f214925e700 1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:06.258726 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 1, waiting
2015-01-15 11:37:11.181460 7f214925e700 1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:16.181686 7f214925e700 1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:16.258958 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 1, waiting
2015-01-15 11:37:21.181892 7f214925e700 1 heartbeat_map is_healthy 'MDS' had timed out after 15
2015-01-15 11:37:23.078404 7f214725a700 1 heartbeat_map reset_timeout 'MDS' had timed out after 15
2015-01-15 11:37:26.259204 7f214ce6c800 -1 mds.-1.-1 *** no OSDs are up as of epoch 3, waiting
2015-01-15 13:14:23.361861 7f2142950700 -1 mds.-1.-1 *** got signal Terminated ***
2015-01-15 13:14:23.361901 7f2142950700 1 mds.-1.-1 suicide. wanted down:dne, now down:dne
2015-01-15 13:14:23.385996 7f48f424d800 0 ceph version 0.91-348-gd4a6447 (d4a64474e53ce7c9472feac530ca94ccf616fbcc), process ceph-mds, pid 6717
2015-01-15 13:14:23.388258 7f48f424d800 -1 mds.-1.0 log_to_monitors {default=true}
2015-01-15 13:14:23.390817 7f48f424d800 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
2015-01-15 13:14:23.390839 7f48f424d800 1 mds.-1.0 suicide. wanted down:dne, now up:boot
2015-01-15 13:14:23.392899 7f48f424d800 -1 common/Thread.cc: In function 'int Thread::join(void**)' thread 7f48f424d800 time 2015-01-15 13:14:23.391350
common/Thread.cc: 136: FAILED assert("join on thread that was never started" == 0)
 ceph version 0.91-348-gd4a6447 (d4a64474e53ce7c9472feac530ca94ccf616fbcc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x94427b]
 2: (Thread::join(void**)+0x41) [0x9308a1]
 3: (MDS::ProgressThread::shutdown()+0x74) [0x59bb64]
 4: (MDS::suicide()+0x117) [0x59bc97]
 5: (MDS::init(MDSMap::DaemonState)+0x2013) [0x5acc03]
 6: (main()+0x94d) [0x59038d]
 7: (__libc_start_main()+0xf5) [0x7f48f2251ec5]
 8: /usr/bin/ceph-mds() [0x594d87]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
(Copied from http://pastebin.com/dh8kKcfh from CoilDomain in #ceph)
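The failure mode in the description can be illustrated with a minimal, self-contained sketch. It uses std::thread rather than Ceph's own Thread class, and the ProgressThread and Daemon classes below are hypothetical stand-ins, not the actual MDS code or the fix that was eventually merged. The point is only the ordering problem: if init() bails out before start() has run, shutdown() must not join unconditionally. With std::thread, an unconditional join() on a thread that was never started throws std::system_error, which plays the role of the assert in the trace above.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for MDS::ProgressThread; NOT the Ceph implementation.
class ProgressThread {
public:
  void start() {
    worker_ = std::thread([this] { run(); });
  }

  // Safe to call whether or not start() ever ran.
  // Returns true only if a started thread was actually joined.
  bool shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cond_.notify_all();
    if (!worker_.joinable()) {
      return false;  // never started (or already joined): nothing to do
    }
    worker_.join();
    return true;
  }

  ~ProgressThread() { shutdown(); }

private:
  void run() {
    // Sleep until shutdown() asks the thread to stop.
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return stopping_; });
  }

  std::mutex mutex_;
  std::condition_variable cond_;
  bool stopping_ = false;
  std::thread worker_;
};

// Hypothetical daemon whose init() can fail before the thread is started,
// mirroring the "failed to authenticate" path in the log.
class Daemon {
public:
  bool init(bool auth_ok) {
    if (!auth_ok) {
      suicide();  // early error path: progress_ was never started
      return false;
    }
    progress_.start();
    return true;
  }

  void suicide() { joined_on_exit_ = progress_.shutdown(); }

  bool joined_on_exit() const { return joined_on_exit_; }

private:
  ProgressThread progress_;
  bool joined_on_exit_ = false;
};
```

Calling init(false) on a Daemon exercises the early-exit path: suicide() runs, shutdown() sees that the worker is not joinable, and returns without joining. Guarding inside shutdown() rather than at each call site is one possible design choice; it keeps every error path in init() safe without each one needing to know how far startup got.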
Files
Updated by Radoslaw Zarzynski over 9 years ago
A proposed patch is attached.
Updated by John Spray over 9 years ago
Thanks for the patch. Could you please resend it as either a GitHub pull request or, if that is not possible, a git-formatted patch that I can apply with "git am"?
Please also update the commit message to include a valid "Signed-off-by" line (https://github.com/ceph/ceph#contributing-code)
Updated by Radoslaw Zarzynski over 9 years ago
I made a pull request via GitHub. The commit log has been improved as well. Direct link: https://github.com/ceph/ceph/pull/3514