Bug #14697

mds: assert in SafeTimer while suiciding

Added by Greg Farnum about 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/gregf-2016-02-08_00:50:24-fs-greg-fs-testing-27-1---basic-mira/10767/

2016-02-08T02:00:32.446 INFO:tasks.ceph.mds.a-s.mira064.stderr:2016-02-08 10:00:31.363700 1466a700 -1 mds.a-s *** got signal Terminated ***
2016-02-08T02:00:34.587 INFO:tasks.ceph.mds.a-s.mira064.stderr:Thread::join(): pthread_join failed with error 22
2016-02-08T02:00:34.628 INFO:tasks.ceph.mds.a-s.mira064.stderr:common/Thread.cc: In function 'int Thread::join(void**)' thread 18277700 time 2016-02-08 10:00:34.589274
2016-02-08T02:00:34.628 INFO:tasks.ceph.mds.a-s.mira064.stderr:common/Thread.cc: 174: FAILED assert(status == 0)
2016-02-08T02:00:34.681 INFO:tasks.ceph.mds.a-s.mira064.stderr: ceph version 10.0.2-1868-g68f97e4 (68f97e4626554229d8670e6faf4fdad1824e025c)
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x6b2deb]
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 2: (Thread::join(void**)+0xaa) [0x6a608a]
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 3: (SafeTimer::shutdown()+0xa6) [0x6aa7d6]
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 4: (MDSDaemon::suicide()+0x20b) [0x31458b]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 5: (Context::complete(int)+0x9) [0x3205a9]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 6: (MDSRank::suicide()+0x18) [0x328868]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 7: (MDSRank::boot_start(MDSRank::BootStep, int)+0xf3c) [0x33b9bc]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 8: (MDSInternalContextBase::complete(int)+0x1db) [0x54f71b]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 9: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0x94) [0x3451c4]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 10: (MDLog::_replay_thread()+0x219) [0x55dd89]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 11: (MDLog::ReplayThread::entry()+0xd) [0x34403d]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 12: (()+0x8182) [0xa9d6182]
2016-02-08T02:00:34.685 INFO:tasks.ceph.mds.a-s.mira064.stderr: 13: (clone()+0x6d) [0xc4bc47d]
2016-02-08T02:00:34.685 INFO:tasks.ceph.mds.a-s.mira064.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-02-08T02:00:34.686 INFO:tasks.ceph.mds.a-s.mira064.stderr:2016-02-08 10:00:34.682640 18277700 -1 common/Thread.cc: In function 'int Thread::join(void**)' thread 18277700 time 2016-02-08 10:00:34.589274
2016-02-08T02:00:34.686 INFO:tasks.ceph.mds.a-s.mira064.stderr:common/Thread.cc: 174: FAILED assert(status == 0)

This happened on an integration branch, but I don't think any of the branches it included touched this code.

Associated revisions

Revision 03800546 (diff)
Added by Greg Farnum about 8 years ago

mds: don't double-shutdown the timer when suiciding

Fixes: #14697

Signed-off-by: Greg Farnum <>

History

#1 Updated by Yuri Weinstein about 8 years ago

  • Related to Bug #14716: "Thread.cc: 143: FAILED assert(status == 0)" in fs-hammer---basic-smithi added

#2 Updated by Greg Farnum about 8 years ago

2016-02-08T02:00:34.587 INFO:tasks.ceph.mds.a-s.mira064.stderr:Thread::join(): pthread_join failed with error 22

Error 22 is EINVAL, which pthread_join returns if "The value specified by thread does not refer to a joinable thread." It looks like we call timer.shutdown() twice when suiciding: once directly, and once in MDSRankDispatcher::shutdown().

https://github.com/ceph/ceph/pull/7616 (untested as yet)
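
For readers unfamiliar with this failure mode, here is a minimal sketch of the double-shutdown pattern described above. It is not Ceph code: ToyTimer and both helper functions are hypothetical names, and it uses std::thread rather than Ceph's Thread wrapper around pthread_join. With std::thread, joining a thread that is no longer joinable throws std::system_error with an "invalid argument" code, which is the analogue of the EINVAL seen in the log. The guarded variant is only one possible way to make shutdown idempotent; the linked pull request instead avoids making the second call.

```cpp
#include <atomic>
#include <system_error>
#include <thread>

// Hypothetical stand-in for a timer that owns a worker thread.
// This is NOT Ceph's SafeTimer; it only mirrors the shutdown pattern.
class ToyTimer {
public:
  void init() {
    stopping = false;
    worker = std::thread([this] {
      // Spin until shutdown is requested.
      while (!stopping.load()) std::this_thread::yield();
    });
  }

  // Unguarded: joins unconditionally, so a second call fails.
  void shutdown_unguarded() {
    stopping = true;
    worker.join();  // throws std::system_error if the thread is not joinable
  }

  // Guarded: joins only while the thread is still joinable.
  void shutdown_guarded() {
    stopping = true;
    if (worker.joinable()) worker.join();
  }

private:
  std::atomic<bool> stopping{false};
  std::thread worker;
};

// Returns true if a second unguarded shutdown reports "invalid argument".
bool double_shutdown_fails() {
  ToyTimer t;
  t.init();
  t.shutdown_unguarded();
  try {
    t.shutdown_unguarded();
  } catch (const std::system_error& e) {
    return e.code() == std::errc::invalid_argument;
  }
  return false;
}

// Returns true once two guarded shutdown calls have completed without throwing.
bool guarded_double_shutdown_ok() {
  ToyTimer t;
  t.init();
  t.shutdown_guarded();
  t.shutdown_guarded();
  return true;
}
```

Calling double_shutdown_fails() exercises the same kind of failure as the backtrace (a join on a thread that was already joined), while guarded_double_shutdown_ok() shows that a joinable() check makes a repeated shutdown harmless.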

#3 Updated by Greg Farnum about 8 years ago

  • Status changed from New to 17

#4 Updated by Greg Farnum about 8 years ago

  • Related to deleted (Bug #14716: "Thread.cc: 143: FAILED assert(status == 0)" in fs-hammer---basic-smithi)

#5 Updated by Greg Farnum about 8 years ago

  • Assignee set to Greg Farnum

#6 Updated by Greg Farnum about 8 years ago

  • Status changed from 17 to Resolved

#7 Updated by Greg Farnum over 7 years ago

  • Component(FS) MDS added
