Bug #14697

closed

mds: assert in SafeTimer while suiciding

Added by Greg Farnum about 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/gregf-2016-02-08_00:50:24-fs-greg-fs-testing-27-1---basic-mira/10767/

2016-02-08T02:00:32.446 INFO:tasks.ceph.mds.a-s.mira064.stderr:2016-02-08 10:00:31.363700 1466a700 -1 mds.a-s *** got signal Terminated ***
2016-02-08T02:00:34.587 INFO:tasks.ceph.mds.a-s.mira064.stderr:Thread::join(): pthread_join failed with error 22
2016-02-08T02:00:34.628 INFO:tasks.ceph.mds.a-s.mira064.stderr:common/Thread.cc: In function 'int Thread::join(void**)' thread 18277700 time 2016-02-08 10:00:34.589274
2016-02-08T02:00:34.628 INFO:tasks.ceph.mds.a-s.mira064.stderr:common/Thread.cc: 174: FAILED assert(status == 0)
2016-02-08T02:00:34.681 INFO:tasks.ceph.mds.a-s.mira064.stderr: ceph version 10.0.2-1868-g68f97e4 (68f97e4626554229d8670e6faf4fdad1824e025c)
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x6b2deb]
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 2: (Thread::join(void**)+0xaa) [0x6a608a]
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 3: (SafeTimer::shutdown()+0xa6) [0x6aa7d6]
2016-02-08T02:00:34.682 INFO:tasks.ceph.mds.a-s.mira064.stderr: 4: (MDSDaemon::suicide()+0x20b) [0x31458b]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 5: (Context::complete(int)+0x9) [0x3205a9]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 6: (MDSRank::suicide()+0x18) [0x328868]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 7: (MDSRank::boot_start(MDSRank::BootStep, int)+0xf3c) [0x33b9bc]
2016-02-08T02:00:34.683 INFO:tasks.ceph.mds.a-s.mira064.stderr: 8: (MDSInternalContextBase::complete(int)+0x1db) [0x54f71b]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 9: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0x94) [0x3451c4]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 10: (MDLog::_replay_thread()+0x219) [0x55dd89]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 11: (MDLog::ReplayThread::entry()+0xd) [0x34403d]
2016-02-08T02:00:34.684 INFO:tasks.ceph.mds.a-s.mira064.stderr: 12: (()+0x8182) [0xa9d6182]
2016-02-08T02:00:34.685 INFO:tasks.ceph.mds.a-s.mira064.stderr: 13: (clone()+0x6d) [0xc4bc47d]
2016-02-08T02:00:34.685 INFO:tasks.ceph.mds.a-s.mira064.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-02-08T02:00:34.686 INFO:tasks.ceph.mds.a-s.mira064.stderr:2016-02-08 10:00:34.682640 18277700 -1 common/Thread.cc: In function 'int Thread::join(void**)' thread 18277700 time 2016-02-08 10:00:34.589274
2016-02-08T02:00:34.686 INFO:tasks.ceph.mds.a-s.mira064.stderr:common/Thread.cc: 174: FAILED assert(status == 0)

This was in an integration branch, but I don't think any of the branches included had any impact on this bit.

#1

Updated by Yuri Weinstein about 8 years ago

  • Related to Bug #14716: "Thread.cc: 143: FAILED assert(status == 0)" in fs-hammer---basic-smithi added
#2

Updated by Greg Farnum about 8 years ago

2016-02-08T02:00:34.587 INFO:tasks.ceph.mds.a-s.mira064.stderr:Thread::join(): pthread_join failed with error 22

22 is EINVAL, which is returned if "The value specified by thread does not refer to a joinable thread." Looks like we call timer.shutdown() twice when suiciding — once directly, and once in MDSRankDispatcher::shutdown().

https://github.com/ceph/ceph/pull/7616 (untested as yet)
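
For illustration only, here is a minimal sketch of the failure mode described above and one way to guard against it. It is not the code from the pull request and does not use Ceph's SafeTimer; `SketchTimer` and `second_join_is_einval` are hypothetical names, and `std::thread` stands in for Ceph's Thread wrapper. The sketch shows that joining a thread that is no longer joinable reports EINVAL, and that a `stopping_` flag lets a shutdown routine tolerate being called twice.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <system_error>
#include <thread>

// Hypothetical stand-in for a timer that owns a worker thread.
// This is NOT Ceph's SafeTimer; it only illustrates the double-shutdown hazard.
class SketchTimer {
public:
  SketchTimer() : worker_([this] { run(); }) {}
  ~SketchTimer() { shutdown(); }

  // Idempotent shutdown: only the first caller signals and joins the worker.
  // Returns true if this call performed the join, false if shutdown had
  // already been requested. A later caller returns immediately and does
  // not wait for the first caller's join to finish.
  bool shutdown() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      if (stopping_) {
        return false;  // someone else already owns the join
      }
      stopping_ = true;
    }
    cond_.notify_all();
    worker_.join();
    return true;
  }

private:
  // Worker body: sleep until shutdown is requested.
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return stopping_; });
  }

  std::mutex lock_;
  std::condition_variable cond_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};

// Joins the same std::thread twice and reports whether the second join
// failed with EINVAL (std::errc::invalid_argument).
bool second_join_is_einval() {
  std::thread t([] {});
  t.join();
  assert(!t.joinable());  // after the first join the thread is not joinable
  try {
    t.join();
  } catch (const std::system_error& e) {
    return e.code() == std::errc::invalid_argument;
  }
  return false;
}
```

With this guard, calling `shutdown()` a second time on the same `SketchTimer` returns false instead of attempting another join. Whether the real fix should use a flag like this or remove one of the two call sites is a separate design question that this sketch does not settle.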

#3

Updated by Greg Farnum about 8 years ago

  • Status changed from New to 17
#4

Updated by Greg Farnum about 8 years ago

  • Related to deleted (Bug #14716: "Thread.cc: 143: FAILED assert(status == 0)" in fs-hammer---basic-smithi)
#5

Updated by Greg Farnum about 8 years ago

  • Assignee set to Greg Farnum
#6

Updated by Greg Farnum about 8 years ago

  • Status changed from 17 to Resolved
#7

Updated by Greg Farnum almost 8 years ago

  • Component(FS) MDS added