Project

General

Profile

Actions

Bug #9462

closed

msgr deadlock: osd reply vs mark_down vs fault

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 160 (Thread 0x7f0ac2ffd700 (LWP 10067)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000fb28b4 in Cond::Wait (this=0x7f0ab2c6a228, mutex=...) at ./common/Cond.h:55
#2  0x00000000015e2d11 in Pipe::stop_and_wait (this=0x7f0ab2c6a010) at msg/Pipe.cc:1414
#3  0x00000000014600dd in SimpleMessenger::mark_down (this=0x3b7a890, con=0x7f0ab2c4cdf0) at msg/SimpleMessenger.cc:643
#4  0x00000000015ef80a in PipeConnection::mark_down (this=0x7f0ab2c4cdf0) at msg/PipeConnection.cc:84
#5  0x00000000012c41c8 in Objecter::close_session (this=0x3b942f0, s=0x7f0ab2c689b0) at osdc/Objecter.cc:1288
#6  0x00000000012c050b in Objecter::handle_osd_map (this=0x3b942f0, m=0x7f0ab4a4aeb0) at osdc/Objecter.cc:792
#7  0x00000000012beac5 in Objecter::ms_dispatch (this=0x3b942f0, m=0x7f0ab4a4aeb0) at osdc/Objecter.cc:597
#8  0x00000000015b6afd in Messenger::ms_deliver_dispatch (this=0x3b7a890, m=0x7f0ab4a4aeb0) at msg/Messenger.h:532
#9  0x00000000015b6052 in DispatchQueue::entry (this=0x3b7aa48) at msg/DispatchQueue.cc:185
#10 0x00000000014616e0 in DispatchQueue::DispatchThread::entry (this=0x3b7abd0) at msg/DispatchQueue.h:104
#11 0x000000000146bcdb in Thread::entry_wrapper (this=0x3b7abd0) at common/Thread.cc:60
#12 0x000000000146bc58 in Thread::_entry_func (arg=0x3b7abd0) at common/Thread.cc:45
#13 0x00007f0acc097e9a in start_thread (arg=0x7f0ac2ffd700) at pthread_create.c:308
#14 0x00007f0acac4c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()

Thread 154 (Thread 0x7f0abb7fe700 (LWP 10075)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007f0acc09a065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f0acc099eba in __pthread_mutex_lock (mutex=0x3b7ad18) at pthread_mutex_lock.c:61
#3  0x0000000001426458 in Mutex::Lock (this=0x3b7ad08, no_lockdep=false) at common/Mutex.cc:91
#4  0x000000000145ff42 in SimpleMessenger::mark_down (this=0x3b7a890, con=0x7f09f1055dd0) at msg/SimpleMessenger.cc:636
#5  0x00000000015ef80a in PipeConnection::mark_down (this=0x7f09f1055dd0) at msg/PipeConnection.cc:84
#6  0x0000000001412664 in MonClient::_reopen_session (this=0x7fff8c55b720, rank=-1, name=...) at mon/MonClient.cc:604
#7  0x0000000000fe9174 in MonClient::_reopen_session (this=0x7fff8c55b720) at ./mon/MonClient.h:190
#8  0x0000000000fe9294 in MonClient::reopen_session (this=0x7fff8c55b720, cb=0x0) at ./mon/MonClient.h:315
#9  0x0000000000fe65f6 in Beacon::is_laggy (this=0x3b93150) at mds/Beacon.cc:216
#10 0x0000000000f9c6e4 in MDS::tick (this=0x3b93010) at mds/MDS.cc:724
#11 0x0000000000fb92b6 in MDS::C_MDS_Tick::finish (this=0x7f0a14157ec0, r=0) at mds/MDS.h:293
#12 0x0000000000fb270f in Context::complete (this=0x7f0a14157ec0, r=0) at ./include/Context.h:64
#13 0x000000000128d855 in MDSInternalContextBase::complete (this=0x7f0a14157ec0, r=0) at mds/MDSContext.cc:30
#14 0x000000000146f278 in SafeTimer::timer_thread (this=0x3b93080) at common/Timer.cc:105
#15 0x0000000001472416 in SafeTimerThread::entry (this=0x3b926a0) at common/Timer.cc:38
#16 0x000000000146bcdb in Thread::entry_wrapper (this=0x3b926a0) at common/Thread.cc:60
#17 0x000000000146bc58 in Thread::_entry_func (arg=0x3b926a0) at common/Thread.cc:45
#18 0x00007f0acc097e9a in start_thread (arg=0x7f0abb7fe700) at pthread_create.c:308
#19 0x00007f0acac4c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#20 0x0000000000000000 in ?? ()

Thread 148 (Thread 0x7f0ab8ff9700 (LWP 10088)):
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:85
#1  0x0000000000fb3c01 in RWLock::get_read (this=0x3b943b0) at ./common/RWLock.h:68
#2  0x00000000012dcde2 in RWLock::RLocker::RLocker (this=0x7f0ab8ff83b0, lock=...) at ./common/RWLock.h:122
#3  0x00000000012c73cb in Objecter::op_submit (this=0x3b942f0, op=0x7f0a9c14d270) at osdc/Objecter.cc:1674
#4  0x00000000012b530d in Objecter::write_trunc (this=0x3b942f0, oid=..., oloc=..., off=1125110, len=952, snapc=..., bl=..., mtime=..., flags=0, trunc_size=0, trunc_seq=0, onack=0x0, oncommit=0x7f0a9c01dae0, objver=0x0, extra_ops=0x0) at ./osdc/Objecter.h:2027
#5  0x00000000012b5bbe in Objecter::sg_write_trunc (this=0x3b942f0, extents=..., snapc=..., bl=..., mtime=..., flags=0, trunc_size=0, trunc_seq=0, onack=0x0, oncommit=0x7f0a9c01dae0) at ./osdc/Objecter.h:2257
#6  0x00000000012b5f52 in Objecter::sg_write (this=0x3b942f0, extents=..., snapc=..., bl=..., mtime=..., flags=0, onack=0x0, oncommit=0x7f0a9c01dae0) at ./osdc/Objecter.h:2280
#7  0x00000000012b60d1 in Filer::write (this=0x7f0aa8002970, ino=..., layout=0x7f0aa8002939, snapc=..., offset=27113106166, len=952, bl=..., mtime=..., flags=0, onack=0x0, oncommit=0x7f0a9c01dae0) at ./osdc/Filer.h:146
#8  0x00000000012ae943 in Journaler::_do_flush (this=0x7f0aa8002830, amount=0) at osdc/Journaler.cc:610
#9  0x00000000012af766 in Journaler::_flush (this=0x7f0aa8002830, onsafe=0x0) at osdc/Journaler.cc:678
#10 0x00000000012af0e3 in Journaler::flush (this=0x7f0aa8002830, onsafe=0x0) at osdc/Journaler.cc:652
#11 0x000000000128fd14 in MDLog::_submit_thread (this=0x3b970f0) at mds/MDLog.cc:370
#12 0x0000000000fb99a6 in MDLog::SubmitThread::entry (this=0x3b972f8) at mds/MDLog.h:143
#13 0x000000000146bcdb in Thread::entry_wrapper (this=0x3b972f8) at common/Thread.cc:60
#14 0x000000000146bc58 in Thread::_entry_func (arg=0x3b972f8) at common/Thread.cc:45
#15 0x00007f0acc097e9a in start_thread (arg=0x7f0ab8ff9700) at pthread_create.c:308
#16 0x00007f0acac4c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()

Thread 145 (Thread 0x7f0ac03f4700 (LWP 10091)):
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:85
#1  0x0000000000fb3c01 in RWLock::get_read (this=0x3b943b0) at ./common/RWLock.h:68
#2  0x00000000012dcde2 in RWLock::RLocker::RLocker (this=0x7f0ac03f3280, lock=...) at ./common/RWLock.h:122
#3  0x00000000012ccd6a in Objecter::handle_osd_op_reply (this=0x3b942f0, m=0x7f0a900191e0) at osdc/Objecter.cc:2461
#4  0x00000000012bea1f in Objecter::ms_dispatch (this=0x3b942f0, m=0x7f0a900191e0) at osdc/Objecter.cc:574
#5  0x00000000012ddd88 in Objecter::ms_fast_dispatch (this=0x3b942f0, m=0x7f0a900191e0) at osdc/Objecter.h:1667
#6  0x00000000015b6982 in Messenger::ms_fast_dispatch (this=0x3b7a890, m=0x7f0a900191e0) at msg/Messenger.h:503
#7  0x00000000015b55fc in DispatchQueue::fast_dispatch (this=0x3b7aa48, m=0x7f0a900191e0) at msg/DispatchQueue.cc:71
#8  0x00000000015e479f in Pipe::reader (this=0x7f0aa4002000) at msg/Pipe.cc:1568
#9  0x00000000015eb8ec in Pipe::Reader::entry (this=0x7f0aa4002018) at msg/Pipe.h:50
#10 0x000000000146bcdb in Thread::entry_wrapper (this=0x7f0aa4002018) at common/Thread.cc:60
#11 0x000000000146bc58 in Thread::_entry_func (arg=0x7f0aa4002018) at common/Thread.cc:45
#12 0x00007f0acc097e9a in start_thread (arg=0x7f0ac03f4700) at pthread_create.c:308
#13 0x00007f0acac4c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
---Type <return> to continue, or q <return> to quit---
#14 0x0000000000000000 in ?? ()
(several of these ...)

Thread 101 (Thread 0x7f0a682eb700 (LWP 10187)):
#0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:85
#1  0x0000000000fb3c01 in RWLock::get_read (this=0x3b943b0) at ./common/RWLock.h:68

#2  0x0000000000fb6338 in Objecter::get_osdmap_read (this=0x3b942f0) at ./osdc/Objecter.h:1622
#3  0x00000000012a4575 in Filer::_do_purge_range (this=0x3b96450, pr=0x7f0a143b9730, fin=1) at osdc/Filer.cc:315
#4  0x00000000012a506f in C_PurgeRange::finish (this=0x7f0a4808b7c0, r=-2) at osdc/Filer.cc:296
#5  0x0000000000fb270f in Context::complete (this=0x7f0a4808b7c0, r=-2) at ./include/Context.h:64
#6  0x00000000012ceb35 in Objecter::handle_osd_op_reply (this=0x3b942f0, m=0x7f0a5405da30) at osdc/Objecter.cc:2643
#7  0x00000000012bea1f in Objecter::ms_dispatch (this=0x3b942f0, m=0x7f0a5405da30) at osdc/Objecter.cc:574
#8  0x00000000012ddd88 in Objecter::ms_fast_dispatch (this=0x3b942f0, m=0x7f0a5405da30) at osdc/Objecter.h:1667
#9  0x00000000015b6982 in Messenger::ms_fast_dispatch (this=0x3b7a890, m=0x7f0a5405da30) at msg/Messenger.h:503
#10 0x00000000015b55fc in DispatchQueue::fast_dispatch (this=0x3b7aa48, m=0x7f0a5405da30) at msg/DispatchQueue.cc:71
#11 0x00000000015e479f in Pipe::reader (this=0x7f0ab2c52010) at msg/Pipe.cc:1568
#12 0x00000000015eb8ec in Pipe::Reader::entry (this=0x7f0ab2c52028) at msg/Pipe.h:50
#13 0x000000000146bcdb in Thread::entry_wrapper (this=0x7f0ab2c52028) at common/Thread.cc:60
#14 0x000000000146bc58 in Thread::_entry_func (arg=0x7f0ab2c52028) at common/Thread.cc:45
#15 0x00007f0acc097e9a in start_thread (arg=0x7f0a682eb700) at pthread_create.c:308
#16 0x00007f0acac4c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()

Thread 43 (Thread 0x7f09f7fff700 (LWP 10251)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007f0acc09a065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f0acc099eba in __pthread_mutex_lock (mutex=0x3b7ad18) at pthread_mutex_lock.c:61
#3  0x0000000001426458 in Mutex::Lock (this=0x3b7ad08, no_lockdep=false) at common/Mutex.cc:91
#4  0x00000000015e1ab9 in Pipe::fault (this=0x7f0ab2cd1670, onread=true) at msg/Pipe.cc:1301
#5  0x00000000015e3257 in Pipe::reader (this=0x7f0ab2cd1670) at msg/Pipe.cc:1451
#6  0x00000000015eb8ec in Pipe::Reader::entry (this=0x7f0ab2cd1688) at msg/Pipe.h:50
#7  0x000000000146bcdb in Thread::entry_wrapper (this=0x7f0ab2cd1688) at common/Thread.cc:60
#8  0x000000000146bc58 in Thread::_entry_func (arg=0x7f0ab2cd1688) at common/Thread.cc:45
#9  0x00007f0acc097e9a in start_thread (arg=0x7f09f7fff700) at pthread_create.c:308
#10 0x00007f0acac4c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()


pid 10058 on burnupi21

Files

all_thread.txt (177 KB) all_thread.txt Sage Weil, 09/13/2014 10:13 PM

Related issues 1 (0 open1 closed)

Related to Ceph - Bug #9295: osd/OSD.cc: 5501: FAILED assert(session) in ms_fast_dispatchResolved08/29/2014

Actions
Actions

Also available in: Atom PDF