Actions
Fix #3665
closedlibrbd: deadlock during flatten
Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Ran into this trying to reproduce #3631.
The test_librbd_fsx process is still running on plana34 for debugging.
Thread 15 (Thread 0x7f5d0affd700 (LWP 7116)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007f5d14a25065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f5d14a24eba in __pthread_mutex_lock (mutex=0x1efa3d8) at pthread_mutex_lock.c:61 #3 0x00007f5d15a984dc in Mutex::Lock (this=0x1efa3c8, no_lockdep=false) at common/Mutex.cc:90 #4 0x00007f5d168a940b in Mutex::Locker::Locker (this=0x7f5d0affca10, m=...) at ./common/Mutex.h:120 #5 0x00007f5d168de4bf in librbd::C_Request::finish (this=0x7f5bf0004380, r=0) at librbd/LibrbdWriteback.cc:55 #6 0x00007f5d168a5913 in Context::complete (this=0x7f5bf0004380, r=0) at ./include/Context.h:41 #7 0x00007f5d168a59a3 in librbd::AioRequest::complete (this=0x7f5bf00016e0, r=0) at ./librbd/AioRequest.h:40 #8 0x00007f5d168d432e in librbd::rados_req_cb (c=0x7f5bec104110, arg=0x7f5bf00016e0) at librbd/internal.cc:2409 #9 0x00007f5d15a37661 in librados::C_AioSafe::finish (this=0x7f5cf89bc4f0, r=0) at ./librados/AioCompletionImpl.h:190 #10 0x00007f5d15aabe8e in Finisher::finisher_thread_entry (this=0x176ce28) at common/Finisher.cc:56 #11 0x00007f5d15a208b4 in Finisher::FinisherThread::entry (this=0x176cf40) at ./common/Finisher.h:46 #12 0x00007f5d15c4d6dd in Thread::_entry_func (arg=0x176cf40) at common/Thread.cc:41 #13 0x00007f5d14a22e9a in start_thread (arg=0x7f5d0affd700) at pthread_create.c:308 #14 0x00007f5d14d2a4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #15 0x0000000000000000 in ?? () Thread 14 (Thread 0x7f5d0a1f6700 (LWP 7204)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007f5d14a25065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f5d14a24eba in __pthread_mutex_lock (mutex=0x1efa430) at pthread_mutex_lock.c:61 #3 0x00007f5d15a984dc in Mutex::Lock (this=0x1efa420, no_lockdep=false) at common/Mutex.cc:90 #4 0x00007f5d168ddf29 in librbd::LibrbdWriteback::write (this=0x17730d0, oid=..., oloc=..., off=9960727, len=60247, snapc=..., bl=..., mtime=..., trunc_size=0, trunc_seq=0, oncommit=0x7f5bf0003a90) at librbd/LibrbdWriteback.cc:119 #5 0x00007f5d168e5cc8 in ObjectCacher::bh_write (this=0x1ed6be0, bh=0x1f2c990) at osdc/ObjectCacher.cc:751 #6 0x00007f5d168ebf9e in ObjectCacher::flusher_entry (this=0x1ed6be0) at osdc/ObjectCacher.cc:1363 #7 0x00007f5d168f3bf6 in ObjectCacher::FlusherThread::entry (this=0x1ed6df0) at osdc/ObjectCacher.h:345 #8 0x00007f5d15c4d6dd in Thread::_entry_func (arg=0x1ed6df0) at common/Thread.cc:41 #9 0x00007f5d14a22e9a in start_thread (arg=0x7f5d0a1f6700) at pthread_create.c:308 #10 0x00007f5d14d2a4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #11 0x0000000000000000 in ?? () ... Thread 1 (Thread 0x7f5d16d57780 (LWP 7105)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f5d168a29ce in Cond::Wait (this=0x7fff06f67280, mutex=...) at ./common/Cond.h:55 #2 0x00007f5d168d30a8 in librbd::read (ictx=0x1ee91b0, image_extents=..., buf=0x7f5c95ffe010 "", pbl=0x0) at librbd/internal.cc:2259 #3 0x00007f5d168d0e7d in librbd::flatten (ictx=0x1efa200, prog_ctx=...) at librbd/internal.cc:2002 #4 0x00007f5d168a0970 in rbd_flatten (image=0x1efa200) at librbd/librbd.cc:617 #5 0x00000000004054b6 in do_flatten () at test/librbd/fsx.c:942 #6 0x00000000004058d1 in test () at test/librbd/fsx.c:1077 #7 0x00000000004031bc in main (argc=<optimized out>, argv=<optimized out>) at test/librbd/fsx.c:1551
Updated by Sage Weil over 11 years ago
the problem is that we are holding the snap_lock and then waiting for io. but we mostly use snap_lock as a tight inner lock that we take while e.g. processing an io reply or sending a request.
Thread 1 (Thread 0x7fca7dcb5780 (LWP 13316)): ---Type <return> to continue, or q <return> to quit--- #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007fca7d800a3e in Cond::Wait (this=0x7ffffe0a9b20, mutex=...) at ./common/Cond.h:55 #2 0x00007fca7d831072 in librbd::read (ictx=0x1eee3f0, image_extents=..., buf=0x7fc9707fa010 "", pbl=0x0) at librbd/internal.cc:2241 #3 0x00007fca7d82ee46 in librbd::flatten (ictx=0x30b65d0, prog_ctx=...) at librbd/internal.cc:1984 #4 0x00007fca7d7fe9e0 in rbd_flatten (image=0x30b65d0) at librbd/librbd.cc:617 #5 0x00000000004054b6 in do_flatten () at test/librbd/fsx.c:942 #6 0x00000000004058d1 in test () at test/librbd/fsx.c:1077 #7 0x00000000004031bc in main (argc=<optimized out>, argv=<optimized out>) at test/librbd/fsx.c:1551
in this case, the librbd::flatten() is holding snap_lock and then doing a read.
Updated by Sage Weil over 11 years ago
possibly here:
(gdb) thr app all bt Thread 34 (Thread 0x7f53c75ab700 (LWP 17126)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53c9d08c60 in ceph::log::Log::entry (this=0xe9c820) at log/Log.cc:323 #2 0x00007f53c9e3011d in Thread::_entry_func (arg=0xe9c820) at common/Thread.cc:41 #3 0x00007f53c8c04e9a in start_thread (arg=0x7f53c75ab700) at pthread_create.c:308 #4 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () Thread 33 (Thread 0x7f53c6daa700 (LWP 17127)): #0 sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102 #1 0x00007f53c9e2983b in CephContextServiceThread::entry (this=0xe9aae0) at common/ceph_context.cc:57 #2 0x00007f53c9e3011d in Thread::_entry_func (arg=0xe9aae0) at common/Thread.cc:41 #3 0x00007f53c8c04e9a in start_thread (arg=0x7f53c6daa700) at pthread_create.c:308 #4 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () Thread 32 (Thread 0x7f53c65a9700 (LWP 17128)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9c7fa61 in AdminSocket::entry (this=0xe9cd80) at common/admin_socket.cc:228 #2 0x00007f53c9e3011d in Thread::_entry_func (arg=0xe9cd80) at common/Thread.cc:41 #3 0x00007f53c8c04e9a in start_thread (arg=0x7f53c65a9700) at pthread_create.c:308 #4 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () Thread 31 (Thread 0x7f53c5da8700 (LWP 17129)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0xea2860, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9d2a2ed in DispatchQueue::entry (this=0xea27f8) at msg/DispatchQueue.cc:117 #3 0x00007f53c9dc5d8e in DispatchQueue::DispatchThread::entry (this=0xea2910) at msg/DispatchQueue.h:85 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0xea2910) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53c5da8700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 30 (Thread 0x7f53c55a7700 (LWP 17130)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0xea2b70, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9dc07ee in SimpleMessenger::reaper_entry (this=0xea2710) at msg/SimpleMessenger.cc:206 #3 0x00007f53c9dc64be in SimpleMessenger::ReaperThread::entry (this=0xea2930) at msg/SimpleMessenger.h:411 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0xea2930) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53c55a7700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 29 (Thread 0x7f53c4da6700 (LWP 17131)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0xe9e9a0, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53c9c8a892 in SafeTimer::timer_thread (this=0xe9e990) at common/Timer.cc:115 #3 0x00007f53c9c8baea in SafeTimerThread::entry (this=0xea4990) at common/Timer.cc:38 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0xea4990) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53c4da6700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 28 (Thread 0x7f53bffff700 (LWP 17132)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0xe9eab8, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9c8e5af in Finisher::finisher_thread_entry (this=0xe9ea58) at common/Finisher.cc:80 #3 0x00007f53c9c028f4 in Finisher::FinisherThread::entry (this=0xe9eb70) at ./common/Finisher.h:46 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0xe9eb70) at common/Thread.cc:41 ---Type <return> to continue, or q <return> to quit--- #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53bffff700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 27 (Thread 0x7f53bf7fe700 (LWP 17135)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0xe9eda8, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53c9c8a892 in SafeTimer::timer_thread (this=0xe9ed98) at common/Timer.cc:115 #3 0x00007f53c9c8baea in SafeTimerThread::entry (this=0xea6220) at common/Timer.cc:38 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0xea6220) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53bf7fe700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 26 (Thread 0x7f53beffd700 (LWP 17136)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007f53c8c07065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f53c8c06eba in __pthread_mutex_lock (mutex=0x137e628) at pthread_mutex_lock.c:61 #3 0x00007f53c9c7a6e0 in Mutex::Lock (this=0x137e618, no_lockdep=false) at common/Mutex.cc:90 #4 0x00007f53caa8b7f7 in Mutex::Locker::Locker (this=0x7f53beffca10, m=...) at ./common/Mutex.h:120 #5 0x00007f53caac099b in librbd::C_Request::finish (this=0x7f52a40039f0, r=0) at librbd/LibrbdWriteback.cc:55 #6 0x00007f53caa87cff in Context::complete (this=0x7f52a40039f0, r=0) at ./include/Context.h:41 #7 0x00007f53caa87d8f in librbd::AioRequest::complete (this=0x7f52a40030e0, r=0) at ./librbd/AioRequest.h:40 #8 0x00007f53caab67aa in librbd::rados_req_cb (c=0x7f53423f5140, arg=0x7f52a40030e0) at librbd/internal.cc:2416 #9 0x00007f53c9c196a1 in librados::C_AioSafe::finish (this=0x7f53b52e7b10, r=0) at ./librados/AioCompletionImpl.h:190 #10 0x00007f53c9c8e08e in Finisher::finisher_thread_entry (this=0xe9ee68) at common/Finisher.cc:56 #11 0x00007f53c9c028f4 in Finisher::FinisherThread::entry (this=0xe9ef80) at ./common/Finisher.h:46 #12 0x00007f53c9e3011d in Thread::_entry_func (arg=0xe9ef80) at common/Thread.cc:41 #13 0x00007f53c8c04e9a in start_thread (arg=0x7f53beffd700) at pthread_create.c:308 #14 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #15 0x0000000000000000 in ?? () Thread 25 (Thread 0x7f52ab7fe700 (LWP 17162)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0xef10f8, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0xef10f8, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0xef0f30) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0xef1140) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0xef1140) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f52ab7fe700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 24 (Thread 0x7f53bd9f5700 (LWP 17182)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0xf47ca8, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0xf47ca8, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0xf47ae0) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0xf47cf0) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0xf47cf0) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53bd9f5700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 23 (Thread 0x7f53576fb700 (LWP 17191)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0x11975a8, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0x11975a8, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0x11973e0) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x11975f0) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x11975f0) at common/Thread.cc:41 ---Type <return> to continue, or q <return> to quit--- #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53576fb700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 22 (Thread 0x7f53bd1f4700 (LWP 17198)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0x11e3dc8, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0x11e3dc8, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0x11e3c00) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x11e3e10) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x11e3e10) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53bd1f4700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 21 (Thread 0x7f53be5fa700 (LWP 17290)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0x11dba68, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0x11dba68, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0x11db8a0) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x11dbab0) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x11dbab0) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53be5fa700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 20 (Thread 0x7f5356efa700 (LWP 17291)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0x14bf458, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0x14bf458, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0x14bf290) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x14bf4a0) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x14bf4a0) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f5356efa700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 19 (Thread 0x7f53566f9700 (LWP 17337)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0x12fa948, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0x12fa948, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0x12fa780) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x12fa990) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x12fa990) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53566f9700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 18 (Thread 0x7f52abfff700 (LWP 17344)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007f53c8c07065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f53c8c06eba in __pthread_mutex_lock (mutex=0x137e680) at pthread_mutex_lock.c:61 #3 0x00007f53c9c7a6e0 in Mutex::Lock (this=0x137e670, no_lockdep=false) at common/Mutex.cc:90 #4 0x00007f53caac0405 in librbd::LibrbdWriteback::write (this=0x18f04e0, oid=..., oloc=..., off=10650342, len=747, snapc=..., bl=..., mtime=..., trunc_size=0, trunc_seq=0, oncommit=0x7f52a4002470) at librbd/LibrbdWriteback.cc:119 #5 0x00007f53caac81a4 in ObjectCacher::bh_write (this=0x1a60c10, bh=0x13e85c0) at osdc/ObjectCacher.cc:751 #6 0x00007f53caace47a in ObjectCacher::flusher_entry (this=0x1a60c10) at osdc/ObjectCacher.cc:1363 #7 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x1a60e20) at osdc/ObjectCacher.h:345 #8 0x00007f53c9e3011d in Thread::_entry_func (arg=0x1a60e20) at common/Thread.cc:41 #9 0x00007f53c8c04e9a in start_thread (arg=0x7f52abfff700) at pthread_create.c:308 #10 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #11 0x0000000000000000 in ?? () ---Type <return> to continue, or q <return> to quit--- Thread 17 (Thread 0x7f53bc9f3700 (LWP 17345)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0x1dcd6e8, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0x1dcd6e8, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0x1dcd520) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x1dcd730) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x1dcd730) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53bc9f3700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 16 (Thread 0x7f5355ef8700 (LWP 17346)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007f53caad4c4b in Cond::WaitUntil (this=0x13abd98, mutex=..., when=...) at ./common/Cond.h:71 #2 0x00007f53caad4cb2 in Cond::WaitInterval (this=0x13abd98, cct=0xe9a980, mutex=..., interval=...) at ./common/Cond.h:79 #3 0x00007f53caace52d in ObjectCacher::flusher_entry (this=0x13abbd0) at osdc/ObjectCacher.cc:1368 #4 0x00007f53caad60d2 in ObjectCacher::FlusherThread::entry (this=0x13abde0) at osdc/ObjectCacher.h:345 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x13abde0) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f5355ef8700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 15 (Thread 0x7f53c41a1700 (LWP 17872)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7f53b42aedc0, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9db4a6c in Pipe::writer (this=0x7f53b42aebe0) at msg/Pipe.cc:1527 #3 0x00007f53c9db9cc2 in Pipe::Writer::entry (this=0x7f53b42aec08) at msg/Pipe.h:59 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b42aec08) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53c41a1700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 14 (Thread 0x7f53c43a3700 (LWP 17873)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9db8cbb in Pipe::tcp_read_wait (this=0x7f53b42aebe0) at msg/Pipe.cc:2048 #2 0x00007f53c9db8bce in Pipe::tcp_read (this=0x7f53b42aebe0, buf=0x7f53c43a2d8f "\377", len=1) at msg/Pipe.cc:2021 #3 0x00007f53c9db2024 in Pipe::reader (this=0x7f53b42aebe0) at msg/Pipe.cc:1276 #4 0x00007f53c9db9c62 in Pipe::Reader::entry (this=0x7f53b42aebf0) at msg/Pipe.h:47 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b42aebf0) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53c43a3700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 13 (Thread 0x7f53c44a4700 (LWP 18127)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7f53b46aa660, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9db4a6c in Pipe::writer (this=0x7f53b46aa480) at msg/Pipe.cc:1527 #3 0x00007f53c9db9cc2 in Pipe::Writer::entry (this=0x7f53b46aa4a8) at msg/Pipe.h:59 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b46aa4a8) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53c44a4700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 12 (Thread 0x7f53bddf9700 (LWP 18128)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9db8cbb in Pipe::tcp_read_wait (this=0x7f53b46aa480) at msg/Pipe.cc:2048 #2 0x00007f53c9db8bce in Pipe::tcp_read (this=0x7f53b46aa480, buf=0x7f53bddf8d8f "\377", len=1) at msg/Pipe.cc:2021 #3 0x00007f53c9db2024 in Pipe::reader (this=0x7f53b46aa480) at msg/Pipe.cc:1276 #4 0x00007f53c9db9c62 in Pipe::Reader::entry (this=0x7f53b46aa490) at msg/Pipe.h:47 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b46aa490) at common/Thread.cc:41 ---Type <return> to continue, or q <return> to quit--- #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53bddf9700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 11 (Thread 0x7f53bdbf7700 (LWP 18268)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7f53b49c1de0, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9db4a6c in Pipe::writer (this=0x7f53b49c1c00) at msg/Pipe.cc:1527 #3 0x00007f53c9db9cc2 in Pipe::Writer::entry (this=0x7f53b49c1c28) at msg/Pipe.h:59 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b49c1c28) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53bdbf7700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 10 (Thread 0x7f53c42a2700 (LWP 18269)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9db8cbb in Pipe::tcp_read_wait (this=0x7f53b49c1c00) at msg/Pipe.cc:2048 #2 0x00007f53c9db8bce in Pipe::tcp_read (this=0x7f53b49c1c00, buf=0x7f53c42a1d8f "\377", len=1) at msg/Pipe.cc:2021 #3 0x00007f53c9db2024 in Pipe::reader (this=0x7f53b49c1c00) at msg/Pipe.cc:1276 #4 0x00007f53c9db9c62 in Pipe::Reader::entry (this=0x7f53b49c1c10) at msg/Pipe.h:47 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b49c1c10) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53c42a2700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 9 (Thread 0x7f53be7fc700 (LWP 18308)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7f53b4a1e5c0, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9db4a6c in Pipe::writer (this=0x7f53b4a1e3e0) at msg/Pipe.cc:1527 #3 0x00007f53c9db9cc2 in Pipe::Writer::entry (this=0x7f53b4a1e408) at msg/Pipe.h:59 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b4a1e408) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53be7fc700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 8 (Thread 0x7f53577fc700 (LWP 18309)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9db8cbb in Pipe::tcp_read_wait (this=0x7f53b4a1e3e0) at msg/Pipe.cc:2048 #2 0x00007f53c9db8bce in Pipe::tcp_read (this=0x7f53b4a1e3e0, buf=0x7f53577fbd8f "\377", len=1) at msg/Pipe.cc:2021 #3 0x00007f53c9db2024 in Pipe::reader (this=0x7f53b4a1e3e0) at msg/Pipe.cc:1276 #4 0x00007f53c9db9c62 in Pipe::Reader::entry (this=0x7f53b4a1e3f0) at msg/Pipe.h:47 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b4a1e3f0) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53577fc700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 7 (Thread 0x7f53caf32700 (LWP 18310)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7f53b48a24d0, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9db4a6c in Pipe::writer (this=0x7f53b48a22f0) at msg/Pipe.cc:1527 #3 0x00007f53c9db9cc2 in Pipe::Writer::entry (this=0x7f53b48a2318) at msg/Pipe.h:59 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b48a2318) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53caf32700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 6 (Thread 0x7f534ffff700 (LWP 18311)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9db8cbb in Pipe::tcp_read_wait (this=0x7f53b48a22f0) at msg/Pipe.cc:2048 #2 0x00007f53c9db8bce in Pipe::tcp_read (this=0x7f53b48a22f0, buf=0x7f534fffed8f "\377", len=1) at msg/Pipe.cc:2021 #3 0x00007f53c9db2024 in Pipe::reader (this=0x7f53b48a22f0) at msg/Pipe.cc:1276 ---Type <return> to continue, or q <return> to quit--- #4 0x00007f53c9db9c62 in Pipe::Reader::entry (this=0x7f53b48a2300) at msg/Pipe.h:47 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b48a2300) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f534ffff700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 5 (Thread 0x7f53c45a5700 (LWP 18318)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7f53b5638660, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9db4a6c in Pipe::writer (this=0x7f53b5638480) at msg/Pipe.cc:1527 #3 0x00007f53c9db9cc2 in Pipe::Writer::entry (this=0x7f53b56384a8) at msg/Pipe.h:59 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b56384a8) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53c45a5700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 4 (Thread 0x7f52bbffd700 (LWP 18319)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9db8cbb in Pipe::tcp_read_wait (this=0x7f53b5638480) at msg/Pipe.cc:2048 #2 0x00007f53c9db8bce in Pipe::tcp_read (this=0x7f53b5638480, buf=0x7f52bbffcd8f "\377", len=1) at msg/Pipe.cc:2021 #3 0x00007f53c9db2024 in Pipe::reader (this=0x7f53b5638480) at msg/Pipe.cc:1276 #4 0x00007f53c9db9c62 in Pipe::Reader::entry (this=0x7f53b5638490) at msg/Pipe.h:47 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b5638490) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f52bbffd700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 3 (Thread 0x7f53bdcf8700 (LWP 18351)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7f53b5136050, mutex=...) at ./common/Cond.h:55 #2 0x00007f53c9db4a6c in Pipe::writer (this=0x7f53b5135e70) at msg/Pipe.cc:1527 #3 0x00007f53c9db9cc2 in Pipe::Writer::entry (this=0x7f53b5135e98) at msg/Pipe.h:59 #4 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b5135e98) at common/Thread.cc:41 #5 0x00007f53c8c04e9a in start_thread (arg=0x7f53bdcf8700) at pthread_create.c:308 #6 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 2 (Thread 0x7f53bdaf6700 (LWP 18352)): #0 0x00007f53c8f00b03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007f53c9db8cbb in Pipe::tcp_read_wait (this=0x7f53b5135e70) at msg/Pipe.cc:2048 #2 0x00007f53c9db8bce in Pipe::tcp_read (this=0x7f53b5135e70, buf=0x7f53bdaf5d8f "\377", len=1) at msg/Pipe.cc:2021 #3 0x00007f53c9db2024 in Pipe::reader (this=0x7f53b5135e70) at msg/Pipe.cc:1276 #4 0x00007f53c9db9c62 in Pipe::Reader::entry (this=0x7f53b5135e80) at msg/Pipe.h:47 #5 0x00007f53c9e3011d in Thread::_entry_func (arg=0x7f53b5135e80) at common/Thread.cc:41 #6 0x00007f53c8c04e9a in start_thread (arg=0x7f53bdaf6700) at pthread_create.c:308 #7 0x00007f53c8f0c4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #8 0x0000000000000000 in ?? () Thread 1 (Thread 0x7f53caf3a780 (LWP 17125)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007f53caa84dbc in Cond::Wait (this=0x7fffa8305e10, mutex=...) at ./common/Cond.h:55 #2 0x00007f53caab5524 in librbd::read (ictx=0x11d9790, image_extents=..., buf=0x7f53546f7010 "", pbl=0x0) at librbd/internal.cc:2266 #3 0x00007f53caab32e4 in librbd::flatten (ictx=0x137e450, prog_ctx=...) at librbd/internal.cc:2009 #4 0x00007f53caa82bd2 in rbd_flatten (image=0x137e450) at librbd/librbd.cc:630 #5 0x00000000004054b6 in do_flatten () at test/librbd/fsx.c:942 #6 0x00000000004058d1 in test () at test/librbd/fsx.c:1077 #7 0x00000000004031bc in main (argc=<optimized out>, argv=<optimized out>) at test/librbd/fsx.c:1551
ubuntu@teuthology:/a/teuthology-2013-01-22_20:00:04-regression-bobtail-master-basic/3410$
kernel: kdb: true sha1: 0fa6ebc600bc8e830551aee47a0e929e818a1868 nuke-on-error: true overrides: ceph: conf: client: rbd cache: true global: ms inject socket failures: 5000 fs: btrfs log-whitelist: - slow request sha1: c5fe0965572c074a2a33660719ce3222d18c1464 s3tests: branch: bobtail workunit: sha1: c5fe0965572c074a2a33660719ce3222d18c1464 roles: - - mon.a - osd.0 - osd.1 - osd.2 - - mds.a - osd.3 - osd.4 - osd.5 - - client.0 tasks: - chef: null - clock: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: timeout: 1200 - rbd_fsx: clients: - client.0 ops: 2000
Updated by Josh Durgin about 11 years ago
- Status changed from In Progress to Fix Under Review
- Assignee changed from Josh Durgin to Sage Weil
wip-rbd-flatten-deadlock
Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved
commit:e4fd70fcec3a9abb7f19517326e46f58031c4196
Actions