Project

General

Profile

Actions

Bug #52235

closed

[pwl] deadlock on AbstractWriteLog::m_lock during shutdown

Added by Ilya Dryomov over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A hang in KernelDevice::_aio_stop() waiting for bstore_aio thread to terminate:

Thread 92 (Thread 0x7f1b03fff700 (LWP 1768918)):
#0  __pthread_clockjoin_ex (threadid=139753982772992, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:145
#1  0x00007f1ba52099f3 in __pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:24
#2  0x00007f1ba419de00 in Thread::join (this=this@entry=0x55b79673da30, prval=prval@entry=0x0) at ../src/common/Thread.cc:176
#3  0x00007f1ba4e17a95 in KernelDevice::_aio_stop (this=this@entry=0x55b79673d780) at ../src/blk/kernel/KernelDevice.cc:496
#4  0x00007f1ba4e17e39 in KernelDevice::close (this=0x55b79673d780) at ../src/blk/kernel/KernelDevice.cc:291
#5  0x000055b7942f7ad0 in librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::remove_pool_file (this=0x7fff07c76f50) at ../src/librbd/cache/pwl/ssd/WriteLog.cc:188
#6  0x000055b7942cf32b in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*)::{lambda(int)#4}::operator()(int) const (r=<optimized out>, this=<optimized out>) at /usr/include/c++/9/bits/stl_list.h:153
#7  LambdaContext<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*)::{lambda(int)#4}>::finish(int) (this=0x55b796742f70, r=<optimized out>) at ../src/include/Context.h:166
#8  0x000055b793394a7f in Context::complete (this=0x55b796742f70, r=<optimized out>) at ../src/include/Context.h:99
#9  0x000055b7942b58e7 in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::flush_dirty_entries (this=0x7fff07c76f50, on_finish=on_finish@entry=0x55b796742f70) at /usr/include/c++/9/bits/char_traits.h:335
#10 0x000055b7942b5e5d in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*)::{lambda(int)#5}::operator()(int) const (r=<optimized out>, this=<optimized out>) at ../src/common/RWLock.h:101
#11 LambdaContext<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*)::{lambda(int)#5}>::finish(int) (this=0x55b7967413c0, r=<optimized out>) at ../src/include/Context.h:166
#12 0x000055b793394a7f in Context::complete (this=0x55b7967413c0, r=<optimized out>) at ../src/include/Context.h:99
#13 0x000055b7942e813d in librbd::cache::pwl::TestMockCacheSSDWriteLog::expect_op_work_queue(librbd::MockImageCtx&)::{lambda(Context*, int)#1}::operator()(Context*, int) const (r=<optimized out>, ctx=<optimized out>, __closure=<optimized out>) at ../src/test/librbd/cache/pwl/test_mock_SSDWriteLog.cc:95
#14 std::_Function_handler<void (Context*, int), librbd::cache::pwl::TestMockCacheSSDWriteLog::expect_op_work_queue(librbd::MockImageCtx&)::{lambda(Context*, int)#1}>::_M_invoke(std::_Any_data const&, Context*&&, int&&) (__functor=..., __args#0=<optimized out>, __args#1=<optimized out>) at /usr/include/c++/9/bits/std_function.h:300
#15 0x000055b793462e19 in std::function<void (Context*, int)>::operator()(Context*, int) const (this=this@entry=0x7f1b03ffaa50, __args#0=<optimized out>, __args#1=<optimized out>) at /usr/include/c++/9/bits/std_function.h:263
#16 0x000055b793462e35 in testing::internal::ApplyImpl<std::function<void (Context*, int)> const&, std::tuple<Context*, int>, 0ul, 1ul>(std::function<void (Context*, int)> const&, std::tuple<Context*, int>&&, testing::internal::IndexSequence<0ul, 1ul>) (f=..., args=...) at ../src/googletest/googlemock/include/gmock/internal/gmock-internal-utils.h:405
#17 0x000055b793462e43 in testing::internal::Apply<std::function<void (Context*, int)> const&, std::tuple<Context*, int> >(std::function<void (Context*, int)> const&, std::tuple<Context*, int>&&) (f=..., args=...) at ../src/googletest/googlemock/include/gmock/internal/gmock-internal-utils.h:412
#18 0x000055b793463051 in testing::Action<void (Context*, int)>::Perform(std::tuple<Context*, int>) const (args=std::tuple containing = {...}, this=0x7f1b03ffaa50) at /usr/include/c++/9/bits/std_function.h:263
#19 testing::internal::ActionResultHolder<void>::PerformAction<void (Context*, int)>(testing::Action<void (Context*, int)> const&, testing::internal::Function<void (Context*, int)>::ArgumentTuple&&) (args=..., action=...) at ../src/googletest/googlemock/include/gmock/gmock-spec-builders.h:1447
#20 testing::internal::FunctionMocker<void (Context*, int)>::UntypedPerformAction(void const*, void*) const (this=<optimized out>, untyped_action=<optimized out>, untyped_args=0x7f1b03ffb010) at ../src/googletest/googlemock/include/gmock/gmock-spec-builders.h:1562
#21 0x000055b79492860b in testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith (this=this@entry=0x55b796768a30, untyped_args=<optimized out>, untyped_args@entry=0x7f1b03ffb010) at ../src/googletest/googlemock/src/gmock-spec-builders.cc:450
#22 0x000055b793399e1b in testing::internal::FunctionMocker<void (Context*, int)>::Invoke(Context*, int) (args#1=0, args#0=0x55b7967413c0, this=0x55b796768a30) at ../src/test/librbd/mock/MockContextWQ.h:14
#23 librbd::MockContextWQ::queue (gmock_a1=0, gmock_a0=0x55b7967413c0, this=0x55b796768a30) at ../src/test/librbd/mock/MockContextWQ.h:14
#24 librbd::util::detail::C_AsyncCallback<librbd::MockContextWQ>::finish (this=0x7f1b6c007a20, r=0) at ../src/librbd/Utils.h:135
#25 0x000055b793394a7f in Context::complete (this=0x7f1b6c007a20, r=<optimized out>) at ../src/include/Context.h:99
#26 0x00007f1ba4127c75 in AsyncOpTracker::wait_for_ops (this=0x7fff07c77378, on_finish=0x7f1b6c007a20) at ../src/common/AsyncOpTracker.cc:45
#27 0x000055b7942b5f1c in librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*)::{lambda(int)#6}::operator()(int) const (r=<optimized out>, this=<optimized out>) at ../src/common/config_proxy.h:120
#28 LambdaContext<librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::shut_down(Context*)::{lambda(int)#6}>::finish(int) (this=0x55b7967413e0, r=<optimized out>) at ../src/include/Context.h:166
#29 0x000055b793394a7f in Context::complete (this=0x55b7967413e0, r=<optimized out>) at ../src/include/Context.h:99
#30 0x000055b79351dd82 in ContextWQ::process (this=0x7fff07c77588, ctx=<optimized out>) at ../src/common/WorkQueue.h:562
#31 0x000055b7934f72fc in ThreadPool::PointerWQ<Context>::_void_process (this=<optimized out>, item=<optimized out>, handle=...) at ../src/common/WorkQueue.h:346
#32 0x00007f1ba41b6485 in ThreadPool::worker (this=0x7fff07c77138, wt=<optimized out>) at ../src/common/WorkQueue.cc:117
#33 0x00007f1ba41b6c57 in ThreadPool::WorkThread::entry (this=<optimized out>) at ../src/common/WorkQueue.h:401
#34 0x00007f1ba419da8f in Thread::entry_wrapper (this=0x55b79673e630) at ../src/common/Thread.cc:87
#35 0x00007f1ba419daa7 in Thread::_entry_func (arg=<optimized out>) at ../src/common/Thread.cc:74
#36 0x00007f1ba5208609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#37 0x00007f1ba36f3103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

AbstractWriteLog::m_lock is taken before AbstractWriteLog::remove_pool_file() is called and remains held across KernelDevice::_aio_stop().
But bstore_aio thread is itself stuck trying to acquire it from AbstractWriteLog::complete_op_log_entries():

Thread 89 (Thread 0x7f1b027fc700 (LWP 1768915)):
#0  __lll_lock_wait (futex=futex@entry=0x7fff07c77470, private=0) at lowlevellock.c:52
#1  0x00007f1ba520b0a3 in __GI___pthread_mutex_lock (mutex=0x7fff07c77470) at ../nptl/pthread_mutex_lock.c:80
#2  0x000055b7934c67d9 in __gthread_mutex_lock (__mutex=<optimized out>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:749
#3  std::mutex::lock (this=<optimized out>) at /usr/include/c++/9/bits/std_mutex.h:100
#4  0x000055b7942a6d86 in std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at ../src/common/Clock.h:33
#5  librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::complete_op_log_entries (this=0x7fff07c76f50, ops=..., result=result@entry=0) at ../src/librbd/cache/pwl/AbstractWriteLog.cc:1322
#6  0x000055b7942f8dba in librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::append_op_log_entries(std::__cxx11::list<std::shared_ptr<librbd::cache::pwl::GenericLogOperation>, std::allocator<std::shared_ptr<librbd::cache::pwl::GenericLogOperation> > >&)::{lambda(int)#2}::operator()(int) const (r=0, this=<optimized out>) at /usr/include/c++/9/ext/atomicity.h:98
#7  LambdaContext<librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::append_op_log_entries(std::__cxx11::list<std::shared_ptr<librbd::cache::pwl::GenericLogOperation>, std::allocator<std::shared_ptr<librbd::cache::pwl::GenericLogOperation> > >&)::{lambda(int)#2}>::finish(int) (this=0x7f1af4000ef0, r=0) at ../src/include/Context.h:166
#8  0x000055b793394a7f in Context::complete (this=0x7f1af4000ef0, r=<optimized out>) at ../src/include/Context.h:99
#9  0x000055b7942fd609 in librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::update_root_scheduled_ops()::{lambda(int)#4}::operator()(int) const (r=0, this=<optimized out>) at /usr/include/c++/9/bits/char_traits.h:335
#10 LambdaContext<librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::update_root_scheduled_ops()::{lambda(int)#4}>::finish(int) (this=0x7f1af8004770, r=0) at ../src/include/Context.h:166
#11 0x000055b793394a7f in Context::complete (this=0x7f1af8004770, r=<optimized out>) at ../src/include/Context.h:99
#12 0x000055b7942fd6f4 in librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::update_root_scheduled_ops()::{lambda(int)#5}::operator()(int) const (r=0, this=<optimized out>) at /usr/include/x86_64-linux-gnu/c++/9/bits/gthr-default.h:779
#13 LambdaContext<librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::update_root_scheduled_ops()::{lambda(int)#5}>::finish(int) (this=0x7f1af0000f60, r=0) at ../src/include/Context.h:166
#14 0x000055b793394a7f in Context::complete (this=0x7f1af0000f60, r=<optimized out>) at ../src/include/Context.h:99
#15 0x000055b7942eb672 in librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::AioTransContext::aio_finish (this=0x7f1af8005d70) at ../src/blk/BlockDevice.h:132
#16 librbd::cache::pwl::ssd::WriteLog<librbd::MockImageCtx>::aio_cache_cb (priv=<optimized out>, priv2=0x7f1af8005d70) at ../src/librbd/cache/pwl/ssd/WriteLog.h:141
#17 0x00007f1ba4e217ea in KernelDevice::_aio_thread (this=0x55b79673d780) at ../src/blk/kernel/KernelDevice.cc:629
#18 0x00007f1ba4e2c12b in KernelDevice::AioCompletionThread::entry (this=<optimized out>) at ../src/blk/kernel/KernelDevice.h:64
#19 0x00007f1ba419da8f in Thread::entry_wrapper (this=0x55b79673da30) at ../src/common/Thread.cc:87
#20 0x00007f1ba419daa7 in Thread::_entry_func (arg=<optimized out>) at ../src/common/Thread.cc:74
#21 0x00007f1ba5208609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#22 0x00007f1ba36f3103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  // New entries may be flushable
  {
    std::lock_guard locker(m_lock);
    wake_up();
  }

Related issues 2 (0 open2 closed)

Related to rbd - Bug #52566: [pwl ssd] assert in _aio_stop() during shutdownResolvedCONGMIN YIN

Actions
Copied to rbd - Backport #53170: pacific: [pwl] deadlock on AbstractWriteLog::m_lock during shutdownResolvedDeepika UpadhyayActions
Actions

Also available in: Atom PDF