Bug #62832

common: config_proxy deadlock during shutdown (and possibly other times)

Added by Patrick Donnelly 8 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
High
Category:
Correctness/Safety
Target version:
% Done:
0%
Source:
Q/A
Tags:
backport_processed
Backport:
reef,quincy
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Objecter, ceph cli, librados
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Saw this deadlock in teuthology while running parallel `ceph config set` commands:


Thread 15 (Thread 0x7f8b127fc700 (LWP 132980)):
#0  0x00007f8b4460e44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8b3c8bc8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f8b3f3b2034 in std::_V2::condition_variable_any::wait<std::unique_lock<std::mutex> > (__lock=..., this=0x7f8b2c064a70)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/unique_lock.h:110
#3  CommonSafeTimer<std::mutex>::timer_thread (this=0x7f8b2c064a58) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/Timer.cc:125
#4  0x00007f8b3f3b3271 in CommonSafeTimerThread<std::mutex>::entry (this=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/Timer.cc:32
#5  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#6  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f8b12ffd700 (LWP 132979)):
#0  0x00007f8b4461181d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8b4460ab94 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8b3f6bb214 in __gthread_mutex_lock (__mutex=0x7f8b2c004f10) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/x86_64-redhat-linux/bits/gthr-default.h:749
#3  __gthread_recursive_mutex_lock (__mutex=0x7f8b2c004f10) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/x86_64-redhat-linux/bits/gthr-default.h:811
#4  std::recursive_mutex::lock (this=0x7f8b2c004f10) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/mutex:108
#5  std::lock_guard<std::recursive_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_mutex.h:229
#6  ceph::common::ConfigProxy::get_val<double> (key="mon_client_hunt_interval_backoff", this=0x7f8b2c001558)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/config_proxy.h:142
#7  MonClient::_un_backoff (this=0x7f8b2c064168) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/mon/MonClient.cc:1018
#8  0x00007f8b3f6c8b81 in MonClient::tick (this=0x7f8b2c064168) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/mon/MonClient.cc:1002
#9  0x00007f8b3f33caed in Context::complete (this=0x7f8b2c0ac630, r=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/include/Context.h:99
#10 0x00007f8b3f3b1f4f in CommonSafeTimer<std::mutex>::timer_thread (this=0x7f8b2c064558) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/Timer.cc:108
#11 0x00007f8b3f3b3271 in CommonSafeTimerThread<std::mutex>::entry (this=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/Timer.cc:32
#12 0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#13 0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f8b137fe700 (LWP 132978)):
#0  0x00007f8b4460e44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8b3c8bc8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f8b3f529539 in DispatchQueue::run_local_delivery (this=0x7f8b2c155f80) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/DispatchQueue.cc:119
#3  0x00007f8b3f5f42b1 in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/DispatchQueue.h:115
#4  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#5  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f8b13fff700 (LWP 132977)):
#0  0x00007f8b4461181d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8b4460aac9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8b3f6c4b19 in __gthread_mutex_lock (__mutex=0x7f8b2c064530) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/x86_64-redhat-linux/bits/gthr-default.h:749
#3  std::mutex::lock (this=0x7f8b2c064530) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_mutex.h:100
#4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_mutex.h:229
#5  MonClient::ms_handle_reset (this=0x7f8b2c064168, con=0x7f8b18003be0) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/mon/MonClient.cc:856
#6  0x00007f8b3f529836 in Messenger::ms_deliver_handle_reset (con=0x7f8b18003be0, this=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/Messenger.h:783
#7  DispatchQueue::entry (this=0x7f8b2c155f80) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/DispatchQueue.cc:187
#8  0x00007f8b3f5f41f1 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/DispatchQueue.h:101
#9  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#10 0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f8b28ff9700 (LWP 132976)):
#0  0x00007f8b4460e44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8b3c8bc8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f8b445536a1 in ceph::timer<ceph::coarse_mono_clock>::timer_thread (this=0x7f8b2c1511a0)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/intrusive/detail/rbtree_node.hpp:77
#3  0x00007f8b3c8c2ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#5  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f8b317fe700 (LWP 132975)):
#0  0x00007f8b43724ae1 in poll () from /lib64/libc.so.6
#1  0x00007f8b3f3ca4d7 in poll (__timeout=-1, __nfds=2, __fds=0x7f8b317fde60) at /usr/include/bits/poll2.h:38
#2  AdminSocket::entry (this=0x7f8b2c060800) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/admin_socket.cc:254
#3  0x00007f8b3c8c2ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#5  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f8b297fa700 (LWP 132974)):
#0  0x00007f8b4460e44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8b444db0da in boost::asio::detail::scheduler::do_run_one (this=0x7f8b2c04f040, lock=..., this_thread=..., ec=...)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/conditionally_enabled_mutex.hpp:98
#2  0x00007f8b444c9401 in boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .isra.0] ()
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:210
#3  0x00007f8b444dfb5f in std::thread::_State_impl<std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}> > >::_M_run() (this=0x7f8b2c153590)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/impl/io_context.ipp:64
#4  0x00007f8b3c8c2ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#5  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#6  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f8b29ffb700 (LWP 132973)):
#0  0x00007f8b4461181d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8b4460ab94 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8b444f7119 in __gthread_mutex_lock (__mutex=0x7f8b2c004f10) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/x86_64-redhat-linux/bits/gthr-default.h:749
#3  __gthread_recursive_mutex_lock (__mutex=0x7f8b2c004f10) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/x86_64-redhat-linux/bits/gthr-default.h:811
#4  std::recursive_mutex::lock (this=0x7f8b2c004f10) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/mutex:108
#5  std::lock_guard<std::recursive_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_mutex.h:229
#6  ceph::common::ConfigProxy::get_val<std::chrono::duration<long, std::ratio<1l, 1l> > > (key="rados_mon_op_timeout", this=0x7f8b2c001558)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/config_proxy.h:142
#7  Objecter::handle_conf_change (this=0x7f8b2c151010, conf=..., changed=std::set with 1 element = {...}) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/osdc/Objecter.cc:235
#8  0x00007f8b3f6b7993 in ceph::common::ConfigProxy::call_observers (rev_obs=std::map with 2 elements = {...}, locker=<synthetic pointer>..., this=0x7f8b2c001558)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/config_proxy.h:85
#9  ceph::common::ConfigProxy::set_mon_vals(ceph::common::CephContext*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (this=0x7f8b2c001558, cct=<optimized out>, kv=..., config_cb=...)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/config_proxy.h:293
#10 0x00007f8b3f6b7da1 in operator() (__closure=0x7f8b29ffabe0)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/smart_ptr/intrusive_ptr.hpp:181
#11 boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> >::operator() (this=0x7f8b29ffabe0)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/bind_handler.hpp:60
#12 boost::asio::asio_handler_invoke<boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> > > (function=...)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/handler_invoke_hook.hpp:88
#13 boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> >, MonClient::handle_config(MConfig*)::<lambda()> > (context=..., 
    function=...) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/handler_invoke_helpers.hpp:54
#14 boost::asio::detail::asio_handler_invoke<boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> >, MonClient::handle_config(MConfig*)::<lambda()> > (
    this_handler=0x7f8b29ffabe0, function=...) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/bind_handler.hpp:111
#15 boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> >, boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> > > (context=..., function=...) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/handler_invoke_helpers.hpp:54
#16 boost::asio::detail::handler_work<boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> >, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0>, void>::complete<boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> > > (handler=..., function=..., this=<synthetic pointer>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/handler_work.hpp:524
#17 boost::asio::detail::completion_handler<boost::asio::detail::binder0<MonClient::handle_config(MConfig*)::<lambda()> >, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> >::do_complete(void *, boost::asio::detail::operation *, const boost::system::error_code &, std::size_t) (owner=0x7f8b2c04f040, base=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/completion_handler.hpp:74
#18 0x00007f8b3f6cb235 in boost::asio::detail::scheduler_operation::complete (bytes_transferred=0, ec=..., owner=0x7f8b2c04f040, this=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#19 boost::asio::detail::strand_service::do_complete (owner=0x7f8b2c04f040, base=0x7f8b2c065750, ec=...)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/impl/strand_service.ipp:193
#20 0x00007f8b444db1ca in boost::asio::detail::scheduler::do_run_one (this=0x7f8b2c04f040, lock=..., this_thread=..., ec=...)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#21 0x00007f8b444c9401 in boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .isra.0] ()
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:210
#22 0x00007f8b444dfb5f in std::thread::_State_impl<std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}> > >::_M_run() (this=0x7f8b18008460)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/boost/include/boost/asio/impl/io_context.ipp:64
#23 0x00007f8b3c8c2ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#24 0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#25 0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f8b2a7fc700 (LWP 132972)):
#0  0x00007f8b4460e79a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8b3f3e359d in ceph::common::CephContextServiceThread::entry (this=0x7f8b24004cb0) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/x86_64-redhat-linux/bits/gthr-default.h:872
#2  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#3  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f8b2affd700 (LWP 131014)):
#0  0x00007f8b4372fa27 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f8b3f6530f4 in EpollDriver::event_wait (this=0x7f8b2c0c8710, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f8b3f63fe83 in EventCenter::process_events (this=this@entry=0x7f8b2c073798, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, 
    working_dur=working_dur@entry=0x7f8b2affcdc8) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/Event.cc:416
#3  0x00007f8b3f6489a6 in operator() (__closure=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/Stack.cc:50
#4  std::__invoke_impl<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__f=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61
#5  std::__invoke_r<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__fn=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111
#6  std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290
#7  0x00007f8b3c8c2ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#8  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#9  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f8b2b7fe700 (LWP 131013)):
#0  0x00007f8b4372fa27 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f8b3f6530f4 in EpollDriver::event_wait (this=0x7f8b2c0aaf30, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f8b3f63fe83 in EventCenter::process_events (this=this@entry=0x7f8b2c0c2a58, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, 
    working_dur=working_dur@entry=0x7f8b2b7fddc8) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/Event.cc:416
#3  0x00007f8b3f6489a6 in operator() (__closure=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/Stack.cc:50
#4  std::__invoke_impl<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__f=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61
#5  std::__invoke_r<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__fn=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111
#6  std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290
#7  0x00007f8b3c8c2ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#8  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#9  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f8b2bfff700 (LWP 131012)):
#0  0x00007f8b4372fa27 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f8b3f6530f4 in EpollDriver::event_wait (this=0x7f8b2c063cf0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f8b3f63fe83 in EventCenter::process_events (this=this@entry=0x7f8b2c072618, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, 
    working_dur=working_dur@entry=0x7f8b2bffedc8) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/Event.cc:416
#3  0x00007f8b3f6489a6 in operator() (__closure=<optimized out>) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/msg/async/Stack.cc:50
#4  std::__invoke_impl<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__f=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:61
#5  std::__invoke_r<void, NetworkStack::add_thread(Worker*)::<lambda()>&> (__fn=...) at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/invoke.h:111
#6  std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/std_function.h:290
#7  0x00007f8b3c8c2ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#8  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#9  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f8b30d9d700 (LWP 131009)):
#0  0x00007f8b4460e44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8b3c8bc8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f8b3f6914d0 in ceph::logging::Log::entry (this=0x7f8b2c04ff20) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/log/Log.cc:578
#3  0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#4  0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f8b45a01b80 (LWP 130848)):
#0  0x00007f8b44610fb2 in do_futex_wait () from /lib64/libpthread.so.0
#1  0x00007f8b446110c3 in __new_sem_wait_slow () from /lib64/libpthread.so.0
#2  0x00007f8b44af8790 in PyThread_acquire_lock_timed () from /lib64/libpython3.6m.so.1.0
#3  0x00007f8b44b783c5 in lock_PyThread_acquire_lock () from /lib64/libpython3.6m.so.1.0
#4  0x00007f8b44b9d401 in call_function () from /lib64/libpython3.6m.so.1.0
#5  0x00007f8b44b9ddc8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
#6  0x00007f8b44af99d4 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
#7  0x00007f8b44b7adf0 in fast_function () from /lib64/libpython3.6m.so.1.0
#8  0x00007f8b44b9d187 in call_function () from /lib64/libpython3.6m.so.1.0
#9  0x00007f8b44b9ea04 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
#10 0x00007f8b44af99d4 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
#11 0x00007f8b44b7adf0 in fast_function () from /lib64/libpython3.6m.so.1.0
#12 0x00007f8b44b9d187 in call_function () from /lib64/libpython3.6m.so.1.0
#13 0x00007f8b44b9ea04 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
#14 0x00007f8b44af99d4 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
#15 0x00007f8b44b7adf0 in fast_function () from /lib64/libpython3.6m.so.1.0
#16 0x00007f8b44b9d187 in call_function () from /lib64/libpython3.6m.so.1.0
#17 0x00007f8b44b9ddc8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
#18 0x00007f8b44af99d4 in _PyEval_EvalCodeWithName () from /lib64/libpython3.6m.so.1.0
#19 0x00007f8b44afad73 in PyEval_EvalCode () from /lib64/libpython3.6m.so.1.0
#20 0x00007f8b44c096a2 in run_mod () from /lib64/libpython3.6m.so.1.0
#21 0x00007f8b44ada374 in PyRun_FileExFlags () from /lib64/libpython3.6m.so.1.0
#22 0x00007f8b44adf471 in PyRun_SimpleFileExFlags () from /lib64/libpython3.6m.so.1.0
#23 0x00007f8b44adfdc0 in Py_Main.cold.3365 () from /lib64/libpython3.6m.so.1.0
#24 0x0000562b84200b96 in main ()

Thread 1 (Thread 0x7f8b31fff700 (LWP 132985)):
#0  0x00007f8b4460e44c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f8b3c8bc8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f8b4450551b in ceph::common::ConfigProxy::CallGate::close (this=0x7f8b2c0c7100) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/config_proxy.h:54
#3  ceph::common::ConfigProxy::call_gate_close (obs=0x7f8b2c151010, this=0x7f8b2c001558) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/config_proxy.h:72
#4  ceph::common::ConfigProxy::remove_observer (obs=0x7f8b2c151010, this=0x7f8b2c001558) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/common/config_proxy.h:213
#5  Objecter::shutdown (this=0x7f8b2c151010) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/osdc/Objecter.cc:429
#6  0x00007f8b444d288a in librados::v14_2_0::RadosClient::shutdown() () at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/librados/RadosClient.cc:362
#7  0x00007f8b4444d712 in _rados_shutdown (cluster=0x7f8b2c0640b0) at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/src/librados/librados_c.cc:231
#8  0x00007f8b448f8011 in __pyx_pf_5rados_5Rados_8shutdown (__pyx_v_self=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/src/pybind/rados/rados.c:13937
#9  __pyx_pw_5rados_5Rados_9shutdown (__pyx_v_self=0x7f8b441e1938, unused=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/src/pybind/rados/rados.c:13892
#10 0x00007f8b448f18bc in __Pyx_CyFunction_CallAsMethod (kw=<optimized out>, args=<optimized out>, func=<optimized out>)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/src/pybind/rados/rados.c:92397
#11 __Pyx_CyFunction_CallAsMethod (func=0x7f8b4487df60, args=<optimized out>, kw=0x7f8b434d5e58)
    at /usr/src/debug/ceph-18.0.0-6088.g2110e007.el8.x86_64/x86_64-redhat-linux-gnu/src/pybind/rados/rados.c:26845
#12 0x00007f8b44afbc9c in _PyObject_FastCallDict () from /lib64/libpython3.6m.so.1.0
#13 0x00007f8b44b0e040 in method_call () from /lib64/libpython3.6m.so.1.0
#14 0x00007f8b44b02a5b in PyObject_Call () from /lib64/libpython3.6m.so.1.0
#15 0x00007f8b44b9fc30 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
#16 0x00007f8b44b7ac08 in fast_function () from /lib64/libpython3.6m.so.1.0
#17 0x00007f8b44b9d187 in call_function () from /lib64/libpython3.6m.so.1.0
#18 0x00007f8b44b9ddc8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
#19 0x00007f8b44b7ac08 in fast_function () from /lib64/libpython3.6m.so.1.0
#20 0x00007f8b44b9d187 in call_function () from /lib64/libpython3.6m.so.1.0
#21 0x00007f8b44b9ddc8 in _PyEval_EvalFrameDefault () from /lib64/libpython3.6m.so.1.0
#22 0x00007f8b44afaee2 in _PyFunction_FastCallDict () from /lib64/libpython3.6m.so.1.0
#23 0x00007f8b44afbcbe in _PyObject_FastCallDict () from /lib64/libpython3.6m.so.1.0
#24 0x00007f8b44b0e040 in method_call () from /lib64/libpython3.6m.so.1.0
#25 0x00007f8b44b02a5b in PyObject_Call () from /lib64/libpython3.6m.so.1.0
#26 0x00007f8b44c0dd72 in t_bootstrap () from /lib64/libpython3.6m.so.1.0
#27 0x00007f8b44bb41b4 in pythread_wrapper () from /lib64/libpython3.6m.so.1.0
#28 0x00007f8b446081cf in start_thread () from /lib64/libpthread.so.0
#29 0x00007f8b43639d83 in clone () from /lib64/libc.so.6

Manually collected when this job was running:

/teuthology/pdonnell-2023-09-12_14:07:50-fs-wip-batrick-testing-20230912.122437-distro-default-smithi/7395154
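Reading the traces, the cycle appears to be between Thread 1 and Thread 8: Thread 1 (Objecter::shutdown -> ConfigProxy::remove_observer) apparently holds the ConfigProxy lock while blocking in CallGate::close waiting for in-flight observer calls to drain, and Thread 8 is inside exactly such a call (ConfigProxy::set_mon_vals -> call_observers -> Objecter::handle_conf_change) blocking in get_val() on that same lock (note that Threads 8 and 14 are both parked on mutex 0x7f8b2c004f10). Neither side can make progress. Below is a minimal, self-contained sketch of the pattern; the names loosely mirror the frames above but the implementation is invented for illustration, not Ceph's actual code, and the interleaving is forced with a sleep:

// Hypothetical stand-in for the ConfigProxy/CallGate deadlock pattern.
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

struct CallGate {
  std::mutex m;
  std::condition_variable cv;
  int in_flight = 0;
  void enter() { std::lock_guard<std::mutex> l(m); ++in_flight; }
  void leave() {
    { std::lock_guard<std::mutex> l(m); --in_flight; }
    cv.notify_all();
  }
  void close() {  // block until no observer call is in flight
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return in_flight == 0; });
  }
};

struct ConfigRegistry {
  std::recursive_mutex lock;  // plays the role of ConfigProxy's lock
  CallGate gate;

  int get_val() {  // what an observer callback calls back into
    std::lock_guard<std::recursive_mutex> l(lock);
    return 42;
  }

  void remove_observer() {  // shutdown path (Thread 1 analogue)
    std::lock_guard<std::recursive_mutex> l(lock);  // take the lock...
    gate.close();  // ...then wait on the gate while still holding it
  }
};

int main() {
  ConfigRegistry reg;
  reg.gate.enter();  // simulate an observer call already in flight (Thread 8 analogue)

  // Thread 1 analogue: grabs the registry lock, then waits for the gate to drain.
  std::thread shutdown([&] { reg.remove_observer(); });

  // Give the shutdown thread time to take the lock first (ordering forced
  // here for the demo; in the real process the race is nondeterministic).
  std::this_thread::sleep_for(std::chrono::milliseconds(100));

  // The in-flight observer now needs the registry lock to finish, but the
  // shutdown thread holds it and is waiting on us: deadlock.
  reg.get_val();
  reg.gate.leave();  // never reached

  shutdown.join();
  std::puts("no deadlock");  // unreachable once the interleaving above holds
}

Compiled and run, this hangs at reg.get_val(), mirroring the stuck process above. The generic remedy for this shape of bug is to avoid waiting on the gate while holding a lock that gated callbacks may need (for example, releasing the registry lock before calling gate.close()); see the linked pull request for the fix that was actually merged.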


Related issues: 4 (0 open, 4 closed)

Related to CephFS - Bug #24823: mds: deadlock when setting config value via admin socket (Resolved; Venky Shankar)
Related to rbd - Bug #41354: RBD image manipulation using python API crashing since Nautilus (Resolved; Jason Dillaman)
Copied to RADOS - Backport #63456: reef: common: config_proxy deadlock during shutdown (and possibly other times) (Resolved; Patrick Donnelly)
Copied to RADOS - Backport #63457: quincy: common: config_proxy deadlock during shutdown (and possibly other times) (Resolved; Patrick Donnelly)
#1

Updated by Patrick Donnelly 8 months ago

  • Related to Bug #24823: mds: deadlock when setting config value via admin socket added
#2

Updated by Patrick Donnelly 8 months ago

  • Related to Bug #41354: RBD image manipulation using python API crashing since Nautilus added
#3

Updated by Patrick Donnelly 8 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 62832
#4

Updated by Radoslaw Zarzynski 8 months ago

Oops, the PR link somehow leads to a 404. A GitHub issue?

#5

Updated by Patrick Donnelly 7 months ago

  • Pull request ID changed from 62832 to 53568

Radoslaw Zarzynski wrote:

Oops, the PR link somehow leads to a 404. A GitHub issue?

Fixed!

#6

Updated by Patrick Donnelly 6 months ago

  • Status changed from Fix Under Review to Pending Backport
#7

Updated by Patrick Donnelly 6 months ago

  • Backport changed from reef,quincy,pacific to reef,quincy
#8

Updated by Backport Bot 6 months ago

  • Copied to Backport #63456: reef: common: config_proxy deadlock during shutdown (and possibly other times) added
#9

Updated by Backport Bot 6 months ago

  • Copied to Backport #63457: quincy: common: config_proxy deadlock during shutdown (and possibly other times) added
#10

Updated by Backport Bot 6 months ago

  • Tags set to backport_processed
#11

Updated by Patrick Donnelly about 2 months ago

  • Status changed from Pending Backport to Resolved