Bug #24823
mds: deadlock when setting config value via admin socket
Status: Closed
Description
I hit the deadlock below while working on a PR that introduces a configuration option for the MDS. The problem is in how the MDS (MDSDaemon) handles configuration changes. With ongoing IO, it's possible that the MDS is fetching a configured value (via get_val<>) under mds_lock (lock order: mds_lock -> md_config_impl::lock) concurrently with a "config set" (via the admin socket) updating a config value (lock order: md_config_impl::lock -> mds_lock; note that MDSDaemon::handle_conf_change() tries to acquire mds_lock).
Ongoing IO:
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fa5aee77d46 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55b7342d3590) at ../nptl/pthread_mutex_lock.c:113
#2  0x00007fa5b11128b9 in Mutex::Lock (this=0x55b7342d3568, no_lockdep=<optimized out>) at /work/ceph/src/common/Mutex.cc:107
#3  0x00007fa5b13c3399 in Mutex::Locker::Locker (m=..., this=<optimized out>) at /work/ceph/src/common/Mutex.h:116
#4  ceph::internal::LockMutex<(ceph::internal::LockPolicy)1>::operator() (this=<optimized out>) at /work/ceph/src/common/lock_mutex.h:37
#5  ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::get_val_generic (this=this@entry=0x55b7342d0000, key="mds_cache_size") at /work/ceph/src/common/config.cc:1049
#6  0x000055b733153d8b in ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::get_val<long> (this=this@entry=0x55b7342d0000, key="mds_cache_size") at /work/ceph/src/common/config.h:412
#7  0x000055b7332687af in MDCache::cache_limit_inodes () at /work/ceph/src/mds/MDCache.h:155
#8  MDCache::cache_toofull_ratio (this=<optimized out>) at /work/ceph/src/mds/MDCache.h:170
#9  0x000055b733221ae8 in MDCache::cache_toofull (this=0x55b734405000) at /work/ceph/src/mds/MDCache.h:176
#10 MDCache::add_inode (this=0x55b734405000, in=<optimized out>, in@entry=0x55b735718300) at /work/ceph/src/mds/MDCache.cc:287
#11 0x000055b73319cf65 in Server::prepare_new_inode (this=this@entry=0x55b73413ee60, mdr=..., dir=<optimized out>, useino=..., mode=mode@entry=33188, layout=layout@entry=0x7fa5a8f0f8a0) at /work/ceph/src/mds/Server.cc:2730
#12 0x000055b73319f38d in Server::handle_client_openc (this=0x55b73413ee60, mdr=...) at /work/ceph/src/mds/Server.cc:3684
#13 0x000055b7331a240d in Server::handle_client_request (this=this@entry=0x55b73413ee60, req=req@entry=0x55b734e250c0) at /work/ceph/src/mds/Server.cc:1773
#14 0x000055b7331a618b in Server::dispatch (this=0x55b73413ee60, m=m@entry=0x55b734e250c0) at /work/ceph/src/mds/Server.cc:270
#15 0x000055b73311eeb4 in MDSRank::handle_deferrable_message (this=this@entry=0x55b734404000, m=m@entry=0x55b734e250c0) at /work/ceph/src/mds/MDSRank.cc:767
#16 0x000055b73312bdcb in MDSRank::_dispatch (this=this@entry=0x55b734404000, m=m@entry=0x55b734e250c0, new_msg=new_msg@entry=true) at /work/ceph/src/mds/MDSRank.cc:604
#17 0x000055b73312c555 in MDSRankDispatcher::ms_dispatch (this=0x55b734404000, m=m@entry=0x55b734e250c0) at /work/ceph/src/mds/MDSRank.cc:583
#18 0x000055b7331173d3 in MDSDaemon::ms_dispatch (this=0x55b734310000, m=0x55b734e250c0) at /work/ceph/src/mds/MDSDaemon.cc:1154
#19 0x00007fa5b11ba51a in Messenger::ms_deliver_dispatch (m=0x55b734e250c0, this=0x55b7343e2000) at /work/ceph/src/msg/Messenger.h:663
#20 DispatchQueue::entry (this=0x55b7343e21a0) at /work/ceph/src/msg/DispatchQueue.cc:201
#21 0x00007fa5b125b4cd in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /work/ceph/src/msg/DispatchQueue.h:102
#22 0x00007fa5aee755aa in start_thread (arg=0x7fa5a8f14700) at pthread_create.c:463
#23 0x00007fa5ae060cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
config update via admin socket:
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fa5aee77cd5 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55b734310040) at ../nptl/pthread_mutex_lock.c:78
#2  0x00007fa5b11128b9 in Mutex::Lock (this=this@entry=0x55b734310018, no_lockdep=no_lockdep@entry=false) at /work/ceph/src/common/Mutex.cc:107
#3  0x000055b73310e652 in MDSDaemon::handle_conf_change (this=0x55b734310000, conf=0x55b7342d0000, changed=std::set with 1 element = {...}) at /work/ceph/src/mds/MDSDaemon.cc:376
#4  0x00007fa5b141ecc3 in ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::_apply_changes (this=<optimized out>, oss=<optimized out>) at /work/ceph/src/common/config.cc:818
#5  0x00007fa5b141ee70 in ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::apply_changes (this=0x55b7342d0000, oss=oss@entry=0x7fa5ab7135a0) at /work/ceph/src/common/config.cc:777
#6  0x00007fa5b13393be in CephContext::do_command (this=0x55b734162900, command=..., cmdmap=..., format=..., out=0x7fa5ab714ac0) at /work/ceph/src/common/ceph_context.cc:478
#7  0x00007fa5b133cdb1 in CephContextHook::call (this=<optimized out>, command=..., cmdmap=..., format=..., out=...) at /work/ceph/src/common/ceph_context.cc:379
#8  0x00007fa5b111affa in AdminSocket::do_accept (this=this@entry=0x55b7341309a0) at /work/ceph/src/common/admin_socket.cc:396
#9  0x00007fa5b111c178 in AdminSocket::entry (this=0x55b7341309a0) at /work/ceph/src/common/admin_socket.cc:247
#10 0x00007fa5ae98b873 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007fa5aee755aa in start_thread (arg=0x7fa5ab719700) at pthread_create.c:463
#12 0x00007fa5ae060cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
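The two stacks above form a classic ABBA lock-order inversion. A minimal standalone sketch of the pattern, with illustrative lock and function names (none of this is actual Ceph code): the IO-path analogue wants mds_lock then the config lock, while the admin-socket analogue wants them in the opposite order. With two plain lock_guards in those orders, racing threads can each hold one lock and wait forever on the other; the sketch uses std::scoped_lock, whose all-or-nothing acquisition sidesteps the inversion, so the demo terminates.

```cpp
#include <mutex>
#include <thread>

// Illustrative stand-ins for the two locks in the report (names are not Ceph's):
std::mutex demo_mds_lock;     // plays the role of MDSDaemon::mds_lock
std::mutex demo_config_lock;  // plays the role of md_config_impl::lock

int config_value = 1;         // pretend config option
int observed = 0;

// IO-path analogue: mds_lock first, then the config lock
// (like get_val<>() being called with mds_lock held).
void io_path() {
  // scoped_lock acquires both locks deadlock-free regardless of the order
  // the other thread uses; with sequential lock_guards this order would
  // be mds_lock -> config_lock and could deadlock against config_set().
  std::scoped_lock l(demo_mds_lock, demo_config_lock);
  observed = config_value;
}

// Admin-socket analogue: config lock first, then mds_lock
// (like handle_conf_change() acquiring mds_lock while apply_changes
// still holds md_config_impl::lock).
void config_set() {
  std::scoped_lock l(demo_config_lock, demo_mds_lock);
  config_value = 2;
}

void run_both() {
  std::thread t1(io_path), t2(config_set);
  t1.join();
  t2.join();
}
```

Compile with -std=c++17 -pthread; with the scoped_lock lines replaced by two sequential lock_guards in the orders shown, the same program can hang exactly as the MDS does here.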
Updated by Venky Shankar almost 6 years ago
- Category set to Correctness/Safety
- Source set to Development
Updated by Venky Shankar almost 6 years ago
Patrick/Zheng,
I'm wondering whether we could make use of Zheng's changes for tracker https://tracker.ceph.com/issues/24820 ("overhead of g_conf->get_val<type>("config name") is high"). Having cached variables for config options would avoid calling get_val() with mds_lock acquired -- config values would be updated under the md_config_impl::lock -> mds_lock order (in handle_conf_change()), and the IO path would fetch the updated config value under mds_lock alone.
But this would mean enforcing the use of cached variables for certain code paths (those which currently call get_val() under mds_lock) regardless of how hot the code path is.
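A rough sketch of the cached-variable idea (class and member names here are illustrative, not actual Ceph code): the config observer copies the new value into a member while holding mds_lock, and hot paths read that member under mds_lock alone, never calling back into md_config_impl::get_val().

```cpp
#include <cstdint>
#include <mutex>

// Sketch only: handle_conf_change() refreshes a cached copy of the option,
// so the IO path never has to take md_config_impl::lock under mds_lock.
class MDCacheSketch {
  std::mutex mds_lock;                // stand-in for the real mds_lock
  int64_t cached_mds_cache_size = 0;  // refreshed only on config change

public:
  // Called from the config observer, i.e. already inside the
  // md_config_impl::lock -> mds_lock ordering.
  void handle_conf_change(int64_t new_val) {
    std::lock_guard<std::mutex> l(mds_lock);
    cached_mds_cache_size = new_val;
  }

  // Hot IO path: only mds_lock is taken; no call into the config system,
  // so the mds_lock -> md_config_impl::lock order never occurs.
  int64_t cache_limit_inodes() {
    std::lock_guard<std::mutex> l(mds_lock);
    return cached_mds_cache_size;
  }
};
```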
Updated by Patrick Donnelly almost 6 years ago
Venky Shankar wrote:
Patrick/Zheng,
I'm wondering whether we could make use of Zheng's changes for tracker https://tracker.ceph.com/issues/24820 ("overhead of g_conf->get_val<type>("config name") is high"). Having cached variables for config options would avoid calling get_val() with mds_lock acquired -- config values would be updated under the md_config_impl::lock -> mds_lock order (in handle_conf_change()), and the IO path would fetch the updated config value under mds_lock alone.
But this would mean enforcing the use of cached variables for certain code paths (those which currently call get_val() under mds_lock) regardless of how hot the code path is.
This is a tempting work-around but it does not resolve the real problem so future PRs which access the config variables through the normal means (g_conf->get_val) will cause this bug to resurface. We need an enduring fix.
Updated by Zheng Yan almost 6 years ago
How about making md_config_impl release its lock before calling handle_conf_change()?
Updated by Patrick Donnelly almost 6 years ago
Zheng Yan wrote:
How about making md_config_impl release its lock before calling handle_conf_change()?
+1
Updated by Venky Shankar almost 6 years ago
An alternative would be to pass a reference to the lock to handle_conf_change(), which could then acquire the locks in the correct order. I am testing this change now and will push a PR soon.
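For illustration, Zheng's suggestion above can be sketched as follows (hypothetical names, not the actual fix that was merged): apply_changes() mutates config state under its own lock, then releases that lock before notifying observers, so an observer like MDSDaemon::handle_conf_change() can take mds_lock without ever holding both locks in the inverted order.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Sketch of a config-apply path that drops its lock before calling out.
struct ConfigImplSketch {
  std::mutex lock;                       // stand-in for md_config_impl::lock
  std::set<std::string> values_changed;  // pretend config state
  std::function<void(const std::set<std::string>&)> observer;  // e.g. an MDS

  void apply_changes(const std::set<std::string>& changed) {
    std::unique_lock<std::mutex> l(lock);
    values_changed = changed;  // mutate config state under `lock`
    l.unlock();                // release `lock` BEFORE notifying observers
    if (observer)
      observer(changed);       // observer may now take mds_lock safely
  }
};
```

The trade-off is that observers see config state without md_config_impl::lock held, so any values they need must be re-read (or passed in) rather than assumed stable during the callback.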
Updated by Venky Shankar over 5 years ago
- Status changed from New to Fix Under Review
Updated by Patrick Donnelly over 5 years ago
- Priority changed from Normal to Urgent
- Target version set to v14.0.0
- Backport set to mimic,luminous
- Severity changed from 3 - minor to 2 - major
Updated by Patrick Donnelly over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37761: mimic: mds: deadlock when setting config value via admin socket added
Updated by Nathan Cutler over 5 years ago
- Copied to Backport #37762: luminous: mds: deadlock when setting config value via admin socket added
Updated by Patrick Donnelly over 5 years ago
- Related to Bug #37751: handle_conf_change crash in osd added
Updated by Xiubo Li over 4 years ago
- Related to Bug #43251: mds: track client provided metric flags in session added
Updated by Xiubo Li over 4 years ago
- Related to Feature #43215: kclient: send usage metrics to MDS added
Updated by Xiubo Li over 4 years ago
- Related to deleted (Feature #43215: kclient: send usage metrics to MDS)
Updated by Xiubo Li over 4 years ago
- Related to deleted (Bug #43251: mds: track client provided metric flags in session)
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
Updated by Patrick Donnelly 8 months ago
- Related to Bug #62832: common: config_proxy deadlock during shutdown (and possibly other times) added