Bug #24823

mds: deadlock when setting config value via admin socket

Added by Venky Shankar 5 months ago. Updated 19 days ago.

Status: Need Review
Priority: Urgent
Assignee:
Category: Correctness/Safety
Target version:
Start date: 07/09/2018
Due date:
% Done: 0%
Source: Development
Tags:
Backport: mimic,luminous
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

I hit the deadlock below while working on a PR that introduces a configuration option for the MDS. The problem is with how the MDS (MDSDaemon) handles configuration changes. With ongoing IO, it's possible that the MDS is fetching a configured value (via get_val<>) under mds_lock (lock order: mds_lock -> md_config_impl::lock) concurrently with a "config set" (via the admin socket) updating a config value (lock order: md_config_impl::lock -> mds_lock; note that MDSDaemon::handle_conf_change() tries to acquire mds_lock).
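
To make the lock inversion concrete, here is a minimal standalone sketch of the two paths (the names are illustrative stand-ins, not the actual Ceph types); run the two functions on concurrent threads and they can deadlock exactly as in the backtraces below:

    #include <mutex>
    #include <thread>

    std::mutex mds_lock;     // stands in for MDSDaemon::mds_lock
    std::mutex config_lock;  // stands in for md_config_impl::lock

    // IO path (cf. Server::handle_client_openc -> MDCache::cache_toofull):
    // mds_lock is already held when get_val<>() goes for the config lock.
    void io_path() {
      std::lock_guard<std::mutex> l1(mds_lock);
      std::lock_guard<std::mutex> l2(config_lock);  // blocks against the other thread
    }

    // Admin socket path (cf. CephContext::do_command -> apply_changes()):
    // the config lock is held when handle_conf_change() goes for mds_lock.
    void config_set_path() {
      std::lock_guard<std::mutex> l1(config_lock);
      std::lock_guard<std::mutex> l2(mds_lock);     // blocks against the other thread
    }

    int main() {
      std::thread a(io_path), b(config_set_path);
      a.join(); b.join();  // with unlucky timing, neither join returns
    }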

Ongoing IO:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fa5aee77d46 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55b7342d3590) at ../nptl/pthread_mutex_lock.c:113
#2  0x00007fa5b11128b9 in Mutex::Lock (this=0x55b7342d3568, no_lockdep=<optimized out>) at /work/ceph/src/common/Mutex.cc:107
#3  0x00007fa5b13c3399 in Mutex::Locker::Locker (m=..., this=<optimized out>) at /work/ceph/src/common/Mutex.h:116
#4  ceph::internal::LockMutex<(ceph::internal::LockPolicy)1>::operator() (this=<optimized out>) at /work/ceph/src/common/lock_mutex.h:37
#5  ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::get_val_generic (this=this@entry=0x55b7342d0000, key="mds_cache_size") at /work/ceph/src/common/config.cc:1049
#6  0x000055b733153d8b in ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::get_val<long> (this=this@entry=0x55b7342d0000, key="mds_cache_size") at /work/ceph/src/common/config.h:412
#7  0x000055b7332687af in MDCache::cache_limit_inodes () at /work/ceph/src/mds/MDCache.h:155
#8  MDCache::cache_toofull_ratio (this=<optimized out>) at /work/ceph/src/mds/MDCache.h:170
#9  0x000055b733221ae8 in MDCache::cache_toofull (this=0x55b734405000) at /work/ceph/src/mds/MDCache.h:176
#10 MDCache::add_inode (this=0x55b734405000, in=<optimized out>, in@entry=0x55b735718300) at /work/ceph/src/mds/MDCache.cc:287
#11 0x000055b73319cf65 in Server::prepare_new_inode (this=this@entry=0x55b73413ee60, mdr=..., dir=<optimized out>, useino=..., mode=mode@entry=33188, layout=layout@entry=0x7fa5a8f0f8a0)
    at /work/ceph/src/mds/Server.cc:2730
#12 0x000055b73319f38d in Server::handle_client_openc (this=0x55b73413ee60, mdr=...) at /work/ceph/src/mds/Server.cc:3684
#13 0x000055b7331a240d in Server::handle_client_request (this=this@entry=0x55b73413ee60, req=req@entry=0x55b734e250c0) at /work/ceph/src/mds/Server.cc:1773
#14 0x000055b7331a618b in Server::dispatch (this=0x55b73413ee60, m=m@entry=0x55b734e250c0) at /work/ceph/src/mds/Server.cc:270
#15 0x000055b73311eeb4 in MDSRank::handle_deferrable_message (this=this@entry=0x55b734404000, m=m@entry=0x55b734e250c0) at /work/ceph/src/mds/MDSRank.cc:767
#16 0x000055b73312bdcb in MDSRank::_dispatch (this=this@entry=0x55b734404000, m=m@entry=0x55b734e250c0, new_msg=new_msg@entry=true) at /work/ceph/src/mds/MDSRank.cc:604
#17 0x000055b73312c555 in MDSRankDispatcher::ms_dispatch (this=0x55b734404000, m=m@entry=0x55b734e250c0) at /work/ceph/src/mds/MDSRank.cc:583
#18 0x000055b7331173d3 in MDSDaemon::ms_dispatch (this=0x55b734310000, m=0x55b734e250c0) at /work/ceph/src/mds/MDSDaemon.cc:1154
#19 0x00007fa5b11ba51a in Messenger::ms_deliver_dispatch (m=0x55b734e250c0, this=0x55b7343e2000) at /work/ceph/src/msg/Messenger.h:663
#20 DispatchQueue::entry (this=0x55b7343e21a0) at /work/ceph/src/msg/DispatchQueue.cc:201
#21 0x00007fa5b125b4cd in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /work/ceph/src/msg/DispatchQueue.h:102
#22 0x00007fa5aee755aa in start_thread (arg=0x7fa5a8f14700) at pthread_create.c:463
#23 0x00007fa5ae060cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

config update via admin socket:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fa5aee77cd5 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55b734310040) at ../nptl/pthread_mutex_lock.c:78
#2  0x00007fa5b11128b9 in Mutex::Lock (this=this@entry=0x55b734310018, no_lockdep=no_lockdep@entry=false) at /work/ceph/src/common/Mutex.cc:107
#3  0x000055b73310e652 in MDSDaemon::handle_conf_change (this=0x55b734310000, conf=0x55b7342d0000, changed=std::set with 1 element = {...}) at /work/ceph/src/mds/MDSDaemon.cc:376
#4  0x00007fa5b141ecc3 in ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::_apply_changes (this=<optimized out>, oss=<optimized out>) at /work/ceph/src/common/config.cc:818
#5  0x00007fa5b141ee70 in ceph::internal::md_config_impl<(ceph::internal::LockPolicy)1>::apply_changes (this=0x55b7342d0000, oss=oss@entry=0x7fa5ab7135a0) at /work/ceph/src/common/config.cc:777
#6  0x00007fa5b13393be in CephContext::do_command (this=0x55b734162900, command=..., cmdmap=..., format=..., out=0x7fa5ab714ac0) at /work/ceph/src/common/ceph_context.cc:478
#7  0x00007fa5b133cdb1 in CephContextHook::call (this=<optimized out>, command=..., cmdmap=..., format=..., out=...) at /work/ceph/src/common/ceph_context.cc:379
#8  0x00007fa5b111affa in AdminSocket::do_accept (this=this@entry=0x55b7341309a0) at /work/ceph/src/common/admin_socket.cc:396
#9  0x00007fa5b111c178 in AdminSocket::entry (this=0x55b7341309a0) at /work/ceph/src/common/admin_socket.cc:247
#10 0x00007fa5ae98b873 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007fa5aee755aa in start_thread (arg=0x7fa5ab719700) at pthread_create.c:463
#12 0x00007fa5ae060cbf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

History

#1 Updated by Venky Shankar 5 months ago

  • Category set to Correctness/Safety
  • Source set to Development

#2 Updated by Patrick Donnelly 5 months ago

  • Assignee set to Venky Shankar

#3 Updated by Venky Shankar 5 months ago

Patrick/Zheng,

I'm wondering whether we could make use of Zheng's changes for tracker https://tracker.ceph.com/issues/24820 ("overhead of g_conf->get_val<type>("config name") is high"). Having cached variables for config options would avoid calling get_val() with mds_lock acquired -- config values would be updated under the md_config_impl::lock -> mds_lock order (handle_conf_change()) and the IO path would fetch the updated value under mds_lock alone.

But this would mean enforcing the use of cached variables for the code paths that currently call get_val() under mds_lock, regardless of how hot those paths are.
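
A rough sketch of what that could look like (all names here are hypothetical, not the actual change from #24820): the config observer refreshes a plain member while the usual md_config_impl::lock -> mds_lock order holds, and the hot path reads the member under mds_lock alone:

    #include <cstdint>

    struct MDCacheSketch {
      int64_t cached_mds_cache_size = 0;  // mirror of "mds_cache_size"

      // Invoked from handle_conf_change(): md_config_impl::lock is held by
      // the caller, which also takes mds_lock before calling us.
      void refresh_config(int64_t new_cache_size) {
        cached_mds_cache_size = new_cache_size;
      }

      // Hot IO path (cf. MDCache::cache_limit_inodes() in the backtrace):
      // mds_lock is held, so the cached member is read with no config lock.
      int64_t cache_limit_inodes() const {
        return cached_mds_cache_size;
      }
    };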

#4 Updated by Patrick Donnelly 5 months ago

Venky Shankar wrote:

Patrick/Zheng,

I'm wondering whether we could make use of Zheng's changes for tracker https://tracker.ceph.com/issues/24820 ("overhead of g_conf->get_val<type>("config name") is high"). Having cached variables for config options would avoid calling get_val() with mds_lock acquired -- config values would be updated under the md_config_impl::lock -> mds_lock order (handle_conf_change()) and the IO path would fetch the updated value under mds_lock alone.

But this would mean enforcing the use of cached variables for the code paths that currently call get_val() under mds_lock, regardless of how hot those paths are.

This is a tempting workaround, but it does not resolve the real problem: future PRs that access config variables through the normal means (g_conf->get_val) will cause this bug to resurface. We need an enduring fix.

#5 Updated by Zheng Yan 5 months ago

How about making md_config_impl release its lock when calling handle_conf_change()?
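
One way this could look (a sketch only, with a hypothetical observer interface): snapshot the changed keys and update the values under the config lock, release it, and only then notify observers, so handle_conf_change() can take mds_lock without inverting the order:

    #include <mutex>
    #include <set>
    #include <string>
    #include <vector>

    struct Observer {
      virtual void handle_conf_change(const std::set<std::string> &changed) = 0;
      virtual ~Observer() = default;
    };

    std::mutex config_lock;              // md_config_impl::lock stand-in
    std::vector<Observer *> observers;

    void apply_changes_sketch() {
      std::set<std::string> changed;
      {
        std::lock_guard<std::mutex> l(config_lock);
        // ... update values and fill `changed` under the config lock ...
      }
      // Config lock released: observers may now take mds_lock freely.
      for (Observer *o : observers)
        o->handle_conf_change(changed);
    }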

#6 Updated by Patrick Donnelly 5 months ago

Zheng Yan wrote:

How about making md_config_impl release its lock when calling handle_conf_change()?

+1

#7 Updated by Venky Shankar 5 months ago

An alternative would be to pass a reference to the lock to handle_conf_change(), which could then acquire the locks in the correct order. I'm testing this change now and will push a PR soon.
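
A sketch of that shape (signatures are hypothetical): the observer is handed the daemon lock and invoked without the config lock held, so it can take the two locks in the same mds_lock -> md_config_impl::lock order the IO path uses:

    #include <mutex>
    #include <set>
    #include <string>

    std::mutex config_lock;  // md_config_impl::lock stand-in

    void handle_conf_change_sketch(std::mutex &mds_lock,
                                   const std::set<std::string> &changed) {
      std::lock_guard<std::mutex> l1(mds_lock);     // daemon lock first,
      std::lock_guard<std::mutex> l2(config_lock);  // then the config lock
      // ... re-read the changed values and update MDS state ...
    }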

#8 Updated by Venky Shankar 5 months ago

  • Status changed from New to Need Review

#9 Updated by Patrick Donnelly 19 days ago

  • Priority changed from Normal to Urgent
  • Target version set to v14.0.0
  • Backport set to mimic,luminous
  • Severity changed from 3 - minor to 2 - major
