Bug #61874

closed

mgr: DaemonServer::ms_handle_authentication acquires daemon locks

Added by Patrick Donnelly 10 months ago. Updated 7 months ago.

Status: Resolved
Priority: High
Category: ceph-mgr
Target version:
% Done: 100%
Source: Development
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://github.com/ceph/ceph/blob/7d93aa83bb962afda2781cc03076c7bf5eb7813f/src/mgr/DaemonServer.cc#L218-L224

This method can block with the entire EventCenter lock held:

Thread 3 (Thread 0x7feeff18d700 (LWP 822150)):
#0  0x00007fef0339081d in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x00007fef03389ac9 in pthread_mutex_lock () from target:/lib64/libpthread.so.0
#2  0x00005610fa986d17 in std::mutex::lock() ()
#3  0x00005610fa9dee2c in DaemonServer::ms_handle_authentication(Connection*) ()
#4  0x00007fef04906e55 in MonClient::handle_auth_request(Connection*, AuthConnectionMeta*, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) () from target:/usr/lib64/ceph/libceph-common.so.2
#5  0x00007fef0489165f in ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) () from target:/usr/lib64/ceph/libceph-common.so.2
#6  0x00007fef0489261e in ProtocolV2::handle_auth_request_more(ceph::buffer::v15_2_0::list&) () from target:/usr/lib64/ceph/libceph-common.so.2
#7  0x00007fef0489b0c3 in ProtocolV2::handle_frame_payload() () from target:/usr/lib64/ceph/libceph-common.so.2
#8  0x00007fef0489b380 in ProtocolV2::handle_read_frame_dispatch() () from target:/usr/lib64/ceph/libceph-common.so.2
#9  0x00007fef0489b575 in ProtocolV2::_handle_read_frame_epilogue_main() () from target:/usr/lib64/ceph/libceph-common.so.2
#10 0x00007fef0489b622 in ProtocolV2::_handle_read_frame_segment() () from target:/usr/lib64/ceph/libceph-common.so.2
#11 0x00007fef0489c781 in ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) () from target:/usr/lib64/ceph/libceph-common.so.2
#12 0x00007fef04884eec in ProtocolV2::run_continuation(Ct<ProtocolV2>&) () from target:/usr/lib64/ceph/libceph-common.so.2
#13 0x00007fef0484d3f9 in AsyncConnection::process() () from target:/usr/lib64/ceph/libceph-common.so.2
#14 0x00007fef048a7507 in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from target:/usr/lib64/ceph/libceph-common.so.2
#15 0x00007fef048ada1c in std::_Function_handler<void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from target:/usr/lib64/ceph/libceph-common.so.2
#16 0x00007fef027c2ba3 in execute_native_thread_routine () from target:/lib64/libstdc++.so.6
#17 0x00007fef033871cf in start_thread () from target:/lib64/libpthread.so.0
#18 0x00007fef01ddadd3 in clone () from target:/lib64/libc.so.6

If there is a weak deadlock on DaemonServer::lock, the entire messenger hangs. This can result in a real deadlock, as in #61869.

In general, these fast messenger methods (like ::ms_fast_dispatch) must not acquire any locks.
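
To illustrate the rule, here is a minimal sketch of the difference between a handler that takes a daemon-wide mutex and one that only reads a lock-free snapshot. Everything in it is hypothetical (DaemonState, update_caps, handle_auth_blocking, handle_auth_lockfree); these are stand-ins, not Ceph's types, and this is not necessarily how the pull request fixes the problem.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for daemon-wide state; not Ceph's DaemonServer.
struct DaemonState {
  std::mutex lock;                           // plays the role of a daemon-wide lock
  uint64_t caps_epoch = 0;                   // guarded by `lock`
  std::atomic<uint64_t> published_epoch{0};  // lock-free copy for fast paths

  // Slow path (not called from the messenger event loop):
  // update under the lock, then publish a snapshot for fast-path readers.
  void update_caps(uint64_t epoch) {
    std::lock_guard<std::mutex> guard(lock);
    caps_epoch = epoch;
    published_epoch.store(epoch, std::memory_order_release);
  }
};

// Problematic shape: if another thread holds `lock` for a long time,
// the calling thread (here imagined as the event loop thread) stalls,
// and every connection served by that thread stalls with it.
uint64_t handle_auth_blocking(DaemonState& state) {
  std::lock_guard<std::mutex> guard(state.lock);
  return state.caps_epoch;
}

// Preferred shape for a fast messenger callback: read only data that
// was published atomically, so the call returns without waiting.
uint64_t handle_auth_lockfree(const DaemonState& state) {
  return state.published_epoch.load(std::memory_order_acquire);
}
```

With this shape, handle_auth_lockfree still returns promptly while another thread holds DaemonState::lock, whereas handle_auth_blocking would wait until the lock is released.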


Related issues 4 (0 open, 4 closed)

Related to CephFS - Bug #61869: pybind/cephfs: holds GIL during rmdir (Resolved, Patrick Donnelly)
Copied to mgr - Backport #62607: quincy: mgr: DaemonServer::ms_handle_authentication acquires daemon locks (Resolved, Patrick Donnelly)
Copied to mgr - Backport #62608: pacific: mgr: DaemonServer::ms_handle_authentication acquires daemon locks (Resolved, Patrick Donnelly)
Copied to mgr - Backport #62609: reef: mgr: DaemonServer::ms_handle_authentication acquires daemon locks (Resolved, Patrick Donnelly)
Actions #1

Updated by Patrick Donnelly 10 months ago

  • Category set to ceph-mgr
  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 52292
Actions #2

Updated by Patrick Donnelly 10 months ago

  • Related to Bug #61869: pybind/cephfs: holds GIL during rmdir added
Actions #3

Updated by Patrick Donnelly 9 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #4

Updated by Backport Bot 9 months ago

  • Copied to Backport #62607: quincy: mgr: DaemonServer::ms_handle_authentication acquires daemon locks added
Actions #5

Updated by Backport Bot 9 months ago

  • Copied to Backport #62608: pacific: mgr: DaemonServer::ms_handle_authentication acquires daemon locks added
Actions #6

Updated by Backport Bot 9 months ago

  • Copied to Backport #62609: reef: mgr: DaemonServer::ms_handle_authentication acquires daemon locks added
Actions #7

Updated by Backport Bot 9 months ago

  • Tags set to backport_processed
Actions #8

Updated by Konstantin Shalygin 7 months ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100