Bug #15503

closed

msg/async: deadlock in rebind when enabling delay

Added by Sage Weil about 8 years ago. Updated about 5 years ago.

Status:
Rejected
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 74 (Thread 0x7f2d894c9700 (LWP 755)):
#0  0x00007f2d964abf4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2d964a7d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f2d964a7c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2d98731a88 in Mutex::Lock (this=this@entry=0x7f2da8b6dae0, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007f2d9882e89f in stop (this=0x7f2da8b6d800) at msg/async/AsyncConnection.h:390
#5  AsyncMessenger::mark_down_all (this=0x7f2da37ee000) at msg/async/AsyncMessenger.cc:656
#6  0x00007f2d9882e517 in AsyncMessenger::rebind (this=0x7f2da37ee000, avoid_ports=std::set with 3 elements) at msg/async/AsyncMessenger.cc:456
#7  0x00007f2d98118cf7 in OSD::_committed_osd_maps (this=0x7f2da3992000, first=<optimized out>, last=352, m=0x7f2da4fb3680) at osd/OSD.cc:6928
#8  0x00007f2d98128399 in Context::complete (this=0x7f2da87d0a80, r=<optimized out>) at include/Context.h:64
#9  0x00007f2d986ae6e6 in Finisher::finisher_thread_entry (this=0x7f2da37d62c0) at common/Finisher.cc:68
#10 0x00007f2d964a5dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f2d94b3128d in clone () from /lib64/libc.so.6

Thread 84 (Thread 0x7f2d8e4d3700 (LWP 732)):
#0  0x00007f2d964abf4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2d964a7d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f2d964a7c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2d98731a88 in Mutex::Lock (this=this@entry=0x7f2da78ec2e0, no_lockdep=no_lockdep@entry=false) at common/Mutex.cc:110
#4  0x00007f2d988a4e34 in Locker (m=..., this=<synthetic pointer>) at common/Mutex.h:115
#5  AsyncConnection::process (this=0x7f2da78ec000) at msg/async/AsyncConnection.cc:528
#6  0x00007f2d988488e5 in EventCenter::process_events (this=this@entry=0x7f2da37ea7c8, timeout_microseconds=timeout_microseconds@entry=30000000) at msg/async/Event.cc:399
#7  0x00007f2d98828fc0 in Worker::entry (this=0x7f2da37ea780) at msg/async/AsyncMessenger.cc:294
#8  0x00007f2d964a5dc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f2d94b3128d in clone () from /lib64/libc.so.6

Thread 86 (Thread 0x7f2d8f4d5700 (LWP 730)):
#0  0x00007f2d964a96d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f2d98897587 in Wait (mutex=..., this=0x7f2da3da9398) at common/Cond.h:56
#2  wait_for_flush (this=0x7f2da3da92c0) at msg/async/AsyncConnection.h:177
#3  AsyncConnection::_stop (this=this@entry=0x7f2da8b6d800) at msg/async/AsyncConnection.cc:2272
#4  0x00007f2d9889e24b in AsyncConnection::handle_connect_msg (this=this@entry=0x7f2da8b6d800, connect=..., authorizer_bl=..., authorizer_reply=...) at msg/async/AsyncConnection.cc:1892
#5  0x00007f2d988a082c in AsyncConnection::_process_connection (this=this@entry=0x7f2da8b6d800) at msg/async/AsyncConnection.cc:1511
#6  0x00007f2d988a6810 in AsyncConnection::process (this=0x7f2da8b6d800) at msg/async/AsyncConnection.cc:993
#7  0x00007f2d988488e5 in EventCenter::process_events (this=this@entry=0x7f2da37ea2c8, timeout_microseconds=timeout_microseconds@entry=30000000) at msg/async/Event.cc:399
#8  0x00007f2d98828fc0 in Worker::entry (this=0x7f2da37ea280) at msg/async/AsyncMessenger.cc:294
#9  0x00007f2d964a5dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f2d94b3128d in clone () from /lib64/libc.so.6

/a/sage-2016-04-14_11:23:09-rados-wip-sage-testing---basic-smithi/129588
Actions #1

Updated by Sage Weil about 8 years ago

/a/sage-2016-04-14_11:23:09-rados-wip-sage-testing---basic-smithi/129605

This run hit it too.

Actions #2

Updated by Haomai Wang about 8 years ago

  • Subject changed from msg/async: deadlock in rebind to msg/async: deadlock in rebind when enabling delay
  • Category set to msgr
  • Status changed from New to In Progress
  • Assignee set to Haomai Wang

The root cause is still that delayed dispatch happens in another thread.
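
For illustration only, here is a minimal sketch of how a hang of this general shape can arise. It is not the AsyncConnection code and not a confirmed reconstruction of this bug. It assumes that a thread holds a connection lock while waiting for a flush, and that the flush is performed by a separate delay thread which needs the same lock. `ToyConnection` and every member and function name below are hypothetical. The wait is bounded with a timeout so that the sketch terminates instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a connection whose delayed messages are
// delivered by a separate thread. Not Ceph code.
struct ToyConnection {
  std::mutex lock;                     // plays the role of the connection lock
  std::mutex flush_lock;               // protects 'flushed'
  std::condition_variable flush_cond;
  bool flushed = false;

  // Runs in the separate "delay" thread. Assumption: delivery needs the
  // connection lock before the flush can be reported as done.
  void delay_thread_flush() {
    std::lock_guard<std::mutex> l(lock);
    {
      std::lock_guard<std::mutex> f(flush_lock);
      flushed = true;
    }
    flush_cond.notify_all();
  }

  // Bounded wait for the flush. Returns true if the flush completed in time.
  bool wait_for_flush(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> f(flush_lock);
    return flush_cond.wait_for(f, timeout, [this] { return flushed; });
  }
};

// Waits for the flush while still holding the connection lock. The delay
// thread cannot take the lock, so the wait times out. With an untimed wait
// this would block forever. Returns true if the wait stalled.
bool stop_while_holding_lock_stalls() {
  ToyConnection c;
  std::unique_lock<std::mutex> held(c.lock);
  std::thread delay([&c] { c.delay_thread_flush(); });
  bool done = c.wait_for_flush(std::chrono::milliseconds(200));
  held.unlock();   // let the delay thread finish so it can be joined
  delay.join();
  return !done;
}

// Same wait without holding the connection lock: the delay thread can make
// progress and the flush completes. Returns true if the flush completed.
bool stop_after_releasing_lock_completes() {
  ToyConnection c;
  std::thread delay([&c] { c.delay_thread_flush(); });
  bool done = c.wait_for_flush(std::chrono::seconds(5));
  delay.join();
  return done;
}
```

In this toy model, any third thread that then tries to take `lock` (as `mark_down_all` does via `stop` in Thread 74) would queue up behind the stalled waiter.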

Actions #3

Updated by Sage Weil almost 8 years ago

  • Status changed from In Progress to Rejected
Actions #4

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)