Bug #9057
mark_down from fast dispatch can deadlock
Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Severity: 3 - minor
Description
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000b40661 in Wait (mutex=..., this=0x2b648f38) at ./common/Cond.h:55
#2  Pipe::stop_and_wait (this=0x2b648d20) at msg/Pipe.cc:1412
#3  0x0000000000a5ecbd in SimpleMessenger::mark_down (this=0x7bc5b10, con=<optimized out>) at msg/SimpleMessenger.cc:673
#4  0x0000000000618674 in OSD::require_same_peer_instance (this=0xa3f2e90, op=..., map=...) at osd/OSD.cc:6710
#5  0x0000000000698995 in OSD::handle_replica_op<MOSDECSubOpWriteReply, 109> (this=0xa3f2e90, op=..., osdmap=...) at osd/OSD.cc:8073
#6  0x00000000006522be in OSD::dispatch_op_fast (this=0xa3f2e90, op=..., osdmap=...) at osd/OSD.cc:5692
#7  0x0000000000652384 in OSD::dispatch_session_waiting (this=0xa3f2e90, session=0x2e05ec30, osdmap=...) at osd/OSD.cc:5409
#8  0x000000000065431c in OSD::ms_fast_dispatch (this=0xa3f2e90, m=<optimized out>) at osd/OSD.cc:5432
#9  0x0000000000b2e029 in ms_fast_dispatch (m=0xb227ef0, this=0x7bc5b10) at msg/Messenger.h:607
#10 DispatchQueue::fast_dispatch (this=0x7bc5c18, m=0xb227ef0) at msg/DispatchQueue.cc:71
#11 0x0000000000b56c9b in Pipe::reader (this=0x2b648d20) at msg/Pipe.cc:1566
#12 0x0000000000b5893d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:49
#13 0x0000000005288e9a in start_thread (arg=0x30719700) at pthread_create.c:308
#14 0x000000000690e3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()
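That function does:
void Pipe::stop_and_wait()
{
  if (state != STATE_CLOSED)
    stop();
  // Waits for the reader thread to finish dispatching. If the caller *is*
  // that reader thread (fast dispatch -> mark_down), this never returns.
  while (reader_running && reader_dispatching)
    cond.Wait(pipe_lock);
}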
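In the first backtrace, Pipe::reader and Pipe::stop_and_wait run on the same Pipe (this=0x2b648d20): the reader thread, while fast-dispatching, ends up in mark_down() for its own connection and then waits for reader_dispatching to clear, which only that same thread can do once dispatch returns. The C++ sketch below models that situation with standard-library primitives instead of Ceph's Mutex/Cond. ToyPipe, run_reader() and the thread-id guard are hypothetical illustrations, not Ceph's code, and the guard is only one possible way to break the self-wait, not necessarily the change that resolved this issue.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified model of a Pipe: one reader thread that drops the
// lock while dispatching a message and flags that with reader_dispatching.
struct ToyPipe {
  std::mutex pipe_lock;
  std::condition_variable cond;
  bool closed = false;
  bool reader_running = false;
  bool reader_dispatching = false;
  std::thread::id reader_id;  // id of the reader thread, set under pipe_lock

  // Must be called with pipe_lock held (passed in as `lk`).
  void stop_and_wait(std::unique_lock<std::mutex>& lk) {
    closed = true;  // stands in for stop()
    // Assumed guard: the reader thread cannot wait for its own dispatch to
    // finish, so skip the wait when the caller *is* the reader.
    if (std::this_thread::get_id() == reader_id)
      return;
    // Other threads wait until the reader is no longer dispatching.
    cond.wait(lk, [this] { return !(reader_running && reader_dispatching); });
  }

  // Simplified reader step: dispatch one message with pipe_lock dropped,
  // then clear the flags and wake any waiters.
  template <typename Dispatch>
  void run_reader(Dispatch dispatch) {
    std::unique_lock<std::mutex> lk(pipe_lock);
    reader_id = std::this_thread::get_id();
    reader_running = true;
    reader_dispatching = true;
    lk.unlock();
    dispatch(*this);  // e.g. fast dispatch that ends up calling mark_down
    lk.lock();
    reader_dispatching = false;
    reader_running = false;
    cond.notify_all();
  }
};
```

Skipping the wait is only safe when the caller is the reader thread itself, because then the dispatch it would be waiting for is the very call it is currently inside. Without the guard, cond.wait() on the reader thread never returns, which is the hang shown in the backtrace.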
Updated by Sage Weil almost 10 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-08-09_14:13:44-rados-next-testing-basic-multi/410713
3 (!) OSDs died with this deadlock.
(gdb) t 67
[Switching to thread 67 (Thread 0x7f0add460700 (LWP 17247))]
#0  0x00007f0afadbb89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f0afadbb89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f0afadb7065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f0afadb6eba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x0000000000a3db83 in Mutex::Lock (this=0x1934bd0, no_lockdep=<optimized out>) at common/Mutex.cc:89
#4  0x0000000000a634f6 in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:120
#5  SimpleMessenger::get_connection (this=0x1934800, dest=...) at msg/SimpleMessenger.cc:382
#6  0x0000000000646842 in OSDService::send_message_osd_cluster (this=0x1db4da0, peer=1, m=0x2c1a600, from_epoch=104) at osd/OSD.cc:687
#7  0x000000000083a66b in ReplicatedPG::send_message (this=0x1fb5000, to_osd=1, m=0x2c1a600) at osd/ReplicatedPG.h:292
#8  0x000000000093d416 in ECBackend::dispatch_recovery_messages (this=0x1fe56c0, m=..., priority=127) at osd/ECBackend.cc:426
#9  0x000000000093e054 in ECBackend::handle_message (this=0x1fe56c0, _op=...) at osd/ECBackend.cc:681
#10 0x00000000007bdb1a in ReplicatedPG::do_request (this=0x1fb5000, op=..., handle=...) at osd/ReplicatedPG.cc:1109
#11 0x0000000000647182 in OSD::dequeue_op (this=0x1db3700, pg=..., op=..., handle=...) at osd/OSD.cc:8278
#12 0x0000000000647c51 in OSD::ShardedOpWQ::_process (this=0x1db46a8, thread_index=<optimized out>, hb=<optimized out>) at osd/OSD.cc:8180
#13 0x0000000000a7797c in ShardedThreadPool::shardedthreadpool_worker (this=0x1db3cf8, thread_index=8) at common/WorkQueue.cc:320
#14 0x0000000000a79260 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:504
#15 0x00007f0afadb4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007f0af97653fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 74 (Thread 0x7f0ad482f700 (LWP 18889)):
#0  0x00007f0afadb8d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000b411e1 in Wait (mutex=..., this=0x5de9fd8) at ./common/Cond.h:55
#2  Pipe::stop_and_wait (this=0x5de9dc0) at msg/Pipe.cc:1412
#3  0x0000000000a5f4fd in SimpleMessenger::mark_down (this=0x1934800, con=<optimized out>) at msg/SimpleMessenger.cc:673
#4  0x0000000000618824 in OSD::require_same_peer_instance (this=0x1db3700, op=..., map=...) at osd/OSD.cc:6710
#5  0x0000000000698b45 in OSD::handle_replica_op<MOSDECSubOpWriteReply, 109> (this=0x1db3700, op=..., osdmap=...) at osd/OSD.cc:8073
#6  0x000000000065246e in OSD::dispatch_op_fast (this=0x1db3700, op=..., osdmap=...) at osd/OSD.cc:5692
#7  0x0000000000652534 in OSD::dispatch_session_waiting (this=0x1db3700, session=0x25c9240, osdmap=...) at osd/OSD.cc:5409
#8  0x00000000006544cc in OSD::ms_fast_dispatch (this=0x1db3700, m=<optimized out>) at osd/OSD.cc:5432
#9  0x0000000000b2eba9 in ms_fast_dispatch (m=0x1be0400, this=0x1934800) at msg/Messenger.h:607
#10 DispatchQueue::fast_dispatch (this=0x1934908, m=0x1be0400) at msg/DispatchQueue.cc:71
#11 0x0000000000b5781b in Pipe::reader (this=0x5de9dc0) at msg/Pipe.cc:1566
#12 0x0000000000b594bd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:49
#13 0x00007f0afadb4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007f0af97653fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x0000000000000000 in ?? ()
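The second dump suggests why the hang spreads beyond a single connection. Thread 74 is stuck in the same self-wait, inside SimpleMessenger::mark_down() on messenger 0x1934800, while thread 67, an op worker, is blocked acquiring a Mutex in SimpleMessenger::get_connection() on that same messenger. The C++ sketch below models this, assuming (as the backtraces suggest but do not prove) that mark_down() keeps the messenger lock held while it waits in stop_and_wait(). ToyMessenger, mark_down_for() and get_connection_for() are hypothetical names, and the timeouts exist only so the example terminates.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the stall in the second dump. msgr_lock stands in for
// the messenger-wide lock that mark_down() is assumed to hold while it waits
// in stop_and_wait(); pipe_lock and cond stand in for the Pipe's own.
struct ToyMessenger {
  std::timed_mutex msgr_lock;
  std::mutex pipe_lock;
  std::condition_variable cond;
  bool reader_dispatching = true;    // never cleared: the reader is stuck
  std::atomic<bool> holding{false};  // test hook: msgr_lock has been taken

  // Models mark_down(): take the outer lock, then wait on the pipe with it
  // still held. The timeout exists only so this sketch terminates.
  bool mark_down_for(std::chrono::milliseconds limit) {
    std::lock_guard<std::timed_mutex> outer(msgr_lock);
    holding = true;
    std::unique_lock<std::mutex> inner(pipe_lock);
    // Returns false on timeout, since reader_dispatching never clears.
    return cond.wait_for(inner, limit, [this] { return !reader_dispatching; });
  }

  // Models get_connection(): needs msgr_lock. Returns false if the lock could
  // not be taken within `limit`, i.e. it would have blocked like thread 67.
  bool get_connection_for(std::chrono::milliseconds limit) {
    std::unique_lock<std::timed_mutex> lk(msgr_lock, limit);
    return lk.owns_lock();
  }
};
```

In this model, any thread that needs the messenger lock, such as one sending a message to a peer OSD, stalls behind the stuck reader. The timeouts are purely diagnostic and are not suggested as a fix.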
Updated by Sage Weil almost 10 years ago
- Status changed from New to Fix Under Review
Updated by Sage Weil over 9 years ago
- Status changed from Fix Under Review to Resolved