Bug #9057

closed

mark_down from fast dispatch can deadlock

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000b40661 in Wait (mutex=..., this=0x2b648f38) at ./common/Cond.h:55
#2  Pipe::stop_and_wait (this=0x2b648d20) at msg/Pipe.cc:1412
#3  0x0000000000a5ecbd in SimpleMessenger::mark_down (this=0x7bc5b10, con=<optimized out>) at msg/SimpleMessenger.cc:673
#4  0x0000000000618674 in OSD::require_same_peer_instance (this=0xa3f2e90, op=..., map=...) at osd/OSD.cc:6710
#5  0x0000000000698995 in OSD::handle_replica_op<MOSDECSubOpWriteReply, 109> (this=0xa3f2e90, op=..., osdmap=...) at osd/OSD.cc:8073
#6  0x00000000006522be in OSD::dispatch_op_fast (this=0xa3f2e90, op=..., osdmap=...) at osd/OSD.cc:5692
#7  0x0000000000652384 in OSD::dispatch_session_waiting (this=0xa3f2e90, session=0x2e05ec30, osdmap=...) at osd/OSD.cc:5409
#8  0x000000000065431c in OSD::ms_fast_dispatch (this=0xa3f2e90, m=<optimized out>) at osd/OSD.cc:5432
#9  0x0000000000b2e029 in ms_fast_dispatch (m=0xb227ef0, this=0x7bc5b10) at msg/Messenger.h:607
#10 DispatchQueue::fast_dispatch (this=0x7bc5c18, m=0xb227ef0) at msg/DispatchQueue.cc:71
#11 0x0000000000b56c9b in Pipe::reader (this=0x2b648d20) at msg/Pipe.cc:1566
#12 0x0000000000b5893d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:49
#13 0x0000000005288e9a in start_thread (arg=0x30719700) at pthread_create.c:308
#14 0x000000000690e3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()

That function (Pipe::stop_and_wait, frame #2) does:

void Pipe::stop_and_wait()
{
  if (state != STATE_CLOSED)
    stop();

  while (reader_running &&
     reader_dispatching)
    cond.Wait(pipe_lock);
}

#1

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-08-09_14:13:44-rados-next-testing-basic-multi/410713

Three (!) OSDs died with this deadlock.

(gdb) t 67
[Switching to thread 67 (Thread 0x7f0add460700 (LWP 17247))]
#0  0x00007f0afadbb89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f0afadbb89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f0afadb7065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f0afadb6eba in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x0000000000a3db83 in Mutex::Lock (this=0x1934bd0, no_lockdep=<optimized out>) at common/Mutex.cc:89
#4  0x0000000000a634f6 in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:120
#5  SimpleMessenger::get_connection (this=0x1934800, dest=...) at msg/SimpleMessenger.cc:382
#6  0x0000000000646842 in OSDService::send_message_osd_cluster (this=0x1db4da0, peer=1, m=0x2c1a600,
    from_epoch=104) at osd/OSD.cc:687
#7  0x000000000083a66b in ReplicatedPG::send_message (this=0x1fb5000, to_osd=1, m=0x2c1a600)
    at osd/ReplicatedPG.h:292
#8  0x000000000093d416 in ECBackend::dispatch_recovery_messages (this=0x1fe56c0, m=..., priority=127)
    at osd/ECBackend.cc:426
#9  0x000000000093e054 in ECBackend::handle_message (this=0x1fe56c0, _op=...) at osd/ECBackend.cc:681
#10 0x00000000007bdb1a in ReplicatedPG::do_request (this=0x1fb5000, op=..., handle=...)
    at osd/ReplicatedPG.cc:1109
#11 0x0000000000647182 in OSD::dequeue_op (this=0x1db3700, pg=..., op=..., handle=...) at osd/OSD.cc:8278
#12 0x0000000000647c51 in OSD::ShardedOpWQ::_process (this=0x1db46a8, thread_index=<optimized out>,
    hb=<optimized out>) at osd/OSD.cc:8180
#13 0x0000000000a7797c in ShardedThreadPool::shardedthreadpool_worker (this=0x1db3cf8, thread_index=8)
    at common/WorkQueue.cc:320
#14 0x0000000000a79260 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>)
    at common/WorkQueue.h:504
#15 0x00007f0afadb4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007f0af97653fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 74 (Thread 0x7f0ad482f700 (LWP 18889)):
#0  0x00007f0afadb8d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000b411e1 in Wait (mutex=..., this=0x5de9fd8) at ./common/Cond.h:55
#2  Pipe::stop_and_wait (this=0x5de9dc0) at msg/Pipe.cc:1412
#3  0x0000000000a5f4fd in SimpleMessenger::mark_down (this=0x1934800, con=<optimized out>)
    at msg/SimpleMessenger.cc:673
#4  0x0000000000618824 in OSD::require_same_peer_instance (this=0x1db3700, op=..., map=...) at osd/OSD.cc:6710
#5  0x0000000000698b45 in OSD::handle_replica_op<MOSDECSubOpWriteReply, 109> (this=0x1db3700, op=..., osdmap=...)
    at osd/OSD.cc:8073
#6  0x000000000065246e in OSD::dispatch_op_fast (this=0x1db3700, op=..., osdmap=...) at osd/OSD.cc:5692
#7  0x0000000000652534 in OSD::dispatch_session_waiting (this=0x1db3700, session=0x25c9240, osdmap=...)
    at osd/OSD.cc:5409
#8  0x00000000006544cc in OSD::ms_fast_dispatch (this=0x1db3700, m=<optimized out>) at osd/OSD.cc:5432
#9  0x0000000000b2eba9 in ms_fast_dispatch (m=0x1be0400, this=0x1934800) at msg/Messenger.h:607
#10 DispatchQueue::fast_dispatch (this=0x1934908, m=0x1be0400) at msg/DispatchQueue.cc:71
#11 0x0000000000b5781b in Pipe::reader (this=0x5de9dc0) at msg/Pipe.cc:1566
#12 0x0000000000b594bd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:49
#13 0x00007f0afadb4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#14 0x00007f0af97653fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#15 0x0000000000000000 in ?? ()

#2

Updated by Sage Weil over 9 years ago

  • Status changed from New to Fix Under Review

#3

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved