Bug #17738

closed

Deadlock when shutdown() is called while still in init()

Added by John Spray over 7 years ago. Updated almost 7 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: ceph-mgr
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Thread 2 (Thread 0x7fd7ed1ed700 (LWP 5473)):
#0  0x00007fd7f7b246d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd7fadb2f40 in C_SaferCond::wait() ()
#2  0x00007fd7fadc8cd5 in Mgr::init() ()
#3  0x00007fd7fadb2749 in Context::complete(int) ()
#4  0x00007fd7fae56c26 in Finisher::finisher_thread_entry() ()
#5  0x00007fd7f7b20dc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007fd7f6c0bced in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7fd7f01f3700 (LWP 14694)):
#0  0x00007fd7f7b21ef7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fd7faf67db0 in Thread::join(void**) ()
#2  0x00007fd7fae561e0 in Finisher::stop() ()
#3  0x00007fd7fadc3e2f in Mgr::shutdown() ()
#4  0x00007fd7fadbd8e8 in MgrStandby::handle_mgr_map(MMgrMap*) ()
#5  0x00007fd7fadbe25d in MgrStandby::ms_dispatch(Message*) ()
#6  0x00007fd7fb03329a in DispatchQueue::entry() ()
#7  0x00007fd7faedffed in DispatchQueue::DispatchThread::entry() ()
#8  0x00007fd7f7b20dc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fd7f6c0bced in clone () from /lib64/libc.so.6

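In the traces above, Mgr::init() is running on a finisher thread and is blocked in C_SaferCond::wait(), while Mgr::shutdown() is blocked in Finisher::stop() joining a finisher thread. init() drops the lock while it waits for maps, so shutdown() can run concurrently with it. shutdown() has blocked on all finishers since this commit: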

commit ce6b1909dd5579326dc331937c002078c3a1e55a
Author: xie xingguo <xie.xingguo@zte.com.cn>
Date:   Sun Oct 9 10:55:12 2016 +0800

    mgr: shutdown finisher in a graceful way

    Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

The shutdown function probably needs to either not wait for the finisher to drain, or drop Mgr::lock while it drains. Separately, init() probably needs to be refactored to avoid blocking: if the mons are offline and we can't get maps, we should still be able to shut down cleanly.
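
To make the two suggestions concrete, here is a minimal standalone sketch using only the C++ standard library. It is not the ceph-mgr code and not necessarily how the issue was eventually fixed: MiniMgr, deliver_maps() and shutdown_before_maps_completes() are hypothetical names invented for illustration, a plain std::thread stands in for the Finisher, and a std::mutex stands in for Mgr::lock. The idea is that init() waits on a condition that shutdown can also satisfy, and that shutdown releases the lock before joining the thread.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for Mgr; not the real ceph-mgr class.
class MiniMgr {
  std::mutex lock;               // plays the role of Mgr::lock
  std::condition_variable cond;  // signalled when maps arrive or on shutdown
  bool have_maps = false;
  bool stopping = false;
  bool init_ok = false;
  std::thread finisher;          // stands in for the Finisher thread

  // Runs on the finisher thread, like Mgr::init() in the backtrace.
  void init() {
    std::unique_lock<std::mutex> l(lock);
    // wait() releases the lock while blocked. The predicate also checks
    // `stopping`, so a shutdown can wake this wait even if maps never arrive.
    cond.wait(l, [this] { return have_maps || stopping; });
    init_ok = have_maps && !stopping;
  }

public:
  ~MiniMgr() { shutdown(); }

  void start() {
    finisher = std::thread([this] { init(); });
  }

  // Would be called when maps arrive from the mons.
  void deliver_maps() {
    {
      std::lock_guard<std::mutex> l(lock);
      have_maps = true;
    }
    cond.notify_all();
  }

  void shutdown() {
    {
      std::lock_guard<std::mutex> l(lock);
      stopping = true;
    }  // lock is released here, before we wait for the thread
    cond.notify_all();
    if (finisher.joinable()) {
      finisher.join();  // cannot hang: init() has been told to stop
    }
  }

  bool initialized() {
    std::lock_guard<std::mutex> l(lock);
    return init_ok;
  }
};

// Simulates "mons offline": shutdown is requested before any maps arrive.
// Returns true if shutdown completed and init() did not report success.
bool shutdown_before_maps_completes() {
  MiniMgr mgr;
  mgr.start();
  mgr.shutdown();
  return !mgr.initialized();
}
```

Calling shutdown_before_maps_completes() should return promptly with true. If the `stopping` check were removed from the wait predicate, the join in shutdown() would hang in the same way as Thread 8 in the backtrace.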


Related issues 1 (0 open, 1 closed)

Is duplicate of mgr - Bug #19743: mgr: segv in Mgr::shutdown() in PyThread_acquire_lock() (Resolved, Kefu Chai, 04/21/2017)
