Bug #20776

Possible deadlock during CephContextServiceThread shutdown

Added by Jason Dillaman almost 7 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
kraken,jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the thread is requested to stop before it starts, it can deadlock waiting on a condition variable: the wakeup sent by the stop request is lost, and the thread then blocks in Cond::Wait() with nothing left to signal it. The "while (1)" loop should be replaced with "while (!_exit_thread)", and the lock should be acquired outside the loop.
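
A minimal sketch of the proposed loop shape, not the actual Ceph code: ServiceThreadSketch, finished() and the std::mutex / std::condition_variable members are hypothetical stand-ins for CephContextServiceThread and Ceph's own Mutex / Cond types. It assumes the problem is that the thread waits before it checks the exit flag. The point it illustrates is that the flag is tested under the lock before the first wait, so a stop request that arrives before the thread runs is not lost.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for CephContextServiceThread (names are assumptions).
class ServiceThreadSketch {
public:
  // Launch the worker; entry() runs on the new thread.
  void start() { worker_ = std::thread([this] { entry(); }); }

  // Request shutdown. May be called before the worker has started running.
  void exit_thread() {
    std::lock_guard<std::mutex> guard(lock_);
    exit_thread_ = true;
    cond_.notify_all();
  }

  void join() {
    if (worker_.joinable()) worker_.join();
  }

  // True once entry() has left its loop.
  bool finished() {
    std::lock_guard<std::mutex> guard(lock_);
    return finished_;
  }

private:
  void entry() {
    // Lock taken once, outside the loop, as the description suggests.
    std::unique_lock<std::mutex> guard(lock_);
    // The flag is checked before the first wait, so an early stop
    // request is seen even though its notification had no waiter.
    while (!exit_thread_) {
      cond_.wait(guard);  // releases the lock while blocked
      if (exit_thread_) break;
      // Periodic work (heartbeat, log reopen, ...) would go here.
    }
    finished_ = true;
  }

  std::mutex lock_;
  std::condition_variable cond_;
  bool exit_thread_ = false;
  bool finished_ = false;
  std::thread worker_;
};
```

With this shape, calling exit_thread() first and then start() followed by join() returns promptly. With an unconditional wait at the top of a "while (1)" loop, the same sequence would block in join(), which matches the backtrace below.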

I'm not sure yet whether this is made worse by config; however, if I run something along the lines of:

seq 100 | xargs -P100 -n1 bash -c 'exec rbd.original showmapped'

at least one of the invocations ends up deadlocked, as shown below. Doing the same on our v10.2.7 clusters seems to work fine.

The stack traces from GDB look something like this, at least for all the ones I've looked at:
warning: the debug information found in "/usr/bin/rbd" does not match "/usr/bin/rbd.original" (CRC mismatch).
# Yes - we've diverted rbd to rbd.original with a shell-wrapper around it

[New LWP 285438]
[New LWP 285439]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fbbea58798d in pthread_join (threadid=140444952844032, thread_return=thread_return@entry=0x0) at pthread_join.c:90
90      pthread_join.c: No such file or directory.
Thread 3 (Thread 0x7fbbe3865700 (LWP 285439)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055a852fcf896 in Cond::Wait (mutex=..., this=0x55a85cdeb258) at ./common/Cond.h:56
#2  CephContextServiceThread::entry (this=0x55a85cdeb1c0) at common/ceph_context.cc:101
#3  0x00007fbbea5866ba in start_thread (arg=0x7fbbe3865700) at pthread_create.c:333
#4  0x00007fbbe80743dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 2 (Thread 0x7fbbe4804700 (LWP 285438)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055a852fb297b in ceph::log::Log::entry (this=0x55a85cd98830) at log/Log.cc:457
#2  0x00007fbbea5866ba in start_thread (arg=0x7fbbe4804700) at pthread_create.c:333
#3  0x00007fbbe80743dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 1 (Thread 0x7fbbfda1e100 (LWP 285436)):
#0  0x00007fbbea58798d in pthread_join (threadid=140444952844032, thread_return=thread_return@entry=0x0) at pthread_join.c:90
#1  0x000055a852fb6270 in Thread::join (this=this@entry=0x55a85cdeb1c0, prval=prval@entry=0x0) at common/Thread.cc:171
#2  0x000055a852fca060 in CephContext::join_service_thread (this=this@entry=0x55a85cd95780) at common/ceph_context.cc:637
#3  0x000055a852fcc2c7 in CephContext::~CephContext (this=0x55a85cd95780, __in_chrg=<optimized out>) at common/ceph_context.cc:507
#4  0x000055a852fcc9bc in CephContext::put (this=0x55a85cd95780) at common/ceph_context.cc:578
#5  0x000055a852eac2b1 in boost::intrusive_ptr<CephContext>::~intrusive_ptr (this=0x7ffef7ef5060, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/intrusive_ptr.hpp:97
#6  main (argc=<optimized out>, argv=<optimized out>) at tools/rbd/rbd.cc:17 