Bug #9213
wip-objecter: RWTimer shutdown is deadlock-prone
Status: Resolved
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
commit 085db68d6b2aaaf874188a90cde6f4b5d52b0dde
Author: Yehuda Sadeh <yehuda@inktank.com>
Date:   Tue May 27 17:09:50 2014 -0700

    timer: fix RWTimer shutdown

    Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

commit c73b1b2bb698ebf5b925c5d4c7c37a1c25edd0a2
Author: Yehuda Sadeh <yehuda@inktank.com>
Date:   Mon Mar 17 12:58:02 2014 -0700

    time: create RWTimer

    a timer implementation that uses RWLock

    Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
http://pulpito.front.sepia.ceph.com/john-2014-08-24_01:43:51-rados-wip-objecter-testing-basic-multi/445802/
(manifests as a dead job, got the backtrace by sshing to the running nodes)
#0 pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:83
#1 0x00007f295e9be53a in get_write (lockdep=true, this=0x224ffc0) at common/RWLock.h:88
#2 RWTimer::timer_thread (this=0x2250020) at common/Timer.cc:293
#3 0x00007f295e9c210d in RWTimerThread::entry (this=<optimized out>) at common/Timer.cc:200
#4 0x00007f295e565e9a in start_thread (arg=0x7f295a471700) at pthread_create.c:308
#5 0x00007f295dd7c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f295f8be780 (LWP 11130)):
#0 0x00007f295e567148 in pthread_join (threadid=139815584995072, thread_return=0x0) at pthread_join.c:89
#1 0x00007f295e9e9c22 in Thread::join (this=0x2252bf0, prval=<optimized out>) at common/Thread.cc:139
#2 0x00007f295e9c06be in RWTimer::shutdown (this=0x2250020) at common/Timer.cc:234
#3 0x00007f295e941ee4 in Objecter::shutdown (this=0x224ff00) at osdc/Objecter.cc:389
#4 0x00007f295e90c0da in librados::RadosClient::shutdown (this=0x22342f0) at librados/RadosClient.cc:302
#5 0x00007f295e8f5a29 in rados_shutdown (cluster=0x22342f0) at librados/librados.cc:1907
#6 0x000000000040473a in StRadosCreatePool::run (this=0x7fffd5ab3190) at test/system/st_rados_create_pool.cc:108
#7 0x000000000040538a in systest_runnable_pthread_helper (arg=<optimized out>) at test/system/systest_runnable.cc:203
#8 0x000000000040544b in SysTestRunnable::start (this=0x7fffd5ab3190) at test/system/systest_runnable.cc:102
#9 0x000000000040708b in SysTestRunnable::run_until_finished (runnables=...) at test/system/systest_runnable.cc:174
#10 0x0000000000403957 in main (argc=1, argv=0x7fffd5ab3548) at test/system/rados_delete_pools_parallel.cc:101
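RWTimer::shutdown sets the 'stopping' flag while holding rwlock and then tries to join the timer thread without releasing the lock. If the timer thread is blocked waiting for rwlock at that point (as in the get_write frame of the backtrace above), it can never observe 'stopping', so the join never returns. This manifests as an OSD or client failing to shut down.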
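To make the lock ordering concrete, here is a minimal standalone sketch of the pattern and one way to avoid the hang. It is not the Ceph code: SketchTimer, shutdown_locked and demo_shutdown are hypothetical names, and a plain std::mutex stands in for the write side of the RWLock, which is the only mode involved in the backtrace. The assumption is that the caller enters shutdown with the lock held and expects it to still be held on return, so the lock is dropped only around the join.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a timer whose worker thread takes a
// caller-visible lock on every iteration. Not Ceph's RWTimer.
class SketchTimer {
public:
  std::mutex lock;  // stands in for the write side of the RWLock

  void init() { worker = std::thread([this] { run(); }); }

  // Precondition: the caller holds 'lock' (mirrors the report).
  // Joining while still holding 'lock' would hang whenever the worker
  // is blocked acquiring it, so the lock is released around join().
  void shutdown_locked() {
    stopping = true;
    lock.unlock();   // let the worker acquire the lock and see 'stopping'
    worker.join();
    lock.lock();     // restore the caller's locking state
  }

  bool running() const { return worker.joinable(); }

private:
  void run() {
    while (true) {
      {
        // Blocks here for as long as the shutdown caller holds the lock.
        std::lock_guard<std::mutex> guard(lock);
        if (stopping) return;
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::thread worker;
  std::atomic<bool> stopping{false};
};

// Returns true once shutdown has completed without hanging.
bool demo_shutdown() {
  SketchTimer t;
  t.init();
  t.lock.lock();         // caller takes the lock, as in the report
  t.shutdown_locked();   // returns because the lock is dropped around join()
  assert(!t.running());
  t.lock.unlock();
  return !t.running();
}
```

Removing the unlock/lock pair around worker.join() reproduces the hang described above whenever the worker is waiting on the lock at the time shutdown is called.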
Updated by John Spray over 9 years ago
- Status changed from New to 7
In for tonight's run
commit ef442928649ce8a9af7629a57bbfec0258bf9049
Author: John Spray <john.spray@redhat.com>
Date:   Mon Aug 25 01:29:02 2014 +0100

    common/Timer: fix deadlock in RWTimer::shutdown

    Fixes: #9213
    Signed-off-by: John Spray <john.spray@redhat.com>