Bug #9213

closed

wip-objecter: RWTimer shutdown is deadlock-prone

Added by John Spray over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
% Done: 0%
Source: other
Severity: 3 - minor

Description

commit 085db68d6b2aaaf874188a90cde6f4b5d52b0dde
Author: Yehuda Sadeh <yehuda@inktank.com>
Date:   Tue May 27 17:09:50 2014 -0700

    timer: fix RWTimer shutdown

    Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

commit c73b1b2bb698ebf5b925c5d4c7c37a1c25edd0a2
Author: Yehuda Sadeh <yehuda@inktank.com>
Date:   Mon Mar 17 12:58:02 2014 -0700

    time: create RWTimer

    a timer implementation that uses RWLock

    Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>

http://pulpito.front.sepia.ceph.com/john-2014-08-24_01:43:51-rados-wip-objecter-testing-basic-multi/445802/
(manifests as a dead job, got the backtrace by sshing to the running nodes)

#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:83
#1  0x00007f295e9be53a in get_write (lockdep=true, this=0x224ffc0) at common/RWLock.h:88
#2  RWTimer::timer_thread (this=0x2250020) at common/Timer.cc:293
#3  0x00007f295e9c210d in RWTimerThread::entry (this=<optimized out>) at common/Timer.cc:200
#4  0x00007f295e565e9a in start_thread (arg=0x7f295a471700) at pthread_create.c:308
#5  0x00007f295dd7c3fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f295f8be780 (LWP 11130)):
#0  0x00007f295e567148 in pthread_join (threadid=139815584995072, thread_return=0x0) at pthread_join.c:89
#1  0x00007f295e9e9c22 in Thread::join (this=0x2252bf0, prval=<optimized out>) at common/Thread.cc:139
#2  0x00007f295e9c06be in RWTimer::shutdown (this=0x2250020) at common/Timer.cc:234
#3  0x00007f295e941ee4 in Objecter::shutdown (this=0x224ff00) at osdc/Objecter.cc:389
#4  0x00007f295e90c0da in librados::RadosClient::shutdown (this=0x22342f0) at librados/RadosClient.cc:302
#5  0x00007f295e8f5a29 in rados_shutdown (cluster=0x22342f0) at librados/librados.cc:1907
#6  0x000000000040473a in StRadosCreatePool::run (this=0x7fffd5ab3190) at test/system/st_rados_create_pool.cc:108
#7  0x000000000040538a in systest_runnable_pthread_helper (arg=<optimized out>) at test/system/systest_runnable.cc:203
#8  0x000000000040544b in SysTestRunnable::start (this=0x7fffd5ab3190) at test/system/systest_runnable.cc:102
#9  0x000000000040708b in SysTestRunnable::run_until_finished (runnables=...) at test/system/systest_runnable.cc:174
#10 0x0000000000403957 in main (argc=1, argv=0x7fffd5ab3548) at test/system/rados_delete_pools_parallel.cc:101

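RWTimer::shutdown runs with rwlock held. It sets the 'stopping' flag and then joins the timer thread without releasing rwlock. If the timer thread is blocked waiting for rwlock at that point (as in the get_write() frame of the backtrace above), it can never acquire the lock and exit, so the join never returns. This manifests as an OSD or client failing to shut down.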
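For illustration, here is a minimal sketch of one common way to avoid this kind of deadlock: set the flag under the lock, then release the lock before joining. This is not the Ceph code and not necessarily what the eventual fix does. SketchTimer and its members are hypothetical stand-ins, and std::shared_mutex (C++17) stands in for RWLock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for RWTimer; names and structure are assumptions.
class SketchTimer {
public:
  SketchTimer() : worker([this] { timer_thread(); }) {}
  ~SketchTimer() { if (worker.joinable()) shutdown(); }

  void shutdown() {
    assert(worker.joinable());  // shutdown must be called at most once
    {
      std::unique_lock<std::shared_mutex> w(rwlock);
      stopping = true;
    }  // rwlock is released here, BEFORE the join
    // Joining while still holding rwlock would deadlock if the worker
    // were blocked acquiring the write lock in timer_thread().
    worker.join();
  }

  bool joined() const { return !worker.joinable(); }
  int ticks() const { return tick_count.load(); }

private:
  void timer_thread() {
    while (!stopping) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      // Analogous to the get_write() call seen in the backtrace.
      std::unique_lock<std::shared_mutex> w(rwlock);
      if (stopping) break;  // re-check the flag once the lock is held
      ++tick_count;         // placeholder for running timer callbacks
    }
  }

  // Declared before 'worker' so they are initialized before the thread starts.
  std::shared_mutex rwlock;
  std::atomic<bool> stopping{false};
  std::atomic<int> tick_count{0};
  std::thread worker;
};
```

Constructing a SketchTimer, letting it run briefly, and calling shutdown() should return promptly. Moving worker.join() inside the locked scope reproduces the hang described in this report whenever the worker happens to be waiting for the lock.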

Actions #1

Updated by John Spray over 9 years ago

  • Status changed from New to 7

In for tonight's run

commit ef442928649ce8a9af7629a57bbfec0258bf9049
Author: John Spray <john.spray@redhat.com>
Date:   Mon Aug 25 01:29:02 2014 +0100

    common/Timer: fix deadlock in RWTimer::shutdown

    Fixes: #9213

    Signed-off-by: John Spray <john.spray@redhat.com>

Actions #2

Updated by John Spray over 9 years ago

  • Status changed from 7 to Resolved

Didn't recur.
