Bug #5785
librados: deadlock during shutdown
Status:
Resolved
Priority:
High
Assignee:
-
Category:
librados
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
the monc locking for librados shutdown looks broken. although the whole thing appears a bit wonky.
(gdb) thr app all bt Thread 11 (Thread 0x7fb3356ef700 (LWP 4520)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007fb3373e067b in ceph::log::Log::entry (this=0x12aa300) at log/Log.cc:323 #2 0x00007fb336fd9e9a in start_thread (arg=0x7fb3356ef700) at pthread_create.c:308 #3 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 10 (Thread 0x7fb334eee700 (LWP 4521)): #0 sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102 #1 0x00007fb3374c2498 in CephContextServiceThread::entry (this=0x12ab1e0) at common/ceph_context.cc:57 #2 0x00007fb336fd9e9a in start_thread (arg=0x7fb334eee700) at pthread_create.c:308 #3 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 9 (Thread 0x7fb3346ed700 (LWP 4522)): #0 0x00007fb3367e5313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007fb337385dfb in AdminSocket::entry (this=0x12c3960) at common/admin_socket.cc:228 #2 0x00007fb336fd9e9a in start_thread (arg=0x7fb3346ed700) at pthread_create.c:308 #3 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 8 (Thread 0x7fb333eec700 (LWP 4523)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007fb336fdc065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007fb336fdbeba in __pthread_mutex_lock (mutex=0x12aeec0) at pthread_mutex_lock.c:61 #3 0x00007fb33737f043 in Mutex::Lock (this=0x12aeeb0, no_lockdep=<optimized out>) at common/Mutex.cc:89 #4 0x00007fb337311a2a in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:120 #5 librados::RadosClient::ms_handle_reset (this=0x12ae790, con=0x12b8700) at librados/RadosClient.cc:301 #6 0x00007fb3373ff3d5 in ms_deliver_handle_reset (con=<optimized out>, this=0x12c2440) at msg/Messenger.h:611 #7 DispatchQueue::entry (this=0x12c2528) at msg/DispatchQueue.cc:100 #8 0x00007fb33746ca7d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104 #9 0x00007fb336fd9e9a in start_thread (arg=0x7fb333eec700) at pthread_create.c:308 #10 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #11 0x0000000000000000 in ?? () Thread 7 (Thread 0x7fb3336eb700 (LWP 4524)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007fb3374686c1 in Wait (mutex=..., this=0x12c2950) at ./common/Cond.h:55 #2 SimpleMessenger::reaper_entry (this=0x12c2440) at msg/SimpleMessenger.cc:206 #3 0x00007fb33746cf6d in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/SimpleMessenger.h:422 #4 0x00007fb336fd9e9a in start_thread (arg=0x7fb3336eb700) at pthread_create.c:308 #5 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 6 (Thread 0x7fb332eea700 (LWP 4525)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 #1 0x00007fb3373945ab in WaitUntil (when=..., mutex=..., this=0x12aeb50) at common/Cond.h:71 #2 SafeTimer::timer_thread (this=0x12aeb40) at common/Timer.cc:114 #3 0x00007fb337394f5d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38 #4 0x00007fb336fd9e9a in start_thread (arg=0x7fb332eea700) at pthread_create.c:308 #5 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 5 (Thread 0x7fb3326e9700 (LWP 4526)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007fb336fdc065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007fb336fdbeba in __pthread_mutex_lock (mutex=0x12aeec0) at pthread_mutex_lock.c:61 #3 0x00007fb33737f043 in Mutex::Lock (this=0x12aeeb0, no_lockdep=<optimized out>) at common/Mutex.cc:89 #4 0x00007fb337332078 in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:120 #5 Objecter::C_Linger_Map_Latest::finish (this=0x7fb314002970, r=<optimized out>) at osdc/Objecter.cc:717 #6 0x00007fb3373957b0 in Finisher::finisher_thread_entry (this=0x12aec08) at common/Finisher.cc:56 #7 0x00007fb336fd9e9a in start_thread (arg=0x7fb3326e9700) at pthread_create.c:308 #8 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #9 0x0000000000000000 in ?? () Thread 4 (Thread 0x7fb331de7700 (LWP 4529)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x00007fb336fe32b8 in _L_cond_lock_874 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007fb336fe3088 in __pthread_mutex_cond_lock (mutex=0x12aeec0) at ../nptl/pthread_mutex_lock.c:61 #3 0x00007fb336fde1b2 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:302 #4 0x00007fb3373945ab in WaitUntil (when=..., mutex=..., this=0x12aef58) at common/Cond.h:71 #5 SafeTimer::timer_thread (this=0x12aef48) at common/Timer.cc:114 #6 0x00007fb337394f5d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38 #7 0x00007fb336fd9e9a in start_thread (arg=0x7fb331de7700) at pthread_create.c:308 #8 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 ---Type <return> to continue, or q <return> to quit--- #9 0x0000000000000000 in ?? () Thread 3 (Thread 0x7fb330ce4700 (LWP 13141)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 #1 0x00007fb337457cdd in Wait (mutex=..., this=0x7fb320420db8) at ./common/Cond.h:55 #2 Pipe::writer (this=0x7fb320420bd0) at msg/Pipe.cc:1610 #3 0x00007fb337462e2d in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59 #4 0x00007fb336fd9e9a in start_thread (arg=0x7fb330ce4700) at pthread_create.c:308 #5 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #6 0x0000000000000000 in ?? () Thread 2 (Thread 0x7fb33817a700 (LWP 13142)): #0 0x00007fb3367e5313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00007fb33744b722 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2144 #2 0x00007fb33744ba40 in Pipe::tcp_read (this=0x7fb320420bd0, buf=0x7fb338179dbf "\377/\032\206\066\263\177", len=1) at msg/Pipe.cc:2117 #3 0x00007fb33745f484 in Pipe::reader (this=0x7fb320420bd0) at msg/Pipe.cc:1353 #4 0x00007fb337462e4d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #5 0x00007fb336fd9e9a in start_thread (arg=0x7fb33817a700) at pthread_create.c:308 #6 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #7 0x0000000000000000 in ?? () Thread 1 (Thread 0x7fb338182780 (LWP 4481)): #0 0x00007fb336fdb148 in pthread_join (threadid=140407621981952, thread_return=0x0) at pthread_join.c:89 #1 0x00007fb3374c8b72 in Thread::join (this=0x12aed20, prval=<optimized out>) at common/Thread.cc:121 #2 0x00007fb3373e66ec in MonClient::shutdown (this=0x12ae958) at mon/MonClient.cc:317 #3 0x00007fb337311d0f in librados::RadosClient::shutdown (this=0x12ae790) at librados/RadosClient.cc:230 #4 0x00007fb337302798 in librados::Rados::shutdown (this=0x7fff13088800) at librados/librados.cc:1194 #5 0x0000000000419f76 in destroy_one_pool_pp (pool_name=..., cluster=...) at test/librados/test.cc:147 #6 0x0000000000414cdf in LibRadosWatchNotify_WatchNotifyTimeoutTestPP_Test::TestBody (this=<optimized out>) at test/librados/watch_notify.cc:90 #7 0x000000000042e67a in testing::Test::Run (this=0x12a9d60) at ./src/gtest.cc:2095 #8 0x000000000042e780 in testing::internal::TestInfoImpl::Run (this=0x12aa120) at ./src/gtest.cc:2314 #9 0x000000000042e84d in testing::TestCase::Run (this=0x12a9e40) at ./src/gtest.cc:2420 #10 0x000000000042eab7 in testing::internal::UnitTestImpl::RunAllTests (this=0x12a9ab0) at ./src/gtest.cc:4024 #11 0x0000000000414225 in main (argc=1, argv=0x7fff13088af8) at src/gtest_main.cc:38
job was
ubuntu@teuthology:/a/teuthology-2013-07-27_20:00:06-rados-cuttlefish-testing-basic-plana/86064$ cat orig.config.yaml kernel: kdb: true sha1: 88b7f22bc0e44db48a24af23e4de3653bc44b2d2 machine_type: plana nuke-on-error: true os_type: ubuntu overrides: admin_socket: branch: cuttlefish ceph: conf: global: ms inject socket failures: 5000 mon: debug mon: 20 debug ms: 1 debug paxos: 20 paxos min: 10 paxos trim min: 10 fs: xfs log-whitelist: - slow request sha1: b70a9abc5e3ae01204256f414bd7e69d083ed7c6 ceph-deploy: branch: dev: cuttlefish conf: client: log file: /var/log/ceph/ceph-$name.$pid.log mon: debug mon: 1 debug ms: 20 debug paxos: 20 install: ceph: sha1: b70a9abc5e3ae01204256f414bd7e69d083ed7c6 s3tests: branch: cuttlefish workunit: sha1: b70a9abc5e3ae01204256f414bd7e69d083ed7c6 roles: - - mon.a - mon.b - mon.c - mon.d - mon.e - osd.0 - osd.1 - osd.2 - - mon.f - mon.g - mon.h - mon.i - mds.a - osd.3 - osd.4 - osd.5 - client.0 tasks: - chef: null - clock.check: null - install: null - ceph: null - mon_thrash: revive_delay: 90 thrash_delay: 1 - workunit: clients: client.0: - rados/test.sh teuthology_branch: cuttlefish
History
#1 Updated by Sage Weil over 10 years ago
- Status changed from New to Resolved