Project

General

Profile

Bug #5785

librados: deadlock during shutdown

Added by Sage Weil over 10 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
librados
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

the monc locking for librados shutdown looks broken. although the whole thing appears a bit wonky.

(gdb) thr app all bt

Thread 11 (Thread 0x7fb3356ef700 (LWP 4520)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb3373e067b in ceph::log::Log::entry (this=0x12aa300) at log/Log.cc:323
#2  0x00007fb336fd9e9a in start_thread (arg=0x7fb3356ef700) at pthread_create.c:308
#3  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7fb334eee700 (LWP 4521)):
#0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102
#1  0x00007fb3374c2498 in CephContextServiceThread::entry (this=0x12ab1e0) at common/ceph_context.cc:57
#2  0x00007fb336fd9e9a in start_thread (arg=0x7fb334eee700) at pthread_create.c:308
#3  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7fb3346ed700 (LWP 4522)):
#0  0x00007fb3367e5313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fb337385dfb in AdminSocket::entry (this=0x12c3960) at common/admin_socket.cc:228
#2  0x00007fb336fd9e9a in start_thread (arg=0x7fb3346ed700) at pthread_create.c:308
#3  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7fb333eec700 (LWP 4523)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007fb336fdc065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fb336fdbeba in __pthread_mutex_lock (mutex=0x12aeec0) at pthread_mutex_lock.c:61
#3  0x00007fb33737f043 in Mutex::Lock (this=0x12aeeb0, no_lockdep=<optimized out>) at common/Mutex.cc:89
#4  0x00007fb337311a2a in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:120
#5  librados::RadosClient::ms_handle_reset (this=0x12ae790, con=0x12b8700) at librados/RadosClient.cc:301
#6  0x00007fb3373ff3d5 in ms_deliver_handle_reset (con=<optimized out>, this=0x12c2440) at msg/Messenger.h:611
#7  DispatchQueue::entry (this=0x12c2528) at msg/DispatchQueue.cc:100
#8  0x00007fb33746ca7d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/DispatchQueue.h:104
#9  0x00007fb336fd9e9a in start_thread (arg=0x7fb333eec700) at pthread_create.c:308
#10 0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7fb3336eb700 (LWP 4524)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb3374686c1 in Wait (mutex=..., this=0x12c2950) at ./common/Cond.h:55
#2  SimpleMessenger::reaper_entry (this=0x12c2440) at msg/SimpleMessenger.cc:206
#3  0x00007fb33746cf6d in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/SimpleMessenger.h:422
#4  0x00007fb336fd9e9a in start_thread (arg=0x7fb3336eb700) at pthread_create.c:308
#5  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7fb332eea700 (LWP 4525)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x00007fb3373945ab in WaitUntil (when=..., mutex=..., this=0x12aeb50) at common/Cond.h:71
#2  SafeTimer::timer_thread (this=0x12aeb40) at common/Timer.cc:114
#3  0x00007fb337394f5d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#4  0x00007fb336fd9e9a in start_thread (arg=0x7fb332eea700) at pthread_create.c:308
#5  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fb3326e9700 (LWP 4526)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007fb336fdc065 in _L_lock_858 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fb336fdbeba in __pthread_mutex_lock (mutex=0x12aeec0) at pthread_mutex_lock.c:61
#3  0x00007fb33737f043 in Mutex::Lock (this=0x12aeeb0, no_lockdep=<optimized out>) at common/Mutex.cc:89
#4  0x00007fb337332078 in Locker (m=..., this=<synthetic pointer>) at ./common/Mutex.h:120
#5  Objecter::C_Linger_Map_Latest::finish (this=0x7fb314002970, r=<optimized out>) at osdc/Objecter.cc:717
#6  0x00007fb3373957b0 in Finisher::finisher_thread_entry (this=0x12aec08) at common/Finisher.cc:56
#7  0x00007fb336fd9e9a in start_thread (arg=0x7fb3326e9700) at pthread_create.c:308
#8  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fb331de7700 (LWP 4529)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
#1  0x00007fb336fe32b8 in _L_cond_lock_874 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fb336fe3088 in __pthread_mutex_cond_lock (mutex=0x12aeec0) at ../nptl/pthread_mutex_lock.c:61
#3  0x00007fb336fde1b2 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:302
#4  0x00007fb3373945ab in WaitUntil (when=..., mutex=..., this=0x12aef58) at common/Cond.h:71
#5  SafeTimer::timer_thread (this=0x12aef48) at common/Timer.cc:114
#6  0x00007fb337394f5d in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
#7  0x00007fb336fd9e9a in start_thread (arg=0x7fb331de7700) at pthread_create.c:308
#8  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
---Type <return> to continue, or q <return> to quit---
#9  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fb330ce4700 (LWP 13141)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007fb337457cdd in Wait (mutex=..., this=0x7fb320420db8) at ./common/Cond.h:55
#2  Pipe::writer (this=0x7fb320420bd0) at msg/Pipe.cc:1610
#3  0x00007fb337462e2d in Pipe::Writer::entry (this=<optimized out>) at msg/Pipe.h:59
#4  0x00007fb336fd9e9a in start_thread (arg=0x7fb330ce4700) at pthread_create.c:308
#5  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fb33817a700 (LWP 13142)):
#0  0x00007fb3367e5313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fb33744b722 in Pipe::tcp_read_wait (this=<optimized out>) at msg/Pipe.cc:2144
#2  0x00007fb33744ba40 in Pipe::tcp_read (this=0x7fb320420bd0, buf=0x7fb338179dbf "\377/\032\206\066\263\177", len=1) at msg/Pipe.cc:2117
#3  0x00007fb33745f484 in Pipe::reader (this=0x7fb320420bd0) at msg/Pipe.cc:1353
#4  0x00007fb337462e4d in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47
#5  0x00007fb336fd9e9a in start_thread (arg=0x7fb33817a700) at pthread_create.c:308
#6  0x00007fb3367f0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fb338182780 (LWP 4481)):
#0  0x00007fb336fdb148 in pthread_join (threadid=140407621981952, thread_return=0x0) at pthread_join.c:89
#1  0x00007fb3374c8b72 in Thread::join (this=0x12aed20, prval=<optimized out>) at common/Thread.cc:121
#2  0x00007fb3373e66ec in MonClient::shutdown (this=0x12ae958) at mon/MonClient.cc:317
#3  0x00007fb337311d0f in librados::RadosClient::shutdown (this=0x12ae790) at librados/RadosClient.cc:230
#4  0x00007fb337302798 in librados::Rados::shutdown (this=0x7fff13088800) at librados/librados.cc:1194
#5  0x0000000000419f76 in destroy_one_pool_pp (pool_name=..., cluster=...) at test/librados/test.cc:147
#6  0x0000000000414cdf in LibRadosWatchNotify_WatchNotifyTimeoutTestPP_Test::TestBody (this=<optimized out>) at test/librados/watch_notify.cc:90
#7  0x000000000042e67a in testing::Test::Run (this=0x12a9d60) at ./src/gtest.cc:2095
#8  0x000000000042e780 in testing::internal::TestInfoImpl::Run (this=0x12aa120) at ./src/gtest.cc:2314
#9  0x000000000042e84d in testing::TestCase::Run (this=0x12a9e40) at ./src/gtest.cc:2420
#10 0x000000000042eab7 in testing::internal::UnitTestImpl::RunAllTests (this=0x12a9ab0) at ./src/gtest.cc:4024
#11 0x0000000000414225 in main (argc=1, argv=0x7fff13088af8) at src/gtest_main.cc:38

job was

ubuntu@teuthology:/a/teuthology-2013-07-27_20:00:06-rados-cuttlefish-testing-basic-plana/86064$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 88b7f22bc0e44db48a24af23e4de3653bc44b2d2
machine_type: plana
nuke-on-error: true
os_type: ubuntu
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 1
        debug paxos: 20
        paxos min: 10
        paxos trim min: 10
    fs: xfs
    log-whitelist:
    - slow request
    sha1: b70a9abc5e3ae01204256f414bd7e69d083ed7c6
  ceph-deploy:
    branch:
      dev: cuttlefish
    conf:
      client:
        log file: /var/log/ceph/ceph-$name.$pid.log
      mon:
        debug mon: 1
        debug ms: 20
        debug paxos: 20
  install:
    ceph:
      sha1: b70a9abc5e3ae01204256f414bd7e69d083ed7c6
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: b70a9abc5e3ae01204256f414bd7e69d083ed7c6
roles:
- - mon.a
  - mon.b
  - mon.c
  - mon.d
  - mon.e
  - osd.0
  - osd.1
  - osd.2
- - mon.f
  - mon.g
  - mon.h
  - mon.i
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- mon_thrash:
    revive_delay: 90
    thrash_delay: 1
- workunit:
    clients:
      client.0:
      - rados/test.sh
teuthology_branch: cuttlefish

History

#1 Updated by Sage Weil over 10 years ago

  • Status changed from New to Resolved

Also available in: Atom PDF