Bug #5818
leveldb 1.12: hang on shutdown (mon)
Status: Won't Fix
Priority: High
Assignee: -
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
ubuntu@teuthology:/a/teuthology-2013-07-31_01:00:23-rados-next-testing-basic-plana/91208
The ceph-mon processes are still running.
History
#1 Updated by Ian Colle over 10 years ago
- Assignee set to Joao Eduardo Luis
#2 Updated by Sage Weil over 10 years ago
(&*#@ leveldb:
Thread 6 (Thread 27885):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000008161c3 in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
#2 0x0000000004e39e9a in start_thread (arg=0x10951700) at pthread_create.c:308
#3 0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 5 (Thread 27884):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000632def in Wait (mutex=..., this=0x6c32648) at ./common/Cond.h:55
#2 LevelDBStore::compact_thread_entry (this=0x6c325c0) at os/LevelDBStore.cc:221
#3 0x0000000000538e8d in LevelDBStore::CompactThread::entry (this=<optimized out>) at ./os/LevelDBStore.h:63
#4 0x0000000004e39e9a in start_thread (arg=0x10150700) at pthread_create.c:308
#5 0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()

Thread 4 (Thread 26810):
#0 0x0000000006860313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000000007174eb in AdminSocket::entry (this=0x6b50030) at common/admin_socket.cc:230
#2 0x0000000004e39e9a in start_thread (arg=0x8739700) at pthread_create.c:308
#3 0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 3 (Thread 26808):
#0 sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102
#1 0x0000000000727f78 in CephContextServiceThread::entry (this=0x6c2d020) at common/ceph_context.cc:57
#2 0x0000000004e39e9a in start_thread (arg=0x7f38700) at pthread_create.c:308
#3 0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 2 (Thread 26804):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x000000000067ec1b in ceph::log::Log::entry (this=0x6b4f610) at log/Log.cc:323
#2 0x0000000004e39e9a in start_thread (arg=0x7737700) at pthread_create.c:308
#3 0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 1 (Thread 26760):
#0 0x0000000004e3b148 in pthread_join (threadid=269813504, thread_return=0x0) at pthread_join.c:89
#1 0x0000000000709cc2 in Thread::join (this=0x6c326a0, prval=<optimized out>) at common/Thread.cc:121
#2 0x000000000063245f in LevelDBStore::close (this=0x6c325c0) at os/LevelDBStore.cc:88
#3 0x000000000063257b in LevelDBStore::~LevelDBStore (this=0x6c325c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:76
#4 0x00000000006327f9 in LevelDBStore::~LevelDBStore (this=0x6c325c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:78
#5 0x0000000000534e46 in main (argc=<optimized out>, argv=0x6c70f10) at ceph_mon.cc:566
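In this trace the main thread (Thread 1) sits in pthread_join(), reached from the LevelDBStore destructor through LevelDBStore::close() and Thread::join(). The threadid it is joining, 269813504, is 0x10150700, which is the start_thread arg of Thread 5, the compaction thread parked in Cond::Wait() inside LevelDBStore::compact_thread_entry(); the later traces in this ticket pair up the same way. Shutdown therefore cannot finish until the compaction thread leaves its wait loop. The following is a minimal sketch of the stop-flag / condition-variable / join pattern such a shutdown depends on, written with standard C++ threads; the class and member names are hypothetical and this is not Ceph's LevelDBStore code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a store that owns a background compaction
// thread. Names are illustrative; this is NOT Ceph's LevelDBStore.
class CompactingStore {
 public:
  CompactingStore() : worker_([this] { compact_thread_entry(); }) {}

  // Same shape as the shutdown path in the trace:
  // destructor -> close() -> join() on the compaction thread.
  ~CompactingStore() { close(); }

  // Ask the worker for one (simulated) compaction.
  void queue_compaction() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      ++pending_;
    }
    cond_.notify_one();
  }

  void close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;  // publish the stop request under the lock...
    }
    cond_.notify_all();  // ...then wake the worker if it is waiting
    if (worker_.joinable())
      worker_.join();  // returns only once compact_thread_entry() exits
    assert(!worker_.joinable());
  }

  int completed() {
    std::lock_guard<std::mutex> lock(mu_);
    return completed_;
  }

 private:
  void compact_thread_entry() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      // The predicate is re-checked under the lock, so a notify sent before
      // the worker reaches wait() is not lost and spurious wakeups are ignored.
      cond_.wait(lock, [this] { return stop_ || pending_ > 0; });
      while (pending_ > 0) {  // drain queued work (a real store would
        --pending_;           // drop the lock while compacting)
        ++completed_;
      }
      if (stop_) return;  // without this exit path, close() would hang in join()
    }
  }

  std::mutex mu_;
  std::condition_variable cond_;
  bool stop_ = false;
  int pending_ = 0;
  int completed_ = 0;
  std::thread worker_;  // declared last: starts after the members above exist
};
```

Queuing a few compactions and then calling close() returns once the worker has drained its queue and exited. If the worker could keep waiting after stop_ is set, close() would block in join(), the same place Thread 1 is stuck in these traces.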
#3 Updated by Sage Weil over 10 years ago
- Status changed from New to 12
- Source changed from other to Q/A
Looks like this upstream leveldb issue: https://code.google.com/p/leveldb/issues/detail?id=125
#4 Updated by Sage Weil over 10 years ago
- Assignee deleted (Joao Eduardo Luis)
#5 Updated by Sage Weil over 10 years ago
- Status changed from 12 to 15
#6 Updated by Sage Weil over 10 years ago
- Priority changed from Urgent to High
#7 Updated by Sage Weil over 10 years ago
- Status changed from 15 to Resolved
#8 Updated by Sage Weil over 10 years ago
- Status changed from Resolved to 12
Well, observed this hang with leveldb 1.12.
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-10-31_23:00:13-rados-next-testing-basic-plana/78222
(gdb) thr app all bt
Thread 6 (Thread 21247):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000008463d3 in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
#2 0x0000000004e39e9a in start_thread (arg=0x10749700) at pthread_create.c:308
#3 0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 5 (Thread 21244):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000007f5d47 in Wait (mutex=..., this=0x6a38818) at ./common/Cond.h:55
#2 LevelDBStore::compact_thread_entry (this=0x6a38790) at os/LevelDBStore.cc:221
#3 0x00000000005414ad in LevelDBStore::CompactThread::entry (this=<optimized out>) at ./os/LevelDBStore.h:69
#4 0x0000000004e39e9a in start_thread (arg=0xff48700) at pthread_create.c:308
#5 0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()

Thread 4 (Thread 17517):
#0 0x0000000006658313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000073cdcb in AdminSocket::entry (this=0x6949840) at common/admin_socket.cc:231
#2 0x0000000004e39e9a in start_thread (arg=0x8531700) at pthread_create.c:308
#3 0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 3 (Thread 17512):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1 0x000000000074ecf6 in WaitUntil (when=..., mutex=..., this=0x6a32f58) at ./common/Cond.h:71
#2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x6a32f58) at ./common/Cond.h:79
#3 CephContextServiceThread::entry (this=0x6a32ef0) at common/ceph_context.cc:56
#4 0x0000000004e39e9a in start_thread (arg=0x7d30700) at pthread_create.c:308
#5 0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()

Thread 2 (Thread 17508):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000006a2cab in ceph::log::Log::entry (this=0x6948e20) at log/Log.cc:319
#2 0x0000000004e39e9a in start_thread (arg=0x752f700) at pthread_create.c:308
#3 0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 1 (Thread 17470):
#0 0x0000000004e3b148 in pthread_join (threadid=267683584, thread_return=0x0) at pthread_join.c:89
#1 0x000000000072f5a2 in Thread::join (this=0x6a38870, prval=<optimized out>) at common/Thread.cc:121
#2 0x00000000007f5342 in LevelDBStore::close (this=0x6a38790) at os/LevelDBStore.cc:88
#3 0x00000000007f5472 in LevelDBStore::~LevelDBStore (this=0x6a38790, __in_chrg=<optimized out>) at os/LevelDBStore.cc:76
#4 0x00000000007f5719 in LevelDBStore::~LevelDBStore (this=0x6a38790, __in_chrg=<optimized out>) at os/LevelDBStore.cc:78
#5 0x000000000053d2d0 in main (argc=<optimized out>, argv=0x6a770c0) at ceph_mon.cc:568
(gdb) q
#9 Updated by Sage Weil over 10 years ago
ubuntu@plana83:~$ dpkg -l | grep leveldb
rc libleveldb1 1.12.0-1precise.ceph fast key-value storage library
#10 Updated by Sage Weil about 10 years ago
- Subject changed from mon stuck in shutdown to leveldb: hang on shutdown (mon)
#11 Updated by Sage Weil almost 10 years ago
Observed this again on leveldb 1.12:
ubuntu@plana87:~$ dpkg -l|grep leveldb
ii libleveldb1 1.12.0-1precise.ceph fast key-value storage library
ubuntu@plana87:~$ fg
sudo gdb ceph-mon
Thread 6 (Thread 26150):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000006001f33 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#2 0x00000000050a6e9a in start_thread (arg=0xe41e700) at pthread_create.c:308
#3 0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 5 (Thread 26148):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000840500 in Wait (mutex=..., this=0x42d3450) at ./common/Cond.h:55
#2 LevelDBStore::compact_thread_entry (this=0x42d33c0) at os/LevelDBStore.cc:245
#3 0x0000000000547bfd in LevelDBStore::CompactThread::entry (this=<optimized out>) at ./os/LevelDBStore.h:69
#4 0x00000000050a6e9a in start_thread (arg=0xdc1d700) at pthread_create.c:308
#5 0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()

Thread 4 (Thread 24607):
#0 0x0000000006b10313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x0000000000776dfb in AdminSocket::entry (this=0x42d3140) at common/admin_socket.cc:239
#2 0x00000000050a6e9a in start_thread (arg=0x8a08700) at pthread_create.c:308
#3 0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 3 (Thread 24603):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1 0x0000000000790296 in WaitUntil (when=..., mutex=..., this=0x42b7728) at ./common/Cond.h:71
#2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x42b7728) at ./common/Cond.h:79
#3 CephContextServiceThread::entry (this=0x42b76c0) at common/ceph_context.cc:58
#4 0x00000000050a6e9a in start_thread (arg=0x8207700) at pthread_create.c:308
#5 0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()

Thread 2 (Thread 24591):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x00000000006ce09b in ceph::log::Log::entry (this=0x42d3000) at log/Log.cc:323
#2 0x00000000050a6e9a in start_thread (arg=0x7a06700) at pthread_create.c:308
#3 0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()

Thread 1 (Thread 24544):
#0 0x00000000050a8148 in pthread_join (threadid=230807296, thread_return=0x0) at pthread_join.c:89
#1 0x0000000000769362 in Thread::join (this=0x42d34a8, prval=<optimized out>) at common/Thread.cc:121
#2 0x000000000083f741 in LevelDBStore::close (this=0x42d33c0) at os/LevelDBStore.cc:112
#3 0x000000000083f7be in LevelDBStore::~LevelDBStore (this=0x42d33c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:97
#4 0x000000000083f989 in LevelDBStore::~LevelDBStore (this=0x42d33c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:102
#5 0x0000000000543549 in main (argc=<optimized out>, argv=0x42b5e60) at ceph_mon.cc:691
(gdb) q

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-04-07_23:00:18-rgw-master-testing-basic-plana/177905
#12 Updated by Sage Weil almost 10 years ago
- Subject changed from leveldb: hang on shutdown (mon) to leveldb 1.12: hang on shutdown (mon)
#13 Updated by Ian Colle almost 10 years ago
- Status changed from 12 to Duplicate
This is related to #5847, but I can't link them due to limitations in Redmine.
#14 Updated by Ian Colle almost 10 years ago
- Status changed from Duplicate to 12
#15 Updated by Loïc Dachary about 9 years ago
http://pulpito.ceph.com/loic-2015-01-30_18:24:12-rados-v0.67.11---basic-multi/731949/
2015-01-30T16:36:40.068 INFO:tasks.ceph.mds.a:Stopped
2015-01-30T16:36:40.068 INFO:teuthology.misc:Shutting down osd daemons...
2015-01-30T16:36:46.069 INFO:tasks.ceph.osd.1:Stopped
2015-01-30T16:36:52.070 INFO:tasks.ceph.osd.0:Stopped
2015-01-30T16:36:58.071 INFO:tasks.ceph.osd.3:Stopped
2015-01-30T16:37:04.072 INFO:tasks.ceph.osd.2:Stopped
2015-01-30T16:37:10.073 INFO:tasks.ceph.osd.5:Stopped
2015-01-30T16:37:16.073 INFO:tasks.ceph.osd.4:Stopped
2015-01-30T16:37:16.073 INFO:teuthology.misc:Shutting down mon daemons...
2015-01-30T16:37:16.261 INFO:tasks.ceph.mon.a.plana20.stderr:2015-01-30 16:37:16.181412 c4f0700 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***
2015-01-30T16:42:10.113 INFO:tasks.ceph:Checking cluster log for badness...
... a few lines later ...
File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: reached maximum tries (50) after waiting for 300 seconds
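The traceback shows the harness giving up on a bounded wait after 50 tries and 300 seconds, i.e. an average of 6 s per try (300 / 50). A minimal sketch of that kind of bounded polling wait follows, in C++ for consistency with the rest of this ticket. It is an illustration only: teuthology's real helper is Python code in teuthology/contextutil.py, and the names wait_bounded and MaxTriesExceeded here are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical exception type; the name echoes teuthology's MaxWhileTries
// but this is an illustrative C++ sketch, not teuthology's (Python) code.
struct MaxTriesExceeded : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Polls `done` up to `tries` times, sleeping `interval` after each failed
// check. Returns the attempt on which `done` first reported true; throws
// once the whole budget (tries * interval) has been used up.
int wait_bounded(const std::function<bool()>& done, int tries,
                 std::chrono::milliseconds interval) {
  assert(tries > 0);
  for (int attempt = 1; attempt <= tries; ++attempt) {
    if (done()) return attempt;
    std::this_thread::sleep_for(interval);
  }
  const auto waited =
      std::chrono::duration_cast<std::chrono::seconds>(interval * tries);
  throw MaxTriesExceeded("reached maximum tries (" + std::to_string(tries) +
                         ") after waiting for " +
                         std::to_string(waited.count()) + " seconds");
}
```

Assuming a 6-second interval, calling wait_bounded with tries = 50 on a condition such as "the mon process has exited" that never becomes true would end with the same "reached maximum tries (50) after waiting for 300 seconds" message seen in the log.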
#16 Updated by Sage Weil almost 9 years ago
- Status changed from 12 to Won't Fix