Bug #5818

leveldb 1.12: hang on shutdown (mon)

Added by Samuel Just over 10 years ago. Updated almost 9 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/teuthology-2013-07-31_01:00:23-rados-next-testing-basic-plana/91208

The processes are still running.

History

#1 Updated by Ian Colle over 10 years ago

  • Assignee set to Joao Eduardo Luis

#2 Updated by Sage Weil over 10 years ago

(&*#@ leveldb:


Thread 6 (Thread 27885):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000008161c3 in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
#2  0x0000000004e39e9a in start_thread (arg=0x10951700) at pthread_create.c:308
#3  0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 27884):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000632def in Wait (mutex=..., this=0x6c32648) at ./common/Cond.h:55
#2  LevelDBStore::compact_thread_entry (this=0x6c325c0) at os/LevelDBStore.cc:221
#3  0x0000000000538e8d in LevelDBStore::CompactThread::entry (this=<optimized out>) at ./os/LevelDBStore.h:63
#4  0x0000000004e39e9a in start_thread (arg=0x10150700) at pthread_create.c:308
#5  0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 4 (Thread 26810):
#0  0x0000000006860313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00000000007174eb in AdminSocket::entry (this=0x6b50030) at common/admin_socket.cc:230
#2  0x0000000004e39e9a in start_thread (arg=0x8739700) at pthread_create.c:308
#3  0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 3 (Thread 26808):
#0  sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:102
#1  0x0000000000727f78 in CephContextServiceThread::entry (this=0x6c2d020) at common/ceph_context.cc:57
#2  0x0000000004e39e9a in start_thread (arg=0x7f38700) at pthread_create.c:308
#3  0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 26804):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000067ec1b in ceph::log::Log::entry (this=0x6b4f610) at log/Log.cc:323
#2  0x0000000004e39e9a in start_thread (arg=0x7737700) at pthread_create.c:308
#3  0x000000000686bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 26760):
#0  0x0000000004e3b148 in pthread_join (threadid=269813504, thread_return=0x0) at pthread_join.c:89
#1  0x0000000000709cc2 in Thread::join (this=0x6c326a0, prval=<optimized out>) at common/Thread.cc:121
#2  0x000000000063245f in LevelDBStore::close (this=0x6c325c0) at os/LevelDBStore.cc:88
#3  0x000000000063257b in LevelDBStore::~LevelDBStore (this=0x6c325c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:76
#4  0x00000000006327f9 in LevelDBStore::~LevelDBStore (this=0x6c325c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:78
#5  0x0000000000534e46 in main (argc=<optimized out>, argv=0x6c70f10) at ceph_mon.cc:566
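
A note on reading this trace: gdb prints the `threadid` argument of `pthread_join` in decimal, while the `start_thread (arg=...)` frames are printed in hex. The sketch below matches the two to find which thread the main thread is waiting on. It assumes that on this glibc (NPTL, x86_64 Linux) the `pthread_t` value is the same pointer that `start_thread` receives; the helper name `joined_thread` and the two constants are illustrative and are not part of Ceph or gdb.

```python
import re

# Abridged from the backtrace above: gdb thread number -> `start_thread (arg=...)` value.
START_THREAD_ARGS = {
    6: "0x10951700",
    5: "0x10150700",
    4: "0x8739700",
    3: "0x7f38700",
    2: "0x7737700",
}

# Frame #0 of Thread 1, copied from the backtrace above.
JOIN_FRAME = "pthread_join (threadid=269813504, thread_return=0x0) at pthread_join.c:89"


def joined_thread(join_frame, start_args):
    """Return the gdb thread number whose start_thread arg equals the joined threadid.

    Assumption: pthread_t and the start_thread argument are the same pointer.
    Returns None when no thread matches.
    """
    match = re.search(r"threadid=(\d+)", join_frame)
    if match is None:
        return None
    target = int(match.group(1))  # decimal in the pthread_join frame
    for number, arg in start_args.items():
        if int(arg, 16) == target:  # hex in the start_thread frame
            return number
    return None


print(joined_thread(JOIN_FRAME, START_THREAD_ARGS))  # prints 5
```

Under that assumption, Thread 1 (`LevelDBStore::close`) is joining Thread 5, the `LevelDBStore` compaction thread, which is itself parked in `Cond::Wait`. The same pairing holds for the `threadid` values in the later backtraces in this ticket.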

#3 Updated by Sage Weil over 10 years ago

  • Status changed from New to 12
  • Source changed from other to Q/A

#4 Updated by Sage Weil over 10 years ago

  • Assignee deleted (Joao Eduardo Luis)

#5 Updated by Sage Weil over 10 years ago

  • Status changed from 12 to 15

#6 Updated by Sage Weil over 10 years ago

  • Priority changed from Urgent to High

#7 Updated by Sage Weil over 10 years ago

  • Status changed from 15 to Resolved

#8 Updated by Sage Weil over 10 years ago

  • Status changed from Resolved to 12

well, observed this hang with 1.12.

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-10-31_23:00:13-rados-next-testing-basic-plana/78222

(gdb) thr app all bt

Thread 6 (Thread 21247):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000008463d3 in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
#2  0x0000000004e39e9a in start_thread (arg=0x10749700) at pthread_create.c:308
#3  0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 21244):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000007f5d47 in Wait (mutex=..., this=0x6a38818) at ./common/Cond.h:55
#2  LevelDBStore::compact_thread_entry (this=0x6a38790) at os/LevelDBStore.cc:221
#3  0x00000000005414ad in LevelDBStore::CompactThread::entry (this=<optimized out>) at ./os/LevelDBStore.h:69
#4  0x0000000004e39e9a in start_thread (arg=0xff48700) at pthread_create.c:308
#5  0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 4 (Thread 17517):
#0  0x0000000006658313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x000000000073cdcb in AdminSocket::entry (this=0x6949840) at common/admin_socket.cc:231
#2  0x0000000004e39e9a in start_thread (arg=0x8531700) at pthread_create.c:308
#3  0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 3 (Thread 17512):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x000000000074ecf6 in WaitUntil (when=..., mutex=..., this=0x6a32f58) at ./common/Cond.h:71
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x6a32f58) at ./common/Cond.h:79
#3  CephContextServiceThread::entry (this=0x6a32ef0) at common/ceph_context.cc:56
#4  0x0000000004e39e9a in start_thread (arg=0x7d30700) at pthread_create.c:308
#5  0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 17508):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000006a2cab in ceph::log::Log::entry (this=0x6948e20) at log/Log.cc:319
#2  0x0000000004e39e9a in start_thread (arg=0x752f700) at pthread_create.c:308
#3  0x0000000006663ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 17470):
#0  0x0000000004e3b148 in pthread_join (threadid=267683584, thread_return=0x0) at pthread_join.c:89
#1  0x000000000072f5a2 in Thread::join (this=0x6a38870, prval=<optimized out>) at common/Thread.cc:121
#2  0x00000000007f5342 in LevelDBStore::close (this=0x6a38790) at os/LevelDBStore.cc:88
#3  0x00000000007f5472 in LevelDBStore::~LevelDBStore (this=0x6a38790, __in_chrg=<optimized out>) at os/LevelDBStore.cc:76
#4  0x00000000007f5719 in LevelDBStore::~LevelDBStore (this=0x6a38790, __in_chrg=<optimized out>) at os/LevelDBStore.cc:78
#5  0x000000000053d2d0 in main (argc=<optimized out>, argv=0x6a770c0) at ceph_mon.cc:568
(gdb) q

#9 Updated by Sage Weil over 10 years ago

ubuntu@plana83:~$ dpkg -l | grep leveldb
rc libleveldb1 1.12.0-1precise.ceph fast key-value storage library

#10 Updated by Sage Weil about 10 years ago

  • Subject changed from mon stuck in shutdown to leveldb: hang on shutdown (mon)

#11 Updated by Sage Weil almost 10 years ago

observed this again on leveldb 1.12:


ubuntu@plana87:~$ dpkg -l|grep leveldb
ii  libleveldb1                                         1.12.0-1precise.ceph                        fast key-value storage library
ubuntu@plana87:~$ fg
sudo gdb ceph-mon

Thread 6 (Thread 26150):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000006001f33 in ?? () from /usr/lib/x86_64-linux-gnu/libleveldb.so.1
#2  0x00000000050a6e9a in start_thread (arg=0xe41e700) at pthread_create.c:308
#3  0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 26148):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000840500 in Wait (mutex=..., this=0x42d3450) at ./common/Cond.h:55
#2  LevelDBStore::compact_thread_entry (this=0x42d33c0) at os/LevelDBStore.cc:245
#3  0x0000000000547bfd in LevelDBStore::CompactThread::entry (this=<optimized out>) at ./os/LevelDBStore.h:69
#4  0x00000000050a6e9a in start_thread (arg=0xdc1d700) at pthread_create.c:308
#5  0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 4 (Thread 24607):
#0  0x0000000006b10313 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
#1  0x0000000000776dfb in AdminSocket::entry (this=0x42d3140) at common/admin_socket.cc:239
#2  0x00000000050a6e9a in start_thread (arg=0x8a08700) at pthread_create.c:308
#3  0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 3 (Thread 24603):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
#1  0x0000000000790296 in WaitUntil (when=..., mutex=..., this=0x42b7728) at ./common/Cond.h:71
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x42b7728) at ./common/Cond.h:79
#3  CephContextServiceThread::entry (this=0x42b76c0) at common/ceph_context.cc:58
#4  0x00000000050a6e9a in start_thread (arg=0x8207700) at pthread_create.c:308
#5  0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 24591):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00000000006ce09b in ceph::log::Log::entry (this=0x42d3000) at log/Log.cc:323
#2  0x00000000050a6e9a in start_thread (arg=0x7a06700) at pthread_create.c:308
#3  0x0000000006b1bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 24544):
#0  0x00000000050a8148 in pthread_join (threadid=230807296, thread_return=0x0) at pthread_join.c:89
#1  0x0000000000769362 in Thread::join (this=0x42d34a8, prval=<optimized out>) at common/Thread.cc:121
#2  0x000000000083f741 in LevelDBStore::close (this=0x42d33c0) at os/LevelDBStore.cc:112
#3  0x000000000083f7be in LevelDBStore::~LevelDBStore (this=0x42d33c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:97
#4  0x000000000083f989 in LevelDBStore::~LevelDBStore (this=0x42d33c0, __in_chrg=<optimized out>) at os/LevelDBStore.cc:102
#5  0x0000000000543549 in main (argc=<optimized out>, argv=0x42b5e60) at ceph_mon.cc:691
(gdb) q

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-04-07_23:00:18-rgw-master-testing-basic-plana/177905

#12 Updated by Sage Weil almost 10 years ago

  • Subject changed from leveldb: hang on shutdown (mon) to leveldb 1.12: hang on shutdown (mon)

#13 Updated by Ian Colle almost 10 years ago

  • Status changed from 12 to Duplicate

This is related to #5847, but I can't link them due to limitations in Redmine.

#14 Updated by Ian Colle almost 10 years ago

  • Status changed from Duplicate to 12

#15 Updated by Loïc Dachary about 9 years ago

http://pulpito.ceph.com/loic-2015-01-30_18:24:12-rados-v0.67.11---basic-multi/731949/

2015-01-30T16:36:40.068 INFO:tasks.ceph.mds.a:Stopped
2015-01-30T16:36:40.068 INFO:teuthology.misc:Shutting down osd daemons...
2015-01-30T16:36:46.069 INFO:tasks.ceph.osd.1:Stopped
2015-01-30T16:36:52.070 INFO:tasks.ceph.osd.0:Stopped
2015-01-30T16:36:58.071 INFO:tasks.ceph.osd.3:Stopped
2015-01-30T16:37:04.072 INFO:tasks.ceph.osd.2:Stopped
2015-01-30T16:37:10.073 INFO:tasks.ceph.osd.5:Stopped
2015-01-30T16:37:16.073 INFO:tasks.ceph.osd.4:Stopped
2015-01-30T16:37:16.073 INFO:teuthology.misc:Shutting down mon daemons...
2015-01-30T16:37:16.261 INFO:tasks.ceph.mon.a.plana20.stderr:2015-01-30 16:37:16.181412 c4f0700 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***
2015-01-30T16:42:10.113 INFO:tasks.ceph:Checking cluster log for badness...
... a few lines later ...
  File "/home/teuthworker/src/teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: reached maximum tries (50) after waiting for 300 seconds
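
For readers unfamiliar with the error above, here is a minimal sketch of the kind of bounded polling loop that produces such a message. It is not teuthology's implementation: `wait_until`, `MaxTriesReached`, and the 6-second sleep are assumptions, the last one chosen only because 300 seconds over 50 tries implies it.

```python
import time


class MaxTriesReached(Exception):
    """Hypothetical stand-in for the MaxWhileTries error seen in the log."""


def wait_until(predicate, tries=50, sleep_seconds=6, sleeper=None):
    """Poll predicate() up to `tries` times, sleeping after each failed attempt.

    Returns the number of seconds waited on success and raises
    MaxTriesReached once every attempt has failed.
    """
    sleeper = sleeper or time.sleep
    waited = 0
    for _ in range(tries):
        if predicate():
            return waited
        sleeper(sleep_seconds)
        waited += sleep_seconds
    raise MaxTriesReached(
        "reached maximum tries (%d) after waiting for %d seconds" % (tries, waited)
    )


# A daemon that never exits, with a no-op sleeper so the example runs instantly.
try:
    wait_until(lambda: False, sleeper=lambda seconds: None)
except MaxTriesReached as exc:
    print(exc)  # prints: reached maximum tries (50) after waiting for 300 seconds
```

The point of the bound is that a monitor stuck in shutdown turns into a test failure after a fixed wait instead of blocking the run indefinitely.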

#16 Updated by Sage Weil almost 9 years ago

  • Status changed from 12 to Won't Fix
