Bug #2090
closedmon: assertion failed on shutdown
0%
Description
I was running repeated cycles of the kernel_untar_build.sh workunit
to try to reproduce a problem in the client and hit this problem
instead.
./common/Mutex.h: 89: FAILED assert(nlock 0)
Looks like this is the version:
ceph version 0.42-106-g761ecc6 (761ecc69c24856b15531c92b69b1c73c5cc81bfc)
Full info below (sorry if it's excessive, I'm not sure what's needed).
Here is the YAML file that was driving it:
roles:
- [mon.a, mon.c, osd.0]
- [mon.b, mds.a, osd.1]
- [client.0]
kernel:
mon:
branch: master
client:
branch: testing
overrides:
ceph:
conf:
osd:
osd op complaint time: 120
coverage: true
fs: btrfs
log-whitelist:
- clocks not synchronized
- old request
tasks:
- ceph:
- rbd:
all:
- workunit:
all:
- kernel_untar_build.sh
And here is the information I got at the end of the run:
. . .
INFO:teuthology.task.workunit.client.0.out:removed `linux-2.6.33/usr/gen_init_cpio.c'
INFO:teuthology.task.workunit.client.0.out:removed directory: `linux-2.6.33/usr'
INFO:teuthology.task.workunit.client.0.out:removed `linux-2.6.33/README'
INFO:teuthology.task.workunit.client.0.out:removed `linux-2.6.33/COPYING'
INFO:teuthology.task.workunit.client.0.out:removed directory: `linux-2.6.33'
INFO:teuthology.task.workunit.client.0.out:removed directory: `t'
INFO:teuthology.task.workunit.client.0.out:removed `linux-2.6.33.tar.bz2'
INFO:teuthology.task.rbd:Unmounting rbd images...
INFO:teuthology.task.rbd:Unmapping rbd devices...
INFO:teuthology.task.rbd:Unloading rbd kernel module...
INFO:teuthology.task.rbd:Deleting rbd images...
Removing image: 100% complete...done.
INFO:teuthology.task.ceph:Shutting down mds daemons...
INFO:teuthology.task.ceph.mds.a.err:2012-02-21 22:14:40.784639 7f5f8deb8700 mds.0.1 * got signal Terminated
INFO:teuthology.task.ceph.mds.a:Stopped
INFO:teuthology.task.ceph:Shutting down osd daemons...
INFO:teuthology.task.ceph.osd.1:Stopped
INFO:teuthology.task.ceph.osd.0:Stopped
INFO:teuthology.task.ceph:Shutting down mon daemons...
INFO:teuthology.task.ceph.mon.a.err:2012-02-21 22:14:42.088201 7f51e2e79700 mon.a@1(peon) e1 Got Signal Terminated
INFO:teuthology.task.ceph.mon.a.err:./common/Mutex.h: In function 'Mutex::~Mutex()' thread 7f51e6d16780 time 2012-02-21 22:14:42.091122
INFO:teuthology.task.ceph.mon.a.err:./common/Mutex.h: 89: FAILED assert(nlock 0)
INFO:teuthology.task.ceph.mon.a.err: ceph version 0.42-106-g761ecc6 (761ecc69c24856b15531c92b69b1c73c5cc81bfc)
INFO:teuthology.task.ceph.mon.a.err: 1: (Monitor::~Monitor()+0x95c) [0x476d2c]
INFO:teuthology.task.ceph.mon.a.err: 2: (main()+0x301e) [0x461a1e]
INFO:teuthology.task.ceph.mon.a.err: 3: (libc_start_main()+0xfe) [0x7f51e50b5d8e]
INFO:teuthology.task.ceph.mon.a.err: 4: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x45e799]
INFO:teuthology.task.ceph.mon.a.err: ceph version 0.42-106-g761ecc6 (761ecc69c24856b15531c92b69b1c73c5cc81bfc)
INFO:teuthology.task.ceph.mon.a.err: 1: (Monitor::~Monitor()+0x95c) [0x476d2c]
INFO:teuthology.task.ceph.mon.a.err: 2: (main()+0x301e) [0x461a1e]
INFO:teuthology.task.ceph.mon.a.err: 3: (_libc_start_main()+0xfe) [0x7f51e50b5d8e]
INFO:teuthology.task.ceph.mon.a.err: 4: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x45e799]
INFO:teuthology.task.ceph.mon.a.err:terminate called after throwing an instance of 'ceph::FailedAssertion'
INFO:teuthology.task.ceph.mon.a.err:** Caught signal (Aborted)
INFO:teuthology.task.ceph.mon.a.err: in thread 7f51e6d16780
INFO:teuthology.task.ceph.mon.a.err: ceph version 0.42-106-g761ecc6 (761ecc69c24856b15531c92b69b1c73c5cc81bfc)
INFO:teuthology.task.ceph.mon.a.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x5b2931]
INFO:teuthology.task.ceph.mon.a.err: 2: (()+0xfb40) [0x7f51e68f6b40]
INFO:teuthology.task.ceph.mon.a.err: 3: (gsignal()+0x35) [0x7f51e50caba5]
INFO:teuthology.task.ceph.mon.a.err: 4: (abort()+0x180) [0x7f51e50ce6b0]
INFO:teuthology.task.ceph.mon.a.err: 5: (_gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f51e596e6bd]
INFO:teuthology.task.ceph.mon.a.err: 6: (()+0xb9906) [0x7f51e596c906]
INFO:teuthology.task.ceph.mon.a.err: 7: (()+0xb9933) [0x7f51e596c933]
INFO:teuthology.task.ceph.mon.a.err: 8: (()+0xb9a3e) [0x7f51e596ca3e]
INFO:teuthology.task.ceph.mon.a.err: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x841) [0x534871]
INFO:teuthology.task.ceph.mon.a.err: 10: (Monitor::~Monitor()+0x95c) [0x476d2c]
INFO:teuthology.task.ceph.mon.a.err: 11: (main()+0x301e) [0x461a1e]
INFO:teuthology.task.ceph.mon.a.err: 12: (_libc_start_main()+0xfe) [0x7f51e50b5d8e]
INFO:teuthology.task.ceph.mon.a.err: 13: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x45e799]
INFO:teuthology.task.ceph.mon.a.err:daemon-helper: command crashed with signal 6
ERROR:teuthology.task.ceph:Saw exception from mon.a
Traceback (most recent call last):
File "/home/elder/ceph/teuthology/teuthology/task/ceph.py", line 817, in run_daemon
daemon.stop()
File "/home/elder/ceph/teuthology/teuthology/task/ceph.py", line 40, in stop
run.wait([self.proc])
File "/home/elder/ceph/teuthology/teuthology/orchestra/run.py", line 272, in wait
proc.exitstatus.get()
File "/home/elder/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/gevent/event.py", line 223, in get
raise self._exception
CommandFailedError: Command failed with status 1: '/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-mon -f -i a -c /tmp/cephtest/ceph.conf'
INFO:teuthology.task.ceph.mon.c.err:2012-02-21 22:14:42.332393 7fb5e458c700 mon.c@2(peon) e1 Got Signal Terminated *
INFO:teuthology.task.ceph.mon.c:Stopped
INFO:teuthology.task.ceph.mon.b.err:2012-02-21 22:14:42.428996 7f6586237700 mon.b@0(leader) e1 * Got Signal Terminated **
INFO:teuthology.task.ceph.mon.b:Stopped
INFO:teuthology.task.ceph:Grabbing cluster log from ubuntu@sepia77.ceph.dreamhost.com mon.a...
INFO:teuthology.task.ceph:Checking cluster ceph.log for badness...
INFO:teuthology.task.ceph:Unmounting /tmp/cephtest/data/osd.0.data on ubuntu@sepia77.ceph.dreamhost.com
INFO:teuthology.task.ceph:Unmounting /tmp/cephtest/data/osd.1.data on ubuntu@sepia70.ceph.dreamhost.com
INFO:teuthology.task.ceph:Cleaning ceph cluster...
INFO:teuthology.task.ceph:Removing ceph binaries...
INFO:teuthology.task.ceph:Removing shipped files: daemon-helper enable-coredump...
INFO:teuthology.task.ceph:Compressing logs...
ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x2868950>
Traceback (most recent call last):
File "/home/elder/ceph/teuthology/teuthology/run_tasks.py", line 45, in run_tasks
suppress = manager._exit(*exc_info)
File "/usr/lib/python2.7/contextlib.py", line 24, in exit
self.gen.next()
File "/home/elder/ceph/teuthology/teuthology/task/ceph.py", line 1017, in task
yield
File "/usr/lib/python2.7/contextlib.py", line 24, in exit
self.gen.next()
File "/home/elder/ceph/teuthology/teuthology/contextutil.py", line 35, in nested
if exit(*exc):
File "/usr/lib/python2.7/contextlib.py", line 24, in exit
self.gen.next()
File "/home/elder/ceph/teuthology/teuthology/task/ceph.py", line 817, in run_daemon
daemon.stop()
File "/home/elder/ceph/teuthology/teuthology/task/ceph.py", line 40, in stop
run.wait([self.proc])
File "/home/elder/ceph/teuthology/teuthology/orchestra/run.py", line 272, in wait
proc.exitstatus.get()
File "/home/elder/ceph/teuthology/virtualenv/local/lib/python2.7/site-packages/gevent/event.py", line 223, in get
raise self._exception
CommandFailedError: Command failed with status 1: '/tmp/cephtest/enable-coredump /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/daemon-helper term /tmp/cephtest/binary/usr/local/bin/ceph-mon -f -i a -c /tmp/cephtest/ceph.conf'
INFO:teuthology.task.internal:Shutting down syslog monitoring...
INFO:teuthology.orchestra.run.out:rsyslog start/running, process 4452
INFO:teuthology.orchestra.run.out:rsyslog start/running, process 18961
INFO:teuthology.orchestra.run.out:rsyslog start/running, process 25670
INFO:teuthology.task.internal:Checking logs for errors...
INFO:teuthology.task.internal:Compressing syslogs...
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
INFO:teuthology.orchestra.run.out:kernel.core_pattern = core
WARNING:teuthology.task.internal:Found coredumps on ubuntu@sepia77.ceph.dreamhost.com, flagging run as failed
INFO:teuthology.task.internal:Transferring archived files...
INFO:teuthology.task.internal:Removing archive directory...
INFO:teuthology.task.internal:Tidying up after the test...
INFO:teuthology.run:Duration was 773.190049 seconds
Updated by Sage Weil about 12 years ago
- Category set to Monitor
- Status changed from New to 4
- Target version set to v0.43
wip-2090
Updated by Sage Weil about 12 years ago
(gdb) thr app all bt Thread 4 (Thread 25243): #0 __lll_unlock_wake () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:366 #1 0x00007f51e68f1d66 in _L_unlock_696 () from /lib/libpthread.so.0 #2 0x00007f51e68f1ca3 in __pthread_mutex_unlock_usercnt (mutex=0x1e62810) at pthread_mutex_unlock.c:52 #3 __pthread_mutex_unlock (mutex=0x1e62810) at pthread_mutex_unlock.c:290 #4 0x000000000047dc5a in Mutex::Unlock (this=0x1e62800) at ./common/Mutex.h:122 #5 0x00000000005487b0 in SimpleMessenger::mark_down_all (this=0x1e62680) at msg/SimpleMessenger.cc:2733 #6 0x000000000053b79d in SimpleMessenger::shutdown (this=0x1e62680) at msg/SimpleMessenger.cc:409 #7 0x0000000000469731 in Monitor::shutdown (this=0x1e80000) at mon/Monitor.cc:377 #8 0x0000000000471c21 in Monitor::handle_signal (this=0x1e80000, signum=<value optimized out>) at mon/Monitor.cc:224 #9 0x00000000005b3973 in SignalHandler::entry() () #10 0x00007f51e68ee971 in start_thread (arg=<value optimized out>) at pthread_create.c:304 #11 0x00007f51e517d92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #12 0x0000000000000000 in ?? () Thread 3 (Thread 25240): #0 0x00007f51e5171203 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x0000000000523d2e in AdminSocket::entry (this=0x1e75000) at common/admin_socket.cc:211 #2 0x00007f51e68ee971 in start_thread (arg=<value optimized out>) at pthread_create.c:304 #3 0x00007f51e517d92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 2 (Thread 25239): #0 sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:103 #1 0x00000000005a569f in CephContextServiceThread::entry (this=0x1e66c80) at common/ceph_context.cc:53 #2 0x00007f51e68ee971 in start_thread (arg=<value optimized out>) at pthread_create.c:304 #3 0x00007f51e517d92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #4 0x0000000000000000 in ?? () Thread 1 (Thread 25238): #0 0x00007f51e68f6a0b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42 #1 0x00000000005b2b6e in reraise_fatal (signum=6) at global/signal_handler.cc:59 #2 handle_fatal_signal (signum=6) at global/signal_handler.cc:95 #3 <signal handler called> #4 0x00007f51e50caba5 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #5 0x00007f51e50ce6b0 in abort () at abort.c:92 #6 0x00007f51e596e6bd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6 #7 0x00007f51e596c906 in ?? () from /usr/lib/libstdc++.so.6 #8 0x00007f51e596c933 in std::terminate() () from /usr/lib/libstdc++.so.6 #9 0x00007f51e596ca3e in __cxa_throw () from /usr/lib/libstdc++.so.6 #10 0x0000000000534871 in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:75 #11 0x0000000000476d2c in ~Mutex (this=0x1e80000, __in_chrg=<value optimized out>) at ./common/Mutex.h:89 #12 Monitor::~Monitor (this=0x1e80000, __in_chrg=<value optimized out>) at mon/Monitor.cc:188 #13 0x0000000000461a1e in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_mon.cc:420
Updated by Sage Weil about 12 years ago
- Status changed from 4 to Resolved