Bug #501

closed

unexpected lockdep crash during vstart.sh

Added by Colin McCabe over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Category: Monitor
Target version: -
% Done: 0%

Description

I was running the unstable branch, at commit 1190313ae954f12f9b5bc364e1226d6d2440880c.
To test, I ran "vstart.sh -d -n".

During startup, my third cmon process (cmon.c) hit a lockdep assert.

In out/mon.c:

2010-10-18 10:44:54.911061 7f73ea07f710 -- 10.3.14.10:6791/0 --> mon0 10.3.14.10:6789/0 -- election(ack 1) v1 -- ?+0 0x164b700
2010-10-18 10:44:54.916588 7f73e8e7b710 mon.c@2(starting) e1 ms_verify_authorizer 10.3.14.10:6790/0 mon protocol 0
2010-10-18 10:44:54.916653 7f73e8e7b710 -- 10.3.14.10:6791/0 >> 10.3.14.10:6790/0 pipe(0x1645570 sd=8 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state 6

2010-10-18 10:44:54.922089 7f73e8e7b710 lockdep: recursive lock of SimpleMessenger::Pipe::pipe_lock (6)
ceph version 0.23~rc (1190313ae954f12f9b5bc364e1226d6d2440880c)
1: (Mutex::Lock(bool)+0x3a) [0x5d47fa]
2: (SimpleMessenger::Pipe::accept()+0x2096) [0x5e05e8]
3: (SimpleMessenger::Pipe::reader()+0x36) [0x5e0770]
4: (SimpleMessenger::Pipe::Reader::entry()+0x19) [0x5d2fcb]
5: (Thread::_entry_func(void*)+0x20) [0x5e407c]
6: (()+0x68ba) [0x7f73ec3518ba]
7: (clone()+0x6d) [0x7f73eb56c02d]

common/lockdep.cc: In function 'int lockdep_will_lock(const char*, int)':
common/lockdep.cc:153: FAILED assert(0)
ceph version 0.23~rc (1190313ae954f12f9b5bc364e1226d6d2440880c)
1: (lockdep_will_lock(char const*, int)+0x3b3) [0x71dfd5]
2: (Mutex::_will_lock()+0x1f) [0x5d477f]
3: (Mutex::Lock(bool)+0x3a) [0x5d47fa]
4: (SimpleMessenger::Pipe::accept()+0x2096) [0x5e05e8]
5: (SimpleMessenger::Pipe::reader()+0x36) [0x5e0770]
6: (SimpleMessenger::Pipe::Reader::entry()+0x19) [0x5d2fcb]
7: (Thread::_entry_func(void*)+0x20) [0x5e407c]
8: (()+0x68ba) [0x7f73ec3518ba]
9: (clone()+0x6d) [0x7f73eb56c02d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (ABRT) ***
ceph version 0.23~rc (1190313ae954f12f9b5bc364e1226d6d2440880c)
1: (ceph::BackTrace::BackTrace(int)+0x2a) [0x7066f4]
2: (sigabrt_handler(int)+0x41) [0x719acb]
3: (()+0x321e0) [0x7f73eb4cf1e0]
4: (gsignal()+0x35) [0x7f73eb4cf165]
5: (abort()+0x180) [0x7f73eb4d1f70]
6: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f73ebd62dc5]
7: (()+0xcb166) [0x7f73ebd61166]
8: (()+0xcb193) [0x7f73ebd61193]
9: (()+0xcb28e) [0x7f73ebd6128e]
10: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x217) [0x7065f1]
11: (lockdep_will_lock(char const*, int)+0x3b3) [0x71dfd5]
12: (Mutex::_will_lock()+0x1f) [0x5d477f]
13: (Mutex::Lock(bool)+0x3a) [0x5d47fa]
14: (SimpleMessenger::Pipe::accept()+0x2096) [0x5e05e8]
15: (SimpleMessenger::Pipe::reader()+0x36) [0x5e0770]
16: (SimpleMessenger::Pipe::Reader::entry()+0x19) [0x5d2fcb]
17: (Thread::_entry_func(void*)+0x20) [0x5e407c]
18: (()+0x68ba) [0x7f73ec3518ba]
19: (clone()+0x6d) [0x7f73eb56c02d]

Inspection of the core file revealed:

#0 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000719aed in sigabrt_handler (signum=6) at config.cc:238
#2 <signal handler called>
#3 0x00007f73eb4cf165 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4 0x00007f73eb4d1f70 in *__GI_abort () at abort.c:92
#5 0x00007f73ebd62dc5 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#6 0x00007f73ebd61166 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007f73ebd61193 in std::terminate () from /usr/lib/libstdc++.so.6
#8 0x00007f73ebd6128e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x00000000007065f1 in ceph::__ceph_assert_fail (assertion=0x75715e "0", file=0x757049 "common/lockdep.cc", line=153, func=0x757300 "int lockdep_will_lock(const char*, int)") at common/assert.cc:30
#10 0x000000000071dfd5 in lockdep_will_lock (name=0x738b70 "SimpleMessenger::Pipe::pipe_lock", id=6) at common/lockdep.cc:153
#11 0x00000000005d477f in Mutex::_will_lock (this=0x1645640) at ./common/Mutex.h:46
#12 0x00000000005d47fa in Mutex::Lock (this=0x1645640, no_lockdep=false) at ./common/Mutex.h:94
#13 0x00000000005e05e8 in SimpleMessenger::Pipe::accept (this=0x1645570) at msg/SimpleMessenger.cc:891
#14 0x00000000005e0770 in SimpleMessenger::Pipe::reader (this=0x1645570) at msg/SimpleMessenger.cc:1442
#15 0x00000000005d2fcb in SimpleMessenger::Pipe::Reader::entry (this=0x16457a8) at ./msg/SimpleMessenger.h:192
#16 0x00000000005e407c in Thread::_entry_func (arg=0x16457a8) at common/Thread.h:39
#17 0x00007f73ec3518ba in start_thread (arg=<value optimized out>) at pthread_create.c:300
#18 0x00007f73eb56c02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()

Actions #1

Updated by Greg Farnum over 13 years ago

  • Status changed from New to Resolved
  • Assignee set to Greg Farnum

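Looks like it's caused by calling pipe_lock.Lock() while still holding existing->pipe_lock. Both mutexes are registered under the same lockdep name, SimpleMessenger::Pipe::pipe_lock, which would explain why lockdep reports the second acquisition as a recursive lock.
Should be fixed in commit:4b84f9e79e83be22565e68b7ea758414d8140d52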
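
To illustrate why taking one Pipe's pipe_lock while holding another Pipe's pipe_lock can trip a "recursive lock" report, here is a minimal sketch of a lock checker that identifies locks by name. It assumes the checker keys on the lock name, as the name and id arguments to lockdep_will_lock in the backtrace suggest. ToyLockdep, ToyPipe, and both accept_* functions are hypothetical names for illustration; this is not the code in common/lockdep.cc, and the second function is not necessarily what the fix commit does.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, single-threaded toy checker. Locks are identified by the
// name they were registered with, not by the address of the mutex object.
class ToyLockdep {
public:
  // Returns the id for a lock name, assigning a new id on first use.
  int register_name(const std::string& name) {
    auto it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    int id = static_cast<int>(ids_.size());
    ids_.emplace(name, id);
    return id;
  }
  // Returns false if a lock with this id is already held (a "recursive lock").
  bool will_lock(int id) const { return held_.count(id) == 0; }
  void locked(int id) { held_.insert(id); }
  void unlocked(int id) { held_.erase(id); }

private:
  std::map<std::string, int> ids_;
  std::set<int> held_;
};

// Every ToyPipe registers its lock under the same name, so all instances
// share one id in the checker.
struct ToyPipe {
  int lock_id;
  explicit ToyPipe(ToyLockdep& ld)
      : lock_id(ld.register_name("SimpleMessenger::Pipe::pipe_lock")) {}
};

// Nested pattern: take self's lock while existing's lock is still held.
// Returns false when the checker would flag the second acquisition.
bool accept_nested(ToyLockdep& ld, ToyPipe& self, ToyPipe& existing) {
  ld.locked(existing.lock_id);
  bool ok = ld.will_lock(self.lock_id);
  if (ok) {
    ld.locked(self.lock_id);
    ld.unlocked(self.lock_id);
  }
  ld.unlocked(existing.lock_id);
  return ok;
}

// Sequential pattern (one possible alternative, purely illustrative):
// release existing's lock before taking self's lock.
bool accept_sequential(ToyLockdep& ld, ToyPipe& self, ToyPipe& existing) {
  ld.locked(existing.lock_id);
  ld.unlocked(existing.lock_id);
  bool ok = ld.will_lock(self.lock_id);
  if (ok) {
    ld.locked(self.lock_id);
    ld.unlocked(self.lock_id);
  }
  return ok;
}
```

With two distinct ToyPipe objects, accept_nested is flagged even though the underlying mutexes differ, because the checker only sees the shared id; accept_sequential passes.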

Actions #2

Updated by Colin McCabe over 13 years ago

I applied the fix, but then I got a different crash in cmon:

#0 0x0000000000000000 in ?? ()
#1 0x0000000000719bd1 in sigabrt_handler (signum=6) at config.cc:240
#2 <signal handler called>
#3 0x00007fd670387165 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4 0x00007fd670389f70 in *__GI_abort () at abort.c:92
#5 0x00007fd670c1adc5 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#6 0x00007fd670c19166 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007fd670c19193 in std::terminate () from /usr/lib/libstdc++.so.6
#8 0x00007fd670c1928e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x0000000000706691 in ceph::__ceph_assert_fail (assertion=0x73a037 "nlock > 0", file=0x73a075 "./common/Mutex.h", line=102, func=0x73c060 "void Mutex::Unlock()") at common/assert.cc:30
#10 0x00000000005e4b1a in Mutex::Unlock (this=0x7fd6680009c0) at common/Mutex.h:102
#11 0x00000000005e0667 in SimpleMessenger::Pipe::accept (this=0x1421570) at msg/SimpleMessenger.cc:892
#12 0x00000000005e0806 in SimpleMessenger::Pipe::reader (this=0x1421570) at msg/SimpleMessenger.cc:1444
#13 0x00000000005d304b in SimpleMessenger::Pipe::Reader::entry (this=0x14217a8) at ./msg/SimpleMessenger.h:192
#14 0x00000000005e4112 in Thread::_entry_func (arg=0x14217a8) at common/Thread.h:39
#15 0x00007fd6712098ba in start_thread (arg=<value optimized out>) at pthread_create.c:300
#16 0x00007fd67042402d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()
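
The failed assertion here, "nlock > 0" in Mutex::Unlock(), means Unlock() was called on a mutex whose own counter says it is not held. Below is a minimal sketch of that kind of check, assuming a wrapper that counts acquisitions. CountingMutex and try_unlock are hypothetical names, not the code in common/Mutex.h, and the sketch throws an exception where the real code asserts.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical mutex wrapper that tracks how many times it is held.
class CountingMutex {
public:
  void lock() {
    m_.lock();
    ++nlock_;
  }
  // Refuses to unlock a mutex that is not currently held.
  void unlock() {
    if (nlock_ <= 0) throw std::logic_error("nlock > 0");
    --nlock_;
    m_.unlock();
  }
  int nlock() const { return nlock_; }

private:
  std::mutex m_;
  int nlock_ = 0;
};

// Returns true if the unlock was balanced by an earlier lock, false if the
// held-count check rejected it.
bool try_unlock(CountingMutex& m) {
  try {
    m.unlock();
    return true;
  } catch (const std::logic_error&) {
    return false;
  }
}
```

After a single lock(), the first try_unlock succeeds and a second one is rejected, which is the unbalanced-unlock situation the assertion guards against.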

Actions #3

Updated by Colin McCabe over 13 years ago

I believe that the second crash I saw should be fixed by dac9ecd0e05f75744fd0f10ae51ec1d92e9931c1.

Resolved.
