Actions
Bug #3678
closedosd: tcmalloc segfault in PG::CephPeeringEvt::CephPeeringEvt<PG::MNotifyRec>()
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-35> 2012-12-24 11:54:50.445332 7fca638d1700 -1 *** Caught signal (Segmentation fault) ** in thread 7fca638d1700 ceph version 0.55.1-357-ga3aa00d (a3aa00d149829a688d5c2a7be53d07a482d8fc2a) 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x1068a37] 2: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x10cdf16] 3: (()+0xfcb0) [0x7fca70a70cb0] 4: (tcmalloc::CentralFreeList::FetchFromSpans()+0x27) [0x7fca6fb16df7] 5: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x107) [0x7fca6fb17167] 6: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)+0x5d) [0x7fca6fb19cad] 7: (tc_new()+0x486) [0x7fca6fb28c76] 8: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7fca6f46ca89] 9: (char* std::string::_S_construct<char*>(char*, char*, std::allocator<char> const&, std::forward_iterator_tag)+0xe6) [0x118953a] 10: (char* std::string::_S_construct_aux<char*>(char*, char*, std::allocator<char> const&, std::__false_type)+0x2f) [0x1189451] 11: (char* std::string::_S_construct<char*>(char*, char*, std::allocator<char> const&)+0x2f) [0x1189420] 12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<char*>(char*, char*, std::allocator<char> const&)+0x2f) [0x11893d9] 13: (std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::str() const+0x78) [0x7fca6f468828] 14: (PG::CephPeeringEvt::CephPeeringEvt<PG::MNotifyRec>(unsigned int, unsigned int, PG::MNotifyRec const&)+0x11c) [0xef1d80] 15: (PG::queue_notify(unsigned int, unsigned int, int, pg_notify_t&)+0x17a) [0xfb9906] 16: (OSD::handle_pg_notify(std::tr1::shared_ptr<OpRequest>)+0x632) [0xec8472] 17: (OSD::dispatch_op(std::tr1::shared_ptr<OpRequest>)+0xc5) [0xeb6233] 18: (OSD::_dispatch(Message*)+0x81b) [0xeb6f11] 19: (OSD::ms_dispatch(Message*)+0x1b1) [0xeb50cd] 20: (Messenger::ms_deliver_dispatch(Message*)+0x9b) [0x1271a49] 21: (DispatchQueue::entry()+0x549) [0x12711b1] 22: (DispatchQueue::DispatchThread::entry()+0x1c) [0x1191d6e] 23: (Thread::_entry_func(void*)+0x23) [0x1199aed] 24: (()+0x7e9a) [0x7fca70a68e9a] 25: (clone()+0x6d) [0x7fca6ebf34bd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
on ubuntu@teuthology:/a/sage-peer3/25356.
full osd and ms log!
Updated by Sage Weil over 11 years ago
another one:
Thread 1 (Thread 0x7f970b09a700 (LWP 29228)): #0 0x00007f971ed028ee in tc_new () from /usr/lib/libtcmalloc.so.0 #1 0x00007f971e646a89 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #2 0x00000000007cf113 in _S_construct<char*> (__a=..., __end=<optimized out>, __beg=0x2d31b58 "ceph version 0.55.1-384-gd9673ca (d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd)l-les=172 n=0 ec=1 les/c 172/153 171/171/152)\n") at /usr/include/c++/4.6/bits/basic_string.tcc:140 #3 _S_construct_aux<char*> (__a=..., __end=<optimized out>, __beg=0x2d31b58 "ceph version 0.55.1-384-gd9673ca (d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd)l-les=172 n=0 ec=1 les/c 172/153 171/171/152)\n") at /usr/include/c++/4.6/bits/basic_string.h:1692 #4 _S_construct<char*> (__a=..., __end=<optimized out>, __beg=0x2d31b58 "ceph version 0.55.1-384-gd9673ca (d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd)l-les=172 n=0 ec=1 les/c 172/153 171/171/152)\n") at /usr/include/c++/4.6/bits/basic_string.h:1713 #5 std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<char*> (this=0x7f970b0983d0, __beg=0x2d31b58 "ceph version 0.55.1-384-gd9673ca (d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd)l-les=172 n=0 ec=1 les/c 172/153 171/171/152)\n", __end=<optimized out>, __a=...) at /usr/include/c++/4.6/bits/basic_string.tcc:231 #6 0x0000000000a2a7c5 in str (this=0x7f970b098268) at /usr/include/c++/4.6/sstream:129 #7 str (this=0x7f970b098260) at /usr/include/c++/4.6/sstream:458 #8 pretty_version_to_str () at common/version.cc:40 #9 0x000000000089d428 in ceph::BackTrace::print (this=0x7f970b0984c0, out=...) at common/BackTrace.cc:19 #10 0x0000000000899d53 in handle_fatal_signal (signum=11) at global/signal_handler.cc:91 #11 <signal handler called> #12 0x00007f971ed028ee in tc_new () from /usr/lib/libtcmalloc.so.0 #13 0x00007f971e646a89 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #14 0x00007f971e6477fb in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #15 0x00007f971e6478e0 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #16 0x00007f971e647a3f in std::string::append(unsigned long, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #17 0x0000000000955144 in resize (__n=<optimized out>, this=0x3bda4d0) at /usr/include/c++/4.6/bits/basic_string.h:749 #18 PrebufferedStreambuf::overflow (this=<optimized out>, c=32) at common/PrebufferedStreambuf.cc:18 #19 0x00007f971e645085 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #20 0x00007f971e63c055 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #21 0x00007f971e63c2ef in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #22 0x0000000000a5b4cd in Pipe::_pipe_prefix (this=0x2fa66c0, _dout=<optimized out>) at msg/Pipe.cc:43 #23 0x0000000000a693b2 in Pipe::fault (this=0x2fa66c0, onread=<optimized out>) at msg/Pipe.cc:1150 #24 0x0000000000a7b28d in Pipe::reader (this=0x2fa66c0) at msg/Pipe.cc:1244 #25 0x0000000000a7e785 in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #26 0x00000000009665e2 in Thread::_entry_func (arg=<optimized out>) at common/Thread.cc:41 #27 0x00007f971fc42e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #28 0x00007f971ddcd4bd in klogctl () from /lib/x86_64-linux-gnu/libc.so.6 #29 0x0000000000000000 in ?? () (gdb) (gdb) (gdb) t 51 [Switching to thread 51 (Thread 0x7f970a38d700 (LWP 28092))] #0 0x00007f971ddc1b03 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x00007f971ddc1b03 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f971ecf3e77 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so.0 #2 0x00007f971fc46104 in _L_unlock_644 () from /lib/x86_64-linux-gnu/libpthread.so.0 #3 0x00007f971fc46063 in pthread_mutex_unlock () from /lib/x86_64-linux-gnu/libpthread.so.0 #4 0x00000000026d1d80 in ?? () #5 0x00000000026d1e90 in ?? () #6 0x0000000002a3b8c0 in ?? () #7 0x0000000002d51000 in ?? () #8 0x0000000000a7ac02 in Pipe::reader (this=0x20) at msg/Pipe.cc:1241 #9 0x0000000000a7e785 in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #10 0x00000000009665e2 in Thread::_entry_func (arg=<optimized out>) at common/Thread.cc:41 #11 0x00007f971fc42e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #12 0x00007f971ddcd4bd in klogctl () from /lib/x86_64-linux-gnu/libc.so.6 #13 0x0000000000000000 in ?? ()
ubuntu@teuthology:/a/teuthology-2012-12-25_03:00:02-regression-master-testing-gcov/27524
job was
kernel: kdb: true sha1: ec18aeecd4de479601363849d489668d8f12410c nuke-on-error: true overrides: ceph: conf: client: rbd cache: false global: ms inject socket failures: 5000 coverage: true fs: ext4 log-whitelist: - slow request sha1: d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd s3tests: branch: master workunit: sha1: d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd roles: - - mon.a - osd.0 - osd.1 - osd.2 - - mds.a - osd.3 - osd.4 - osd.5 - - client.0 tasks: - chef: null - clock: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: timeout: 1200 - rbd_fsx: clients: - client.0 ops: 2000
Updated by Sage Weil over 11 years ago
- Priority changed from High to Urgent
another run:
(gdb) info th Id Target Id Frame 52 Thread 0x7f70b612e700 (LWP 23111) "ceph-osd" 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 51 Thread 0x7f70b592d700 (LWP 23113) "ceph-osd" 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 50 Thread 0x7f70b512c700 (LWP 23114) "ceph-osd" 0x00007f70b642eb03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 49 Thread 0x7f70b0122700 (LWP 23153) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 48 Thread 0x7f70b0923700 (LWP 23154) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 47 Thread 0x7f70b1124700 (LWP 23155) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 46 Thread 0x7f70b3929700 (LWP 23156) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 45 Thread 0x7f70b492b700 (LWP 23157) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 44 Thread 0x7f70b412a700 (LWP 23158) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 43 Thread 0x7f70b3128700 (LWP 23159) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 42 Thread 0x7f70b2927700 (LWP 23205) "ceph-osd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 41 Thread 0x7f70b2126700 (LWP 23206) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 40 Thread 0x7f70b1925700 (LWP 23207) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 39 Thread 0x7f70af921700 (LWP 23208) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 38 Thread 0x7f70af120700 (LWP 23209) "ceph-osd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 37 Thread 0x7f70ae91f700 (LWP 23210) "ceph-osd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 36 Thread 0x7f70ae11e700 (LWP 23211) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 35 Thread 0x7f70ad91d700 (LWP 23212) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 34 Thread 0x7f70ad11c700 (LWP 23213) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 33 Thread 0x7f70ac91b700 (LWP 23214) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 32 Thread 0x7f70ac11a700 (LWP 23215) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 31 Thread 0x7f70ab919700 (LWP 23216) "ceph-osd" 0x00007f70b642eb03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 30 Thread 0x7f70ab118700 (LWP 23217) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 29 Thread 0x7f70aa917700 (LWP 23218) "ceph-osd" 0x00007f70b642eb03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 28 Thread 0x7f70aa116700 (LWP 23219) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 27 Thread 0x7f70a9915700 (LWP 23220) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 26 Thread 0x7f70a9114700 (LWP 23221) "ceph-osd" 0x00007f70b642eb03 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 25 Thread 0x7f70a8913700 (LWP 23222) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 24 Thread 0x7f70a8112700 (LWP 23223) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 23 Thread 0x7f70a7911700 (LWP 23224) "ceph-osd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 22 Thread 0x7f70a7110700 (LWP 23225) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 21 Thread 0x7f70a690f700 (LWP 23226) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 20 Thread 0x7f70a610e700 (LWP 23227) "ceph-osd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 19 Thread 0x7f70a590d700 (LWP 23228) "ceph-osd" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215 18 Thread 0x7f70a510c700 (LWP 23229) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 17 Thread 0x7f70a490b700 (LWP 23230) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 16 Thread 0x7f70a410a700 (LWP 23231) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 15 Thread 0x7f70a3808700 (LWP 23268) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 14 Thread 0x7f70a3404700 (LWP 23284) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 13 Thread 0x7f70a3909700 (LWP 23346) "ceph-osd" __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 12 Thread 0x7f70b88bb700 (LWP 23347) "ceph-osd" 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 11 Thread 0x7f70a2bfc700 (LWP 23392) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 10 Thread 0x7f70a29fa700 (LWP 23539) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 9 Thread 0x7f70a3202700 (LWP 24062) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 8 Thread 0x7f70a3606700 (LWP 24513) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 7 Thread 0x7f70a3303700 (LWP 24679) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 6 Thread 0x7f70a22f3700 (LWP 24707) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 5 Thread 0x7f70a1bec700 (LWP 24768) "ceph-osd" 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 4 Thread 0x7f70a1aeb700 (LWP 24769) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 3 Thread 0x7f70a24f5700 (LWP 24794) "ceph-osd" 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 2 Thread 0x7f70a1eef700 (LWP 24797) "ceph-osd" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162 * 1 Thread 0x7f70b88d5780 (LWP 23104) "ceph-osd" 0x00007f70b82b1148 in pthread_join (threadid=140121899902720, thread_return=0x0) at pthread_join.c:89 (gdb) f 3 #3 0x000000000095c120 in SimpleMessenger::wait (this=0x24c9000) at msg/SimpleMessenger.cc:490 490 msg/SimpleMessenger.cc: No such file or directory. (gdb) t 3 [Switching to thread 3 (Thread 0x7f70a24f5700 (LWP 24794))] #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 (gdb) bt #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 #1 0x00007f70b73698d2 in SpinLock::SlowLock() () from /usr/lib/libtcmalloc.so.0 #2 0x00007f70b735e0b4 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.0 #3 0x00007f70b7360cad in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.0 #4 0x00007f70b736fc76 in tc_new () from /usr/lib/libtcmalloc.so.0 #5 0x00007f70b6cb3a89 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007f70b6cb47fb in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #7 0x00007f70b6cb48e0 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #8 0x00007f70b6cb4a3f in std::string::append(unsigned long, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #9 0x0000000000955295 in resize (__n=160, this=0x29cbcf0) at /usr/include/c++/4.6/bits/basic_string.h:749 #10 PrebufferedStreambuf::overflow (this=0x29cbca0, c=116) at common/PrebufferedStreambuf.cc:20 #11 0x00007f70b6cb2085 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #12 0x00007f70b6ca9055 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #13 0x000000000089d8a4 in operator<< <std::char_traits<char> > (__s=0x2a084a0 "/tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x899cfa]", __out=...) at /usr/include/c++/4.6/ostream:515 #14 ceph::BackTrace::print (this=0x7f70a24f3640, out=...) at common/BackTrace.cc:60 #15 0x0000000000899e56 in handle_fatal_signal (signum=11) at global/signal_handler.cc:97 #16 <signal handler called> #17 0x00007f70b735ddf7 in tcmalloc::CentralFreeList::FetchFromSpans() () from /usr/lib/libtcmalloc.so.0 #18 0x00007f70b735e167 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.0 #19 0x00007f70b7360cad in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.0 #20 0x00007f70b736fc76 in tc_new () from /usr/lib/libtcmalloc.so.0 #21 0x00007f70b6cb3a89 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #22 0x00007f70b6cb47fb in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #23 0x00007f70b6cb48e0 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #24 0x00007f70b6cb4a3f in std::string::append(unsigned long, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #25 0x0000000000955295 in resize (__n=160, this=0x26be670) at /usr/include/c++/4.6/bits/basic_string.h:749 #26 PrebufferedStreambuf::overflow (this=0x26be620, c=111) at common/PrebufferedStreambuf.cc:20 #27 0x00007f70b6cb2085 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #28 0x00007f70b6ca9055 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #29 0x00007f70b6ca92ef in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #30 0x00000000006b9445 in MOSDPGLog::print (this=0x4c6d000, out=...) at ./messages/MOSDPGLog.h:57 #31 0x0000000000a7c5c2 in operator<< (m=..., out=...) at msg/Message.h:480 #32 Pipe::reader (this=0x47e0000) at msg/Pipe.cc:1296 #33 0x0000000000a7e785 in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #34 0x00000000009665e2 in Thread::_entry_func (arg=<optimized out>) at common/Thread.cc:41 #35 0x00007f70b82afe9a in start_thread (arg=0x7f70a24f5700) at pthread_create.c:308 #36 0x00007f70b643a4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #37 0x0000000000000000 in ?? () (gdb) t 3 [Switching to thread 3 (Thread 0x7f70a24f5700 (LWP 24794))] #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 (gdb) t 5 [Switching to thread 5 (Thread 0x7f70a1bec700 (LWP 24768))] #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 (gdb) bt #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 #1 0x00007f70b73698d2 in SpinLock::SlowLock() () from /usr/lib/libtcmalloc.so.0 #2 0x00007f70b735e0b4 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.0 #3 0x00007f70b7360cad in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.0 #4 0x00007f70b7370b26 in tc_newarray () from /usr/lib/libtcmalloc.so.0 #5 0x0000000000982457 in raw_char (l=177, this=<optimized out>) at common/buffer.cc:196 #6 ceph::buffer::create (len=177) at common/buffer.cc:231 #7 0x0000000000a6210b in Pipe::read_message (this=0x25f3b40, pm=0x7f70a1bebda0) at msg/Pipe.cc:1580 #8 0x0000000000a7b4aa in Pipe::reader (this=0x25f3b40) at msg/Pipe.cc:1272 #9 0x0000000000a7e785 in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:47 #10 0x00000000009665e2 in Thread::_entry_func (arg=<optimized out>) at common/Thread.cc:41 #11 0x00007f70b82afe9a in start_thread (arg=0x7f70a1bec700) at pthread_create.c:308 #12 0x00007f70b643a4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #13 0x0000000000000000 in ?? () (gdb) t 51 [Switching to thread 51 (Thread 0x7f70b592d700 (LWP 23113))] #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 (gdb) bt #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 #1 0x00007f70b73698d2 in SpinLock::SlowLock() () from /usr/lib/libtcmalloc.so.0 #2 0x00007f70b735e0b4 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.0 #3 0x00007f70b7360cad in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) () from /usr/lib/libtcmalloc.so.0 #4 0x00007f70b736fc76 in tc_new () from /usr/lib/libtcmalloc.so.0 #5 0x00007f70b6cb3a89 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007f70b6cb47fb in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #7 0x00007f70b6cb48e0 in std::string::reserve(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #8 0x00007f70b6cb4a3f in std::string::append(unsigned long, char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #9 0x0000000000955295 in resize (__n=160, this=0x297c670) at /usr/include/c++/4.6/bits/basic_string.h:749 #10 PrebufferedStreambuf::overflow (this=0x297c620, c=52) at common/PrebufferedStreambuf.cc:20 #11 0x00007f70b6cb2085 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #12 0x00007f70b6ca9055 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #13 0x00007f70b6ca92ef in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #14 0x0000000000981e45 in ceph::__ceph_assert_fail (assertion=0xb69f78 "0 == \"hit suicide timeout\"", file=<optimized out>, line=78, func=0xb6a100 "bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)") at common/assert.cc:69 #15 0x00000000008b9e15 in ceph::HeartbeatMap::_check (this=<optimized out>, h=<optimized out>, who=0xb69fed "is_healthy", now=<optimized out>) at common/HeartbeatMap.cc:78 #16 0x00000000008baf3f in ceph::HeartbeatMap::is_healthy (this=0x24ae0e0) at common/HeartbeatMap.cc:118 #17 0x00000000008bb34b in ceph::HeartbeatMap::check_touch_file (this=0x24ae0e0) at common/HeartbeatMap.cc:129 #18 0x0000000000994b9d in CephContextServiceThread::entry (this=0x24ac1c0) at common/ceph_context.cc:68 #19 0x00000000009665e2 in Thread::_entry_func (arg=<optimized out>) at common/Thread.cc:41 #20 0x00007f70b82afe9a in start_thread (arg=0x7f70b592d700) at pthread_create.c:308 #21 0x00007f70b643a4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #22 0x0000000000000000 in ?? () (gdb) t 52 [Switching to thread 52 (Thread 0x7f70b612e700 (LWP 23111))] #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 (gdb) bt #0 0x00007f70b73699e3 in base::internal::SpinLockDelay(int volatile*, int, int) () from /usr/lib/libtcmalloc.so.0 #1 0x00007f70b73698d2 in SpinLock::SlowLock() () from /usr/lib/libtcmalloc.so.0 #2 0x00007f70b735dd27 in tcmalloc::CentralFreeList::InsertRange(void*, void*, int) () from /usr/lib/libtcmalloc.so.0 #3 0x00007f70b7360e77 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /usr/lib/libtcmalloc.so.0 #4 0x00007f70b7360ec3 in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) () from /usr/lib/libtcmalloc.so.0 #5 0x00007f70b73703ab in tc_delete () from /usr/lib/libtcmalloc.so.0 #6 0x00000000008c4397 in _M_dispose (__a=..., this=<optimized out>) at /usr/include/c++/4.6/bits/basic_string.h:244 #7 ~basic_string (this=0x29cfe90, __in_chrg=<optimized out>) at /usr/include/c++/4.6/bits/basic_string.h:534 #8 ~PrebufferedStreambuf (this=0x29cfe40, __in_chrg=<optimized out>) at ./common/PrebufferedStreambuf.h:17 #9 ~Entry (this=0x29cfdd0, __in_chrg=<optimized out>) at log/Entry.h:17 #10 ceph::log::Log::flush (this=0x24b3000) at log/Log.cc:178 #11 0x00000000008c4488 in ceph::log::Log::entry (this=0x24b3000) at log/Log.cc:313 #12 0x00000000009665e2 in Thread::_entry_func (arg=<optimized out>) at common/Thread.cc:41 #13 0x00007f70b82afe9a in start_thread (arg=0x7f70b612e700) at pthread_create.c:308 #14 0x00007f70b643a4bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #15 0x0000000000000000 in ?? ()
job
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-12-25_03:00:02-regression-master-testing-gcov/27526$ cat orig.config.yaml kernel: kdb: true sha1: ec18aeecd4de479601363849d489668d8f12410c nuke-on-error: true overrides: ceph: conf: client: rbd cache: true global: ms inject socket failures: 5000 coverage: true fs: xfs log-whitelist: - slow request sha1: d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd s3tests: branch: master workunit: sha1: d9673ca3248d729c8e4f3c47db7a9ebb07ee97fd roles: - - mon.a - osd.0 - osd.1 - osd.2 - - mds.a - osd.3 - osd.4 - osd.5 - - client.0 tasks: - chef: null - clock: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: timeout: 1200 - rbd_fsx: clients: - client.0 ops: 2000
Updated by Sage Weil over 11 years ago
this reproduced once out of ~60 runs on the fsx task.
ubuntu@teuthology:/a/sage-tc1/28508 2012-12-27 17:04:39.112558 7f9fdd525700 -1 *** Caught signal (Segmentation fault) ** in thread 7f9fdd525700 ceph version 0.55.1-360-g6356739 (635673928a6b4dae6d4712cacad81cbac6412dc3) 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x89b36a] 2: (()+0xfcb0) [0x7f9fe96c2cb0] 3: (tc_new()+0xfe) [0x7f9fe877a8ee] 4: (std::vector<osd_xinfo_t, std::allocator<osd_xinfo_t> >::_M_fill_insert(__gnu_cxx::__normal_iterator<osd_xinfo_t*, std::vector<osd_xinfo_t, std::allocator<osd_xinfo_t> > >, unsigned long, osd_xinfo_t const&)+0x370) [0x94f6c0] 5: (OSDMap::decode(ceph::buffer::list::iterator&)+0x13fe) [0x944fbe] 6: (OSDMap::decode(ceph::buffer::list&)+0x4e) [0x94546e] 7: (OSD::handle_osd_map(MOSDMap*)+0x1e0a) [0x6b319a] 8: (OSD::_dispatch(Message*)+0x3d3) [0x6b67e3] 9: (OSD::ms_dispatch(Message*)+0x288) [0x6b73a8] 10: (Messenger::ms_deliver_dispatch(Message*)+0x86) [0xa52596] 11: (DispatchQueue::entry()+0x270) [0xa50420] 12: (DispatchQueue::DispatchThread::entry()+0x15) [0x964515] 13: (Thread::_entry_func(void*)+0x12) [0x967b12] 14: (()+0x7e9a) [0x7f9fe96bae9a] 15: (clone()+0x6d) [0x7f9fe78454bd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
will see if i can trigger it on some older code.
Updated by Josh Durgin over 11 years ago
Possibly the same bug in teuthology:/a/joshd-3631-12-28-12_08.55/30739
ceph version 0.55.1-369-g942c714 (942c71454b11d3bfb09ee4abe65f1e40936d3bf6) 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x1069a83] 2: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x10cef66] 3: (()+0xfcb0) [0x7f64ed407cb0] 4: (tcmalloc::CentralFreeList::FetchFromSpans()+0x27) [0x7f64ec4addf7] 5: (tcmalloc::CentralFreeList::FetchFromSpansSafe()+0x9) [0x7f64ec4ae039] 6: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x77) [0x7f64ec4ae0d7] 7: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long)+0x5d) [0x7f64ec4b0cad] 8: (tc_new()+0x486) [0x7f64ec4bfc76] 9: (__gnu_cxx::new_allocator<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > > >::allocate(unsigned lo ng, void const*)+0x43) [0xf1d5d1] 10: (std::_Vector_base<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >, std::allocator<std::pair<pg_ notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > > > >::_M_allocate(unsigned long)+0x2f) [0xf09223] 11: (std::vector<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >, std::allocator<std::pair<pg_notify _t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > > > >::_M_fill_insert(__gnu_cxx::__normal_iterator<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >*, std::vector<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, s td::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >, std::allocator<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocato r<std::pair<unsigned int const, pg_interval_t> > > > > > >, unsigned long, std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > > const&)+0x355) [0xf0ef09] 12: (std::vector<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >, std::allocator<std::pair<pg_notify _t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > > > >::insert(__gnu_cxx::__normal_iterator<std::pair<pg_notify_t, std::map <unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >*, std::vector<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less <unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >, std::allocator<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::p air<unsigned int const, pg_interval_t> > > > > > >, unsigned long, std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_inte rval_t> > > > const&)+0x30) [0xefc2e0] 13: (std::vector<std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >, std::allocator<std::pair<pg_notify _t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > > > >::resize(unsigned long, std::pair<pg_notify_t, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > > >)+0x67) [0xeec685] 14: (MOSDPGInfo::decode_payload()+0x91) [0xee3539] 15: (decode_message(CephContext*, ceph_msg_header&, ceph_msg_footer&, ceph::buffer::list&, ceph::buffer::list&, ceph::buffer::list&)+0x1d9f) [0x10f1943] 16: (Pipe::read_message(Message**)+0x1c7c) [0x128eaf6] 17: (Pipe::reader()+0xa24) [0x128aa52] 18: (Pipe::Reader::entry()+0x1c) [0x1291b22] 19: (Thread::_entry_func(void*)+0x23) [0x119b399] 20: (()+0x7e9a) [0x7f64ed3ffe9a] 21: (clone()+0x6d) [0x7f64eb58a4bd]
Job was:
nuke-on-error: true overrides: ceph: branch: next conf: client: debug ms: 1 global: ms inject socket failures: 5000 osd: debug ms: 1 fs: xfs log-whitelist: - slow request roles: - - mon.a - osd.0 - osd.1 - osd.2 - - mds.a - osd.3 - osd.4 - osd.5 - - client.0 tasks: - chef: null - clock: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: timeout: 1200 - rados: clients: - client.0 objects: 50 op_weights: delete: 50 read: 100 snap_create: 50 snap_remove: 50 snap_rollback: 50 write: 100 ops: 4000
I'm adding debug osd = 20 to future runs.
Updated by Sage Weil over 11 years ago
another one with full osd logs:
ubuntu@teuthology:/a/sage-a1/33277$ zcat remote/*/log/osd.1* | tail -n 50 2013-01-02 00:39:56.943383 7f3d7c665700 10 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] put_object_context 0x2942600 955f9c7f/plana6421865-38/head//0 1 -> 0 2013-01-02 00:39:56.943405 7f3d7c665700 15 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] requeue_ops 2013-01-02 00:39:56.943426 7f3d7c665700 10 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] put_snapset_context plana6421865-38 1 -> 0 2013-01-02 00:39:56.943467 7f3d7c665700 10 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] eval_repop repgather(0x604f120 applied 762'339 rep_tid=1978 wfack= wfdisk= op=osd_op(client.4118.0:2702 plana6421865-38 [write 799931~481579] 0.955f9c7f snapc 1d6=[1d6,1d5,1d1,1d0,1cf,1cb,1c9,1c7,1c6,1c2,1bd,1bb,1b9,1b8,1b7,1ae,1ac,1a9,1a5,1a3,1a2,19f,192,185,17c,178,15f,129]) v4) wants=ad 2013-01-02 00:39:56.943516 7f3d7c665700 15 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] log_op_stats osd_op(client.4118.0:2702 plana6421865-38 [write 799931~481579] 0.955f9c7f snapc 1d6=[1d6,1d5,1d1,1d0,1cf,1cb,1c9,1c7,1c6,1c2,1bd,1bb,1b9,1b8,1b7,1ae,1ac,1a9,1a5,1a3,1a2,19f,192,185,17c,178,15f,129]) v4 inb 481703 outb 0 rlat 0.023600 lat 0.049485 2013-01-02 00:39:56.943564 7f3d7c665700 15 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] update_stats 707'116 2013-01-02 00:39:56.943590 7f3d7c665700 10 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] removing repgather(0x604f120 applied 762'339 rep_tid=1978 wfack= wfdisk= op=osd_op(client.4118.0:2702 plana6421865-38 [write 799931~481579] 0.955f9c7f snapc 1d6=[1d6,1d5,1d1,1d0,1cf,1cb,1c9,1c7,1c6,1c2,1bd,1bb,1b9,1b8,1b7,1ae,1ac,1a9,1a5,1a3,1a2,19f,192,185,17c,178,15f,129]) v4) 2013-01-02 00:39:56.943631 7f3d7c665700 20 osd.1 pg_epoch: 762 pg[0.f( v 762'339 (0'0,762'339] local-les=761 n=18 ec=1 les/c 761/731 761/761/707) [1,4] r=0 lpr=761 pi=708-760/3 mlcod 662'245 active+recovering snaptrimq=[145~1,16d~1,186~1,1d2~3]] q front is repgather(0x604f120 applied 762'339 rep_tid=1978 wfack= wfdisk= op=osd_op(client.4118.0:2702 plana6421865-38 [write 799931~481579] 0.955f9c7f snapc 1d6=[1d6,1d5,1d1,1d0,1cf,1cb,1c9,1c7,1c6,1c2,1bd,1bb,1b9,1b8,1b7,1ae,1ac,1a9,1a5,1a3,1a2,19f,192,185,17c,178,15f,129]) v4) 2013-01-02 00:39:56.967406 7f3d7c665700 -1 *** Caught signal (Segmentation fault) ** in thread 7f3d7c665700 ceph version 0.56-103-gf1196c7 (f1196c7e93af83405ea5082030a7899e256ded7a) 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x106865b] 2: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x10cdb46] 3: (()+0xfcb0) [0x7f3d86fffcb0] 4: (tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)+0xeb) [0x7f3d860a8e3b] 5: (tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long)+0x23) [0x7f3d860a8ec3] 6: (operator delete(void*)+0x28b) [0x7f3d860b83ab] 7: (__gnu_cxx::new_allocator<pg_log_entry_t>::deallocate(pg_log_entry_t*, unsigned long)+0x20) [0xe72b4a] 8: (std::_Vector_base<pg_log_entry_t, std::allocator<pg_log_entry_t> >::_M_deallocate(pg_log_entry_t*, unsigned long)+0x32) [0xe5d8c2] 9: (std::_Vector_base<pg_log_entry_t, std::allocator<pg_log_entry_t> >::~_Vector_base()+0x56) [0xe4ae22] 10: (std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >::~vector()+0x42) [0xe3b724] 11: (ReplicatedPG::OpContext::~OpContext()+0x8f) [0xe32953] 12: (ReplicatedPG::RepGather::put()+0xbe) [0xe32e1a] 13: (ReplicatedPG::op_applied(ReplicatedPG::RepGather*)+0x964) [0xdfe740] 14: (C_OSD_OpApplied::finish(int)+0x2a) [0xe36de2] 15: (Finisher::finisher_thread_entry()+0x342) [0x10dd946] 16: (Finisher::FinisherThread::entry()+0x1c) [0xeda216] 17: (Thread::_entry_func(void*)+0x23) [0x11983b5] 18: (()+0x7e9a) [0x7f3d86ff7e9a] 19: (clone()+0x6d) [0x7f3d851824bd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil over 11 years ago
with glibc malloc and debug enabled:
ubuntu@teuthology:/a/sage-a2/33535$ zless remote/*/log/osd.4* ubuntu@teuthology:/a/sage-a2/33564$ zless remote/*/log/osd.1.* 2013-01-02 14:15:31.545161 7f82d84f5700 1 -- 10.214.132.34:6801/6641 >> :/0 pipe(0x7f8280002670 sd=45 :6801 pgs=0 cs=0 l=0).accept sd=45 10.214.132.35:48440/0 2013-01-02 14:15:31.545356 7f82cc7f0700 10 osd.1 1288 OSD::ms_get_authorizer type=osd 2013-01-02 14:15:31.545587 7f82d84f5700 10 osd.1 1288 new session 0x7f81e0006040 con=0x7f8280001070 addr=10.214.132.35:6807/4044 2013-01-02 14:15:31.545779 7f82d84f5700 10 osd.1 1288 session 0x7f81e0006040 osd.3 has caps osdcap[grant(*)] 'allow *' 2013-01-02 14:15:31.545803 7f82d84f5700 0 -- 10.214.132.34:6801/6641 >> 10.214.132.35:6807/4044 pipe(0x7f8280002670 sd=45 :6801 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 76 state connecting 2013-01-02 14:15:31.546054 7f82d37fe700 1 -- 10.214.132.34:6801/6641 <== osd.3 10.214.132.35:6807/4044 3371 ==== pg_info(1 pgs e1288:1.8) v3 ==== 516+0+0 (3272433763 0 0) 0x7f81e00048d0 con 0x7f82b800ce60 2013-01-02 14:15:31.546174 7f82d37fe700 7 osd.1 1288 handle_pg_info pg_info(1 pgs e1288:1.8) v3 from osd.3 2013-01-02 14:15:31.546212 7f82d37fe700 10 osd.1 pg_epoch: 1288 pg[1.8( empty local-les=1288 n=0 ec=1 les/c 1270/1270 1287/1287/1287) [1,3] r=0 lpr=1287 pi=1266-1286/2 mlcod 0'0 active] info 1.8( empty local-les=1288 n=0 ec=1 les/c 1288/1270 1287/1287/1287) from osd.3 2013-01-02 14:15:31.546418 7f82d84f5700 0 -- 10.214.132.34:6801/6641 >> 10.214.132.35:6807/4044 pipe(0x7f8280002670 sd=45 :6801 pgs=316 cs=77 l=0).reader got old message 3371 <= 3371 0x7f81e0004f70 pg_info(1 pgs e1288:1.8) v3, discarding 2013-01-02 14:15:31.548300 7f82d84f5700 0 -- 10.214.132.34:6801/6641 >> 10.214.132.35:6807/4044 pipe(0x7f8280002670 sd=45 :6801 pgs=316 cs=77 l=0).fault, initiating reconnect ... 2013-01-02 14:15:31.570776 7f82cf7f6700 10 osd.1 pg_epoch: 1288 pg[0.2( v 1268'248 lc 1237'243 (0'0,1268'248] local-les=1273 n=6 ec=1 les/c 1273/1234 1273/1273/1269) [1,3] r=0 lpr=1273 pi=1229-1272/6 lcod 1234'241 mlcod 0'0 active+recovering m=2 snaptrimq=[2d3~1,2d8~1,31c~1,320~1,32d~1,332~1,33f~3,343~1,346~4,34c~1,34e~1]] state<Started/Primary/Active>: Active: got notify from 5, already have info from that osd, ignoring2013-01-02 14:15:31.570764 7f82d84f5700 -1 *** Caught signal (Aborted) ** in thread 7f82d84f5700 ceph version 0.56 (1a32f0a0b42f169a7b55ed48ec3208f6d4edc1e8) 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x106924b] 2: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x10ce736] 3: (()+0xfcb0) [0x7f82f1c05cb0] 4: (gsignal()+0x35) [0x7f82eff39445] 5: (abort()+0x17b) [0x7f82eff3cbab] 6: (()+0x73e2e) [0x7f82eff76e2e] 7: (()+0x7e626) [0x7f82eff81626] 8: (__gnu_cxx::new_allocator<std::_List_node<ceph::buffer::ptr> >::deallocate(std::_List_node<ceph::buffer::ptr>*, unsigned long)+0x20) [0xdd2630] 9: (std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_put_node(std::_List_node<ceph::buffer::ptr>*)+0x28) [0xdd1ab4] 10: (std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_clear()+0x81) [0xdd1193] 11: (std::_List_base<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::~_List_base()+0x19) [0xdd0a2d] 12: (std::list<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::~list()+0x18) [0xdcfaa4] 13: (ceph::buffer::list::~list()+0x29) [0xdcfb53] 14: (Pipe::write_message(Message*)+0x10d8) [0x12905d8] 15: (Pipe::writer()+0xc7e) [0x128bc28] 16: (Pipe::Writer::entry()+0x1c) [0x1291372] 17: (Thread::_entry_func(void*)+0x23) [0x119ab89] 18: (()+0x7e9a) [0x7f82f1bfde9a] 19: (clone()+0x6d) [0x7f82efff54bd]
identical fault pattern in accept() and then backtrace for both of them. the intervening activity is also surprisingly similar, but not identical.
Updated by Sage Weil over 11 years ago
got msgr logs in ubuntu@teuthology:/a/sage-a3/34724, but the crash looked different from the earlier ones (whose logs have sadly been deleted). have not yet inspected the core.
Updated by Sage Weil over 11 years ago
- Status changed from 12 to Fix Under Review
YAY, wip-3678 is consistently passing now.
Updated by Sage Weil over 11 years ago
- Assignee changed from Sage Weil to Greg Farnum
Updated by Sage Weil over 11 years ago
Updated by Sage Weil over 11 years ago
- Status changed from Fix Under Review to Resolved
Actions