Bug #10485

closed

unreadable ceph-osd core dump (firefly)

Added by Loïc Dachary over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The run at
http://pulpito.ceph.com/sage-2015-01-06_09:44:19-rados-wip-sage-testing-firefly---basic-multi/688290/
has a core dump for ceph-osd 5, but the log has no backtrace. Displaying the stack trace with gdb on vpm178, after an apt-get dist-upgrade, produces the following odd trace:

Core was generated by `ceph-osd -f -i 5'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fcbae6fef07 in _dl_map_object_deps (map=map@entry=0x7fcbae8ff4e8, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0,
    trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:528
528     dl-deps.c: No such file or directory.
(gdb) bt
#0  0x00007fcbae6fef07 in _dl_map_object_deps (map=map@entry=0x7fcbae8ff4e8, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0,
    trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:528
#1  0x00007fcbae705aab in dl_open_worker (a=a@entry=0x7fcb98740aa8) at dl-open.c:272
#2  0x00007fcbae700ff4 in _dl_catch_error (objname=objname@entry=0x7fcb98740a98, errstring=errstring@entry=0x7fcb98740aa0,
    mallocedp=mallocedp@entry=0x7fcb98740a90, operate=operate@entry=0x7fcbae7059a0 <dl_open_worker>, args=args@entry=0x7fcb98740aa8)
    at dl-error.c:187
#3  0x00007fcbae7053bb in _dl_open (file=0x7fcbac6174de "%d/%y", mode=-2147483647, caller_dlopen=<optimized out>, nsid=-2, argc=4,
    argv=0x7fffbe2e3c68, env=0x1f3c000) at dl-open.c:661
#4  0x00007fcbac5d1002 in elf_ifunc_invoke (addr=<optimized out>) at ../sysdeps/x86_64/dl-irel.h:32
#5  do_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>, vers=<optimized out>, flags=<optimized out>)
    at dl-sym.c:190
#6  0x00007fcbade26a90 in pthread_once () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103
#7  0x00007fcbac5a5d7c in __GI___backtrace_symbols_fd (array=<optimized out>, size=<optimized out>, fd=0) at backtracesymsfd.c:72
#8  0x00007fcb98740ed0 in ?? ()
#9  0x0000000003f56ad8 in ?? ()
#10 0x00007fcb98740ee0 in ?? ()
#11 0x00000000048c9958 in ?? ()
#12 0x00007fcb98740f40 in ?? ()
#13 0x00000000036ce2d8 in ?? ()
#14 0x00000000008d3edb in ~basic_string (this=0x7fcb98740ea0, __in_chrg=<optimized out>) at /usr/include/c++/4.8/bits/basic_string.h:539
#15 LFNIndex::lfn_generate_object_name (this=<optimized out>, oid=...) at os/LFNIndex.cc:692
#16 0x000000000000001e in ?? ()
#17 0x00007fcb9874343c in ?? ()
#18 0x00007fcb98743300 in ?? ()
#19 0x0000000000000000 in ?? ()


and on vpm022 as is, without the dist-upgrade:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ceph-osd -f -i 5'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fcbae6fef07 in _dl_map_object_deps (map=map@entry=0x7fcbae8ff4e8, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, 
    open_mode=open_mode@entry=-2147483648) at dl-deps.c:528
528    dl-deps.c: No such file or directory.
(gdb) bt
#0  0x00007fcbae6fef07 in _dl_map_object_deps (map=map@entry=0x7fcbae8ff4e8, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, 
    open_mode=open_mode@entry=-2147483648) at dl-deps.c:528
#1  0x00007fcbae705aab in dl_open_worker (a=a@entry=0x7fcb98740aa8) at dl-open.c:272
#2  0x00007fcbae700ff4 in _dl_catch_error (objname=objname@entry=0x7fcb98740a98, errstring=errstring@entry=0x7fcb98740aa0, mallocedp=mallocedp@entry=0x7fcb98740a90, 
    operate=operate@entry=0x7fcbae7059a0 <dl_open_worker>, args=args@entry=0x7fcb98740aa8) at dl-error.c:187
#3  0x00007fcbae7053bb in _dl_open (file=0x7fcbac6174de "em != ((void *)0)", mode=-2147483647, caller_dlopen=<optimized out>, nsid=-2, argc=4, argv=0x7fffbe2e3c68, env=0x1f3c000)
    at dl-open.c:661
#4  0x00007fcbac5d1002 in __GI___libc_dlopen_mode (name=0x7fcbae8ff4e8 "", mode=6642832) at dl-libc.c:157
#5  0x00007fcb98740cc0 in ?? ()
#6  0x00007fcbae9151c8 in _r_debug ()
#7  0x00007fcb98740ca0 in ?? ()
#8  0x00007fcb98740cb0 in ?? ()
#9  0x00007fcb98740c90 in ?? ()
#10 0x00007fcb98740b94 in ?? ()
#11 0x0000000000000000 in ?? ()
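
Both traces above are mostly `?? ()` frames below the dynamic loader calls, which is what makes the core "unreadable". A minimal sketch, assuming Python and gdb's usual `#N 0xADDR in func (...)` frame layout, of how one could put a number on that; the helper names are hypothetical and the sample is an excerpt of the vpm022 trace.

```python
import re

# Start of a gdb frame line: "#N", an optional "0xADDR in", then the
# function name up to the first whitespace (templated names get cut short).
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+)\s+in\s+)?(\S+)")


def parse_frames(backtrace):
    """Return (frame number, address or None, function) for each frame line."""
    frames = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return frames


def unresolved_ratio(backtrace):
    """Fraction of frames that gdb printed as '??' (no symbol found)."""
    frames = parse_frames(backtrace)
    if not frames:
        return 0.0
    unresolved = sum(1 for _, _, func in frames if func == "??")
    return unresolved / len(frames)


# Frames #4 to #7 of the vpm022 trace.
SAMPLE = """\
#4  0x00007fcbac5d1002 in __GI___libc_dlopen_mode (name=0x7fcbae8ff4e8 "", mode=6642832) at dl-libc.c:157
#5  0x00007fcb98740cc0 in ?? ()
#6  0x00007fcbae9151c8 in _r_debug ()
#7  0x00007fcb98740ca0 in ?? ()
"""

print(unresolved_ratio(SAMPLE))  # 0.5
```

A high ratio only hints that gdb could not unwind or symbolize the stack; it does not say why.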


Related issues 2 (0 open, 2 closed)

Related to Ceph - Bug #9625: firefly: memory corruption | Resolved | Sage Weil | 09/29/2014
Has duplicate Ceph - Bug #11114: timeout expired in wait_for_all_up | Duplicate | 03/16/2015

Actions #1

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
Actions #2

Updated by Sage Weil over 9 years ago

  • Priority changed from Normal to High
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ceph-osd -f -i 1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb4fb247f07 in _dl_map_object_deps (map=map@entry=0x7fb4fb4484e8, 
    preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, 
    trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648)
    at dl-deps.c:528
528     dl-deps.c: No such file or directory.
(gdb) bt
#0  0x00007fb4fb247f07 in _dl_map_object_deps (map=map@entry=0x7fb4fb4484e8, 
    preloads=preloads@entry=0x0, npreloads=npreloads@entry=0, 
    trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648)
    at dl-deps.c:528
#1  0x00007fb4fb24eaab in dl_open_worker (a=a@entry=0x7fb4e528c168)
    at dl-open.c:272
#2  0x00007fb4fb249ff4 in _dl_catch_error (
    objname=objname@entry=0x7fb4e528c158, 
    errstring=errstring@entry=0x7fb4e528c160, 
    mallocedp=mallocedp@entry=0x7fb4e528c150, 
    operate=operate@entry=0x7fb4fb24e9a0 <dl_open_worker>, 
    args=args@entry=0x7fb4e528c168) at dl-error.c:187
#3  0x00007fb4fb24e3bb in _dl_open (file=0x7fb4f91604de "em != ((void *)0)", 
    mode=-2147483647, caller_dlopen=<optimized out>, nsid=-2, argc=4, 
    argv=0x7fffc3325e68, env=0x3eb8000) at dl-open.c:661
#4  0x00007fb4f911a002 in __GI___libc_dlopen_mode (name=0x7fb4fb4484e8 "", 
    mode=6642832) at dl-libc.c:157
#5  0x00007fb4e528c380 in ?? ()


This one is also on firefly: ubuntu@teuthology:/a/sage-2015-01-11_12:05:20-rados-firefly-distro-basic-multi/697528
Actions #3

Updated by Sage Weil over 9 years ago

ubuntu@teuthology:/a/sage-2015-01-19_18:35:10-rados-wip-dho-distro-basic-multi/713884 (firefly)

Actions #4

Updated by Sage Weil about 9 years ago

  • Subject changed from unreadable ceph-osd core dump to unreadable ceph-osd core dump (firefly)
  • Source changed from other to Q/A
Actions #5

Updated by Loïc Dachary about 9 years ago

http://pulpito.ceph.com/loic-2015-02-27_20:22:09-rados-firefly-backports---basic-multi/781780/
rados/thrash/{clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/osd-delay.yaml thrashers/pggrow.yaml workloads/ec-radosbench.yaml}

#0  0x00007f59d6e9af07 in _dl_map_object_deps (map=map@entry=0x7f59d709c4e8, preloads=preloads@entry=0x0, 
    npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:528
528    dl-deps.c: No such file or directory.
(gdb) bt
#0  0x00007f59d6e9af07 in _dl_map_object_deps (map=map@entry=0x7f59d709c4e8, preloads=preloads@entry=0x0, 
    npreloads=npreloads@entry=0, trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at dl-deps.c:528
#1  0x00007f59d6ea1aab in dl_open_worker (a=a@entry=0x7f59c06dc9a8) at dl-open.c:272
#2  0x00007f59d6e9cff4 in _dl_catch_error (objname=objname@entry=0x7f59c06dc998, errstring=errstring@entry=0x7f59c06dc9a0, 
    mallocedp=mallocedp@entry=0x7f59c06dc990, operate=operate@entry=0x7f59d6ea19a0 <dl_open_worker>, 
    args=args@entry=0x7f59c06dc9a8) at dl-error.c:187
#3  0x00007f59d6ea13bb in _dl_open (file=0x7f59d4db215e <rpc_errstr+30> " arguments", mode=-2147483647, 
    caller_dlopen=<optimized out>, nsid=-2, argc=4, argv=0x7fff6b6eda98, env=0x34e2000) at dl-open.c:661
#4  0x00007f59d4d6bc82 in determine_info (symbolp=0x0, mapp=0x4736e5, info=0xb, match=0x0, addr=140023457238000)
    at dl-addr.c:61
#5  __GI__dl_addr (address=0x7f59c06dcbf0, info=0xb, mapp=0x4736e5, symbolp=0x0) at dl-addr.c:137
#6  0x00007f59c06dcbc0 in ?? ()
#7  0x00007f59d70b11c8 in _r_debug ()
#8  0x00007f59c06dcba0 in ?? ()
#9  0x00007f59c06dcbb0 in ?? ()
#10 0x00007f59c06dcb90 in ?? ()
#11 0x00007f59c06dca94 in ?? ()
#12 0x0000000000000000 in ?? ()
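
The traces from the different runs differ in every address, but frames #0 to #3 name the same functions each time. A minimal sketch, assuming Python, of comparing traces by their top function names only. This is a hypothetical helper for illustration, not the scheme behind the tracker's "Crash signature" fields; the sample lines come from runs 697528 and 781780 with arguments elided as `(...)`.

```python
import hashlib
import re

# Function name of a gdb frame line, ignoring the frame address.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def top_functions(backtrace, depth=4):
    """Names of the first `depth` frames, in order."""
    names = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            names.append(m.group(1))
    return names[:depth]


def signature(backtrace, depth=4):
    """Hash of the top function names; addresses do not influence it."""
    joined = "\n".join(top_functions(backtrace, depth))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


RUN_697528 = """\
#0  0x00007fb4fb247f07 in _dl_map_object_deps (...) at dl-deps.c:528
#1  0x00007fb4fb24eaab in dl_open_worker (...) at dl-open.c:272
#2  0x00007fb4fb249ff4 in _dl_catch_error (...) at dl-error.c:187
#3  0x00007fb4fb24e3bb in _dl_open (...) at dl-open.c:661
"""

RUN_781780 = """\
#0  0x00007f59d6e9af07 in _dl_map_object_deps (...) at dl-deps.c:528
#1  0x00007f59d6ea1aab in dl_open_worker (...) at dl-open.c:272
#2  0x00007f59d6e9cff4 in _dl_catch_error (...) at dl-error.c:187
#3  0x00007f59d6ea13bb in _dl_open (...) at dl-open.c:661
"""

print(signature(RUN_697528) == signature(RUN_781780))  # True
```

The depth of 4 is chosen because frame #4 already differs between the traces in this ticket (`elf_ifunc_invoke`, `__GI___libc_dlopen_mode`, `determine_info`).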

Actions #6

Updated by Loïc Dachary about 9 years ago

Actions #7

Updated by Loïc Dachary about 9 years ago

  • Status changed from New to Resolved
  • Assignee set to Loïc Dachary

on vpm022

(gdb) bt
#0  0x00007f5b4e76e20b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x000000000098302a in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:105
#3  <signal handler called>
#4  0x00007f5b4ce15cc9 in sigisemptyset (set=0x9c8) at sigisempty.c:34
#5  0x00007f5b4ce190d8 in _quicksort (pbase=0x0, total_elems=<optimized out>, size=0, cmp=0xbb0720 <typeinfo name for ceph::FailedAssertion>, 
    arg=0x3b0b6a0) at qsort.c:123
#6  0x0000000005503618 in ?? ()
#7  0x0000000005503bf3 in ?? ()
#8  0x0000000005503e18 in ?? ()
#9  0x00007f5b4d9c2e00 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000000010 in ?? ()
#11 0x0000000005503618 in ?? ()
#12 0x00007f5b4d9aa2c0 in vtable for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x0000000000000006 in ?? ()
#14 0x0000000000000000 in ?? ()

after apt-get dist-upgrade
(gdb) 
#0  0x00007f5b4e76e20b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x000000000098302a in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:105
#3  <signal handler called>
#4  0x00007f5b4ce15cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5  0x00007f5b4ce190d8 in __GI_abort () at abort.c:89
#6  0x00007f5b4d7206b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f5b4d71e836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f5b4d71e863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f5b4d71eaa2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x0000000000a66802 in ceph::__ceph_assert_fail (assertion=assertion@entry=0xb82a98 "soid < scrubber.start || soid >= scrubber.end", 
    file=file@entry=0xb7e3a0 "osd/ReplicatedPG.cc", line=line@entry=5320, 
    func=func@entry=0xb8a180 <ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)::__PRETTY_FUNCTION__> "void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)") at common/assert.cc:77
#11 0x00000000007d3ae6 in ReplicatedPG::finish_ctx (this=this@entry=0x32ba400, ctx=ctx@entry=0x46bea00, log_op_type=log_op_type@entry=8, 
    maintain_ssc=maintain_ssc@entry=true) at osd/ReplicatedPG.cc:5320
#12 0x00000000007d7c90 in ReplicatedPG::finish_promote (this=0x32ba400, r=<optimized out>, op=..., results=<optimized out>, obc=...)
    at osd/ReplicatedPG.cc:6025
#13 0x0000000000841e9c in PromoteCallback::finish (this=<optimized out>, results=...) at osd/ReplicatedPG.cc:1687
#14 0x0000000000816ca9 in GenContext<boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> >::complete (this=0x49b2420, t=...) at ./include/Context.h:45
#15 0x00000000007ec55d in ReplicatedPG::process_copy_chunk (this=0x32ba400, oid=..., tid=tid@entry=926, r=r@entry=0)
    at osd/ReplicatedPG.cc:5711
#16 0x0000000000842b49 in C_Copyfrom::finish (this=0x4b87c60, r=0) at osd/ReplicatedPG.cc:5371
#17 0x0000000000655a19 in Context::complete (this=0x4b87c60, r=<optimized out>) at ./include/Context.h:64
#18 0x00000000009a60d8 in Finisher::finisher_thread_entry (this=0x2ffc340) at common/Finisher.cc:56
#19 0x00007f5b4e766182 in start_thread (arg=0x7f5b35885700) at pthread_create.c:312
#20 0x00007f5b4ced947d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

simple enough ;-)
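
Frames #4 and #5 keep the same addresses in both traces above but resolve to different functions: `sigisemptyset` and `_quicksort` before the dist-upgrade, `__GI_raise` and `__GI_abort` after. A minimal sketch, assuming Python and gdb's usual `#N 0xADDR in func` frame layout, of spotting such frames automatically; the helper names are hypothetical and the sample lines are abbreviated from the two vpm022 traces, with some arguments elided as `(...)`.

```python
import re

# Address and function name of a gdb frame line that carries an address.
FRAME_RE = re.compile(r"^#\d+\s+(0x[0-9a-fA-F]+)\s+in\s+(\S+)")


def symbols_by_address(backtrace):
    """Map each frame address to the function name gdb printed for it."""
    symbols = {}
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            # Keep the first name seen if an address shows up twice.
            symbols.setdefault(m.group(1), m.group(2))
    return symbols


def resymbolized(before, after):
    """Addresses present in both traces whose function name changed."""
    old = symbols_by_address(before)
    new = symbols_by_address(after)
    return {
        addr: (old[addr], new[addr])
        for addr in old
        if addr in new and old[addr] != new[addr]
    }


BEFORE = """\
#0  0x00007f5b4e76e20b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#4  0x00007f5b4ce15cc9 in sigisemptyset (set=0x9c8) at sigisempty.c:34
#5  0x00007f5b4ce190d8 in _quicksort (...) at qsort.c:123
"""

AFTER = """\
#0  0x00007f5b4e76e20b in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#4  0x00007f5b4ce15cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5  0x00007f5b4ce190d8 in __GI_abort () at abort.c:89
"""

for addr, (was, now) in sorted(resymbolized(BEFORE, AFTER).items()):
    print(addr, was, "->", now)
```

One plausible reading, which the ticket does not state explicitly, is that the symbols gdb used before the upgrade did not match the libraries recorded in the core, so the same addresses were given the wrong names.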
