Bug #18576

Enabling LTTng causes segmentation fault in libgcc

Added by Ganesh Mahalingam 8 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
01/17/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
master
Needs Doc:
No

Description

Enabling LTTng on master causes a segmentation fault in libgcc.

Error log:
#############################################################################################################
vagrant@cephaio:/data/odisk/ceph/ceph/build$ OSD=1 MON=1 MDS=0 CEPH_DEV_DIR=/data/ceph-disk/ /data/odisk/ceph/ceph/src/vstart.sh -d -n -x --valgrind memcheck -k
    ** going verbose **
    2017-01-17 18:27:17.622767 7f40ad7deec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:17.683761 7f73e1adaec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:17.745503 7f2f4fcedec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:17.813466 7fb1567aaec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:17.876913 7f8f58e90ec0 -1 WARNING: all dangerous and experimental features are enabled.
    === osd.0 ===
    2017-01-17 18:27:17.935373 7fb061009ec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:17.998658 7fc2d31a8ec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.058194 7fc72db5fec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.114895 7f3defaffec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.176292 7f37b68faec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.234354 7f4b408d2ec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.293521 7f70b31caec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.350223 7f5e5df50ec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.405729 7f4dde8d9ec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.466701 7fc87df0eec0 -1 WARNING: all dangerous and experimental features are enabled.
    Stopping Ceph osd.0 on cephaio...done
    2017-01-17 18:27:18.542231 7fa32aedbec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.607174 7f0402c1aec0 -1 WARNING: all dangerous and experimental features are enabled.
    === mon.a ===
    2017-01-17 18:27:18.668639 7f02b4f3eec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.726736 7f7c61f97ec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.786517 7f46f097eec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.841316 7fa9eb4f2ec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:18.905849 7f9665a3fec0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:19.002536 7f4239722ec0 -1 WARNING: all dangerous and experimental features are enabled.
    Stopping Ceph mon.a on cephaio...done
    rm -f core*
    hostname cephaio
    ip 10.0.2.15
    port 40000
    /data/odisk/ceph/ceph/build/bin/ceph-authtool --create-keyring --gen-key --name=mon. /data/odisk/ceph/ceph/build/keyring --cap mon allow
    creating /data/odisk/ceph/ceph/build/keyring
    /data/odisk/ceph/ceph/build/bin/ceph-authtool --gen-key --name=client.admin --set-uid=0 --cap mon allow * --cap osd allow * --cap mds allow * /data/odisk/ceph/ceph/build/keyring
    /data/odisk/ceph/ceph/build/bin/monmaptool --create --clobber --add a 10.0.2.15:40000 --print /tmp/ceph_monmap.31953
    /data/odisk/ceph/ceph/build/bin/monmaptool: monmap file /tmp/ceph_monmap.31953
    /data/odisk/ceph/ceph/build/bin/monmaptool: generated fsid a3cd0dc7-aa72-4709-a554-2167e79691f6
    epoch 0
    fsid a3cd0dc7-aa72-4709-a554-2167e79691f6
    last_changed 2017-01-17 18:27:19.283438
    created 2017-01-17 18:27:19.283438
    0: 10.0.2.15:40000/0 mon.a
    /data/odisk/ceph/ceph/build/bin/monmaptool: writing epoch 0 to /tmp/ceph_monmap.31953 (1 monitors)
    rm -rf -- /data/ceph-disk//mon.a
    mkdir -p /data/ceph-disk//mon.a
    /data/odisk/ceph/ceph/build/bin/ceph-mon --mkfs -c /data/odisk/ceph/ceph/build/ceph.conf -i a --monmap=/tmp/ceph_monmap.31953 --keyring=/data/odisk/ceph/ceph/build/keyring
    2017-01-17 18:27:19.307163 7fc8448e3ac0 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:19.307942 7fc8448e3ac0 -1 WARNING: all dangerous and experimental features are enabled.
    /data/odisk/ceph/ceph/build/bin/ceph-mon: set fsid to f900bd15-5767-4658-9ef8-753e80219928
    /data/odisk/ceph/ceph/build/bin/ceph-mon: created monfs at /data/ceph-disk//mon.a for mon.a
    rm -- /tmp/ceph_monmap.31953
    valgrind --tool=memcheck /data/odisk/ceph/ceph/build/bin/ceph-mon -i a -c /data/odisk/ceph/ceph/build/ceph.conf -f &
    32117 Memcheck, a memory error detector
    32117 Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
    32117 Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
    32117 Command: /data/odisk/ceph/ceph/build/bin/ceph-mon -i a -c /data/odisk/ceph/ceph/build/ceph.conf -f
    32117
    ERROR: error accessing '/data/ceph-disk//osd0/
    '
    add osd0 54170e28-15a9-4683-972e-95a10e2c44b1
    /data/odisk/ceph/ceph/build/bin/ceph -c /data/odisk/ceph/ceph/build/ceph.conf -k /data/odisk/ceph/ceph/build/keyring osd create 54170e28-15a9-4683-972e-95a10e2c44b1
    2017-01-17 18:27:20.516830 7f6fb5a03700 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:20.588141 7f6fb5a03700 -1 WARNING: all dangerous and experimental features are enabled.
    32117 Syscall param msync(start) points to uninitialised byte(s)
    32117 at 0xDFC6B20: __msync_nocancel (syscall-template.S:81)
    32117 by 0xFFCC123: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFCEEF6: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFD0151: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFD04E8: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFCCA30: _ULx86_64_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xDD7AEF2: GetStackTrace(void**, int, int) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6D854: tcmalloc::PageHeap::GrowHeap(unsigned long) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6DB62: tcmalloc::PageHeap::New(unsigned long) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6C6D6: tcmalloc::CentralFreeList::Populate() (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6C8A7: tcmalloc::CentralFreeList::FetchFromSpansSafe() (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6C922: tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 Address 0xfff000010 is on thread 1's stack
    32117 in frame #7, created by tcmalloc::PageHeap::GrowHeap(unsigned long) (?)
    32117
    2017-01-17 18:27:23.844957 403d580 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:23.979365 403d580 -1 WARNING: all dangerous and experimental features are enabled.
    2017-01-17 18:27:24.521748 403d580 -1 WARNING: all dangerous and experimental features are enabled.
    32117 Syscall param msync(start) points to uninitialised byte(s)
    32117 at 0xDFC6B3D: ??? (syscall-template.S:81)
    32117 by 0xFFCC123: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFCEEF6: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFD0151: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFD04E8: ??? (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xFFCCA30: _ULx86_64_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
    32117 by 0xDD7AEF2: GetStackTrace(void**, int, int) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6D854: tcmalloc::PageHeap::GrowHeap(unsigned long) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6DB62: tcmalloc::PageHeap::New(unsigned long) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6C6D6: tcmalloc::CentralFreeList::Populate() (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6C8A7: tcmalloc::CentralFreeList::FetchFromSpansSafe() (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 by 0xDD6C922: tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) (in /usr/lib/libtcmalloc.so.4.1.2)
    32117 Address 0xffeffe030 is on thread 1's stack
    32117 in frame #7, created by tcmalloc::PageHeap::GrowHeap(unsigned long) (???)
    32117
    starting mon.a rank 0 at 10.0.2.15:40000/0 mon_data /data/ceph-disk//mon.a fsid f900bd15-5767-4658-9ef8-753e80219928
    0
    Segmentation fault (core dumped)
#############################################################################################################

Steps to reproduce:
1. Build the master branch with -DWITH_LTTNG=ON -DHAVE_BABELTRACE=ON.
2. Use an Ubuntu 14.04 Vagrant VM.
3. Enable LTTng tracing in ceph.conf with the following settings:
osd_tracing = true
osd_objectstore_tracing = true
rados_tracing = true
rbd_tracing = true
4. Run the vstart command shown above; it ends in a segmentation fault. The kernel log shows:
[323451.320169] ceph32582: segfault at 7f0b348250ed ip 00007f0b38666668 sp 00007f0b2fffd860 error 4 in libgcc_s.so.1[7f0b38657000+16000]
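
The kernel line above can be unpacked a little further. The following is a minimal sketch, not part of the original report, that parses the line, computes the offset of the faulting instruction inside the libgcc_s.so.1 mapping, and decodes the x86 page-fault error code bits. The function name `decode_segfault` and the regular expression are illustrative choices, not an existing tool.

```python
import re

# Kernel log line copied from the report above.
LINE = ("ceph32582: segfault at 7f0b348250ed ip 00007f0b38666668 "
        "sp 00007f0b2fffd860 error 4 in libgcc_s.so.1[7f0b38657000+16000]")

# All numeric fields in this kernel message are hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into useful fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    return {
        "library": m["lib"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        "ip_inside_mapping": base <= ip < base + size,
        # x86 page-fault error code: bit 0 = page present (protection fault),
        # bit 1 = write access, bit 2 = fault raised in user mode.
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(LINE)
print(info["library"], hex(info["offset"]))
```

For this line the offset works out to 0xf668, and error 4 corresponds to a user-mode read of a page that is not mapped. Assuming the library's first segment is mapped at virtual address 0, as is typical for shared objects, that offset can be handed to a symbolizer such as addr2line together with the matching libgcc_s.so.1 file.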


Related issues

Related to RADOS - Bug #18696: OSD might assert when LTTNG tracing is enabled New 01/27/2017

History

#1 Updated by Ganesh Mahalingam 8 months ago

The segfault only happens when enabling rados_tracing. Enabling all the others does not trigger the segfault.
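
One way to repeat this kind of isolation is to generate one configuration fragment per tracing option, with exactly one option enabled in each. The sketch below is only an illustration of that approach; the option names come from the reproduction steps, while the helper name `single_option_fragments` is made up here, and where the fragment belongs in ceph.conf is deliberately left open.

```python
# Tracing options listed in the reproduction steps.
TRACING_OPTIONS = [
    "osd_tracing",
    "osd_objectstore_tracing",
    "rados_tracing",
    "rbd_tracing",
]


def single_option_fragments(options):
    """Yield (enabled_option, fragment) pairs with exactly one option set to true."""
    for enabled in options:
        lines = [
            f"{name} = {'true' if name == enabled else 'false'}"
            for name in options
        ]
        yield enabled, "\n".join(lines)


fragments = dict(single_option_fragments(TRACING_OPTIONS))
print(fragments["rados_tracing"])
```

Running the vstart command once per fragment would show which single option is enough to trigger the crash; according to the comment above, that is the rados_tracing fragment.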

#2 Updated by Jesse Williamson 8 months ago

I'm looking into this, but in the meantime I had a thought, if you'll forgive a little speculation. The short version is that it may be worthwhile to see if you can upgrade tcmalloc and libunwind. The versions used in this report have some known bugs that aren't identical to what you're seeing but are suspiciously similar.

Details:

In our bug report, just before the stack trace is gathered, we have:
32117 by 0xDD6D854: tcmalloc::PageHeap::GrowHeap(unsigned long) (in /usr/lib/libtcmalloc.so.4.1.2)
32117 by 0xDD6DB62: tcmalloc::PageHeap::New(unsigned long) (in /usr/lib/libtcmalloc.so.4.1.2)
32117 by 0xDD6C6D6: tcmalloc::CentralFreeList::Populate() (in /usr/lib/libtcmalloc.so.4.1.2)
32117 by 0xDD6C8A7: tcmalloc::CentralFreeList::FetchFromSpansSafe() (in /usr/lib/libtcmalloc.so.4.1.2)
32117 by 0xDD6C922: tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) (in /usr/lib/libtcmalloc.so.4.1.2)

...notice a similar, though not identical, pattern in https://github.com/gperftools/gperftools/issues/585, especially the call to FetchFromSpans() (compare FetchFromSpansSafe() in our trace) made from RemoveRange():

#0 tcmalloc::CentralFreeList::FetchFromSpans (this=0x7f9898f1caa0)
at src/central_freelist.cc:298
#1 0x00007f9898cf1078 in tcmalloc::CentralFreeList::RemoveRange (
this=0x7f9898f1caa0, start=0x7f982d066480, end=0x7f982d066488, N=166)
at src/central_freelist.cc:269

...not identical, but suspiciously similar.

In that report, it sounds like the application code was patched to correct the bug.

This report blames libunwind:
https://github.com/gperftools/gperftools/issues/101

...which indeed appears further down in our stack trace. A little more digging found this interesting report from Seastar (a cool project that is the async I/O core for ScyllaDB, BTW):
https://groups.google.com/forum/#!msg/seastar-dev/SiL4uUvvJAc/tsEsI87dAAAJ

...the tidbit of interest is:
"This manifests in a number of ways. One is that some operations appear
to hang due to worker thread completion notifications not being
delivered. Collectd statistics are not exported due to timers not
firing. It may also cause aborts on SIGTERM if it's delivered to
non-0th shard."

Other note:
I did find a Ceph report, but it has to do with performance rather than a segfault. For completeness: https://patchwork.kernel.org/patch/6107841/

#3 Updated by Ganesh Mahalingam 8 months ago

Thanks for looking into the bug. I updated both tcmalloc and libunwind from the yakkety sources and hit the same error, with a small variation. I wonder if the packages need to be compiled for Ubuntu 14.04.

Current packages:
vagrant@cephaio:/data/odisk/ceph/ceph/build/out$ sudo dpkg -l | grep tcmalloc
ii libtcmalloc-minimal4 2.4-0ubuntu5 amd64 efficient thread-caching malloc
vagrant@cephaio:/data/odisk/ceph/ceph/build/out$ sudo dpkg -l | grep google-perf
ii libgoogle-perftools-dev 2.4-0ubuntu5 amd64 libraries for CPU and heap analysis, plus an efficient thread-caching malloc
ii libgoogle-perftools4 2.4-0ubuntu5 amd64 libraries for CPU and heap analysis, plus an efficient thread-caching malloc
ii libgoogle-perftools4-dbg 2.4-0ubuntu5 amd64 libraries for CPU and heap analysis, plus an efficient thread-caching malloc
vagrant@cephaio:/data/odisk/ceph/ceph/build/out$ sudo dpkg -l | grep libgcc1
ii libgcc1:amd64 1:6.2.0-5ubuntu12 amd64 GCC support library
vagrant@cephaio:/data/odisk/ceph/ceph/build/out$ sudo dpkg -l | grep libunwind8
ii libunwind8 1.1-4.1ubuntu2 amd64 library to determine the call-chain of a program - runtime
ii libunwind8-dbg 1.1-4.1ubuntu2 amd64 library to determine the call-chain of a program - runtime
ii libunwind8-dev 1.1-4.1ubuntu2 amd64 library to determine the call-chain of a program - development

Logs:
2017-01-23 21:15:31.097217 7f526c2bc700 0 -- - >> 10.0.2.15:40000/0 pipe(0x7f5270094160 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7f527009c160).fault
--19333-- WARNING: Serious error when reading debug info
--19333-- When reading debug info from /usr/lib/libtcmalloc.so.4.2.6:
--19333-- Ignoring non-Dwarf2/3/4 block in .debug_info
--19333-- WARNING: Serious error when reading debug info
--19333-- When reading debug info from /usr/lib/libtcmalloc.so.4.2.6:
--19333-- Last block truncated in .debug_info; ignoring
--19333-- WARNING: Serious error when reading debug info
--19333-- When reading debug info from /usr/lib/libtcmalloc.so.4.2.6:
--19333-- parse_CU_Header: is neither DWARF2 nor DWARF3 nor DWARF4
--19333-- WARNING: Serious error when reading debug info
--19333-- When reading debug info from /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1:
--19333-- Ignoring non-Dwarf2/3/4 block in .debug_info
--19333-- WARNING: Serious error when reading debug info
--19333-- When reading debug info from /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1:
--19333-- Last block truncated in .debug_info; ignoring
--19333-- WARNING: Serious error when reading debug info
--19333-- When reading debug info from /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1:
--19333-- parse_CU_Header: is neither DWARF2 nor DWARF3 nor DWARF4
19333 Syscall param msync(start) points to uninitialised byte(s)
19333 at 0xDFCCB20: __msync_nocancel (syscall-template.S:81)
19333 by 0x10057266: access_mem (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005AD03: apply_reg_state (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005B24E: _ULx86_64_dwarf_find_save_locs (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005B5C8: _ULx86_64_dwarf_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x10057C70: _ULx86_64_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0xDD7C4DA: GetStackTrace_libunwind(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD7CCED: GetStackTrace(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6E0BF: tcmalloc::PageHeap::GrowHeap(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6E422: tcmalloc::PageHeap::New(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6CD33: tcmalloc::CentralFreeList::Populate() (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6CF27: tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 Address 0xfff000020 is on thread 1's stack
19333 in frame #7, created by GetStackTrace(void**, int, int) (?)
19333
2017-01-23 21:15:34.096121 7f526c1bb700 0 -- - >> 10.0.2.15:40000/0 pipe(0x7f5244000d90 sd=9 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5244002090).fault
2017-01-23 21:15:34.120741 403e640 -1 WARNING: all dangerous and experimental features are enabled.
2017-01-23 21:15:34.242587 403e640 -1 WARNING: all dangerous and experimental features are enabled.
19333 Syscall param msync(start) points to uninitialised byte(s)
19333 at 0xDFCCB3D: ??? (syscall-template.S:81)
19333 by 0x10057266: access_mem (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005AD03: apply_reg_state (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005B24E: _ULx86_64_dwarf_find_save_locs (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005B5C8: _ULx86_64_dwarf_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x10057C70: _ULx86_64_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0xDD7C4DA: GetStackTrace_libunwind(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD7CCED: GetStackTrace(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6E0BF: tcmalloc::PageHeap::GrowHeap(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6E422: tcmalloc::PageHeap::New(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6CD33: tcmalloc::CentralFreeList::Populate() (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6CF27: tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 Address 0xffeffe000 is on thread 1's stack
19333 in frame #6, created by GetStackTrace_libunwind(void**, int, int) (?)
19333
2017-01-23 21:15:34.869583 403e640 -1 WARNING: all dangerous and experimental features are enabled.
starting mon.a rank 0 at 10.0.2.15:40000/0 mon_data /data/ceph-disk//mon.a fsid a1433fb9-974b-4ca8-8081-d21b24103b30
19333 Thread 8 ms_dispatch:
19333 Syscall param msync(start) points to unaddressable byte(s)
19333 at 0xDFCCB3D: ??? (syscall-template.S:81)
19333 by 0x10057266: access_mem (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005AD03: apply_reg_state (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005B24E: _ULx86_64_dwarf_find_save_locs (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x1005B5C8: _ULx86_64_dwarf_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0x10057C70: _ULx86_64_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
19333 by 0xDD7C4DA: GetStackTrace_libunwind(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD7CCED: GetStackTrace(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6E0BF: tcmalloc::PageHeap::GrowHeap(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6E422: tcmalloc::PageHeap::New(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6CD33: tcmalloc::CentralFreeList::Populate() (in /usr/lib/libtcmalloc.so.4.2.6)
19333 by 0xDD6CF27: tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) (in /usr/lib/libtcmalloc.so.4.2.6)
19333 Address 0x15df2088 is on thread 8's stack
19333 136 bytes below stack pointer
19333
0
Segmentation fault (core dumped)

#4 Updated by Ganesh Mahalingam 8 months ago

I added -x to vstart and got the logs below.

2017-01-23 23:41:41.336351 403e640 -1 WARNING: all dangerous and experimental features are enabled.
starting mon.a rank 0 at 10.0.2.15:40000/0 mon_data /data/ceph-disk//mon.a fsid a1433fb9-974b-4ca8-8081-d21b24103b30
21867 Thread 8 ms_dispatch:
21867 Syscall param msync(start) points to unaddressable byte(s)
21867 at 0xDFCCB3D: ??? (syscall-template.S:81)
21867 by 0x10057266: access_mem (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
21867 by 0x1005AD03: apply_reg_state (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
21867 by 0x1005B24E: _ULx86_64_dwarf_find_save_locs (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
21867 by 0x1005B5C8: _ULx86_64_dwarf_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
21867 by 0x10057C70: _ULx86_64_step (in /usr/lib/x86_64-linux-gnu/libunwind.so.8.0.1)
21867 by 0xDD7C4DA: GetStackTrace_libunwind(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
21867 by 0xDD7CCED: GetStackTrace(void**, int, int) (in /usr/lib/libtcmalloc.so.4.2.6)
21867 by 0xDD6E0BF: tcmalloc::PageHeap::GrowHeap(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
21867 by 0xDD6E422: tcmalloc::PageHeap::New(unsigned long) (in /usr/lib/libtcmalloc.so.4.2.6)
21867 by 0xDD6CD33: tcmalloc::CentralFreeList::Populate() (in /usr/lib/libtcmalloc.so.4.2.6)
21867 by 0xDD6CF27: tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) (in /usr/lib/libtcmalloc.so.4.2.6)
21867 Address 0x15df2088 is on thread 8's stack
21867 136 bytes below stack pointer
21867
0
+ ceph_adm osd crush add osd.0 1.0 host=cephaio root=default
+ [ 1 -eq 1 ]
+ prun /data/odisk/ceph/ceph/build/bin/ceph -c /data/odisk/ceph/ceph/build/ceph.conf -k /data/odisk/ceph/ceph/build/keyring osd crush add osd.0 1.0 host=cephaio root=default
+ echo /data/odisk/ceph/ceph/build/bin/ceph -c /data/odisk/ceph/ceph/build/ceph.conf -k /data/odisk/ceph/ceph/build/keyring osd crush add osd.0 1.0 host=cephaio root=default
/data/odisk/ceph/ceph/build/bin/ceph -c /data/odisk/ceph/ceph/build/ceph.conf -k /data/odisk/ceph/ceph/build/keyring osd crush add osd.0 1.0 host=cephaio root=default
+ /data/odisk/ceph/ceph/build/bin/ceph -c /data/odisk/ceph/ceph/build/ceph.conf -k /data/odisk/ceph/ceph/build/keyring osd crush add osd.0 1.0 host=cephaio root=default
2017-01-23 23:41:45.007479 7fe752d4f700 -1 WARNING: all dangerous and experimental features are enabled.
2017-01-23 23:41:45.072005 7fe752d4f700 -1 WARNING: all dangerous and experimental features are enabled.
add item id 0 name 'osd.0' weight 1 at location {host=cephaio,root=default} to crush map
Segmentation fault (core dumped)

I wonder if there is something else causing the segfault.

#5 Updated by Ganesh Mahalingam 8 months ago

Sorry for the barrage of updates. Adding more logs on what I found when I ran 'ceph osd ls' under gdb.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe8b43700 (LWP 23446)]
uw_frame_state_for (context=context@entry=0x7fffe8b41ba0, fs=fs@entry=0x7fffe8b418f0)
at ../../../src/libgcc/unwind-dw2.c:1249
1249 ../../../src/libgcc/unwind-dw2.c: No such file or directory.
(gdb) bt
#0 uw_frame_state_for (context=context@entry=0x7fffe8b41ba0, fs=fs@entry=0x7fffe8b418f0)
at ../../../src/libgcc/unwind-dw2.c:1249
#1 0x00007fffecb9c03b in _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0x7fffe8b43d70,
context=context@entry=0x7fffe8b41ba0) at ../../../src/libgcc/unwind.inc:155
#2 0x00007fffecb9c374 in _Unwind_ForcedUnwind (exc=0x7fffe8b43d70,
stop=stop@entry=0x7ffff7bcabb0 <unwind_stop>, stop_argument=<optimized out>)
at ../../../src/libgcc/unwind.inc:207
#3 0x00007ffff7bcad30 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:129
#4 0x00007ffff7bc2a75 in __do_cancel () at ../nptl/pthreadP.h:280
#5 sigcancel_handler (sig=<optimized out>, si=<optimized out>, ctx=<optimized out>)
at nptl-init.c:214
#6 <signal handler called>
#7 0x00007ffff7bcb8ad in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#8 0x00007fffe8d5a0ed in ?? ()
#9 0x752d676e74746c2f in ?? ()
#10 0x2d6b636f732d7473 in ?? ()
#11 0x00007fffe8b42940 in ?? ()
#12 0x0000000000000264 in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb) info threads
Id Target Id Frame
23 Thread 0x7fffea27b700 (LWP 23467) "python" _dl_close_worker (map=map@entry=0x7fffe403bac0)
at dl-close.c:747
9 Thread 0x7fffe2ac1700 (LWP 23451) "admin_socket" 0x00007ffff78e3fdd in poll ()
at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x7fffe3fff700 (LWP 23447) "python" syscall ()
at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
* 4 Thread 0x7fffe8b43700 (LWP 23446) "python" uw_frame_state_for (
context=context@entry=0x7fffe8b41ba0, fs=fs@entry=0x7fffe8b418f0)
at ../../../src/libgcc/unwind-dw2.c:1249
3 Thread 0x7fffe9a7a700 (LWP 23445) "log" pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
1 Thread 0x7ffff7fe6740 (LWP 23434) "python" 0x00007ffff78e8c53 in select ()
at ../sysdeps/unix/syscall-template.S:81
(gdb)
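
Frames #9 and #10 in the backtrace above do not look like code addresses. A quick check, sketched below as an observation rather than a conclusion drawn anywhere in this thread, is to treat the two 64-bit values as little-endian bytes and decode them as ASCII. The helper name `decode_words` is made up for this illustration.

```python
import struct

# "Return addresses" reported by gdb for frames #9 and #10.
FRAMES = [0x752D676E74746C2F, 0x2D6B636F732D7473]


def decode_words(words):
    """Pack each value as a little-endian 64-bit word and decode the bytes as ASCII."""
    raw = b"".join(struct.pack("<Q", word) for word in words)
    return raw.decode("ascii")


text = decode_words(FRAMES)
print(text)
```

The decoded text is "/lttng-ust-sock-", which suggests the unwinder was walking over stack data (a path buffer) rather than real return addresses. The string looks like a fragment of an LTTng-UST socket path, so the cancelled thread may well belong to the tracing library, but that reading is an assumption based on the decoded bytes only.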

#6 Updated by Nathan Cutler 8 months ago

  • Related to Bug #18696: OSD might assert when LTTNG tracing is enabled added

#7 Updated by Sage Weil 6 months ago

  • Status changed from New to Resolved
