Bug #3595

closed

ceph-osd and ceph-mds crash on Debian Squeeze

Added by Jörg Blank over 11 years ago. Updated about 11 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Community (user)

Description

Using the official packages for 0.48 and 0.55 on Debian Squeeze always leads to a crash of the ceph-osd and ceph-mds daemons during basic setup.
This bug is described at https://code.google.com/p/gperftools/issues/detail?id=304 and was fixed with google-perftools 1.7.
As Debian Squeeze currently ships with version 1.5, perftools should be disabled on this platform.
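The version check implied above can be sketched in shell. This is only an illustration: the function name `perftools_status` is hypothetical, and it assumes that comparing the upstream major.minor number against 1.7 is enough, ignoring the Debian revision and any epoch.

```shell
#!/bin/sh
# perftools_status VERSION
# Prints "affected" if VERSION is older than 1.7 (the release in which,
# according to this report, the crash was fixed), otherwise "ok".
# Assumption: only the upstream major.minor part is compared; the Debian
# revision ("-1") is stripped and epochs are not handled.
perftools_status() {
    upstream=${1%%-*}        # "1.5-1" -> "1.5"
    major=${upstream%%.*}    # "1.5"   -> "1"
    rest=${upstream#*.}      # "1.5"   -> "5"
    minor=${rest%%.*}        # keep only the minor component
    if [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 7 ]; }; then
        echo "affected"
    else
        echo "ok"
    fi
}

perftools_status 1.5-1   # version shipped by Squeeze; prints "affected"
perftools_status 1.7-1   # prints "ok"
```

On a Debian system the version string could be obtained with `dpkg-query -W -f='${Version}' libgoogle-perftools0` and passed to the function.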

Actions #1

Updated by Sage Weil about 11 years ago

  • Status changed from New to Need More Info

Is this still a problem with the bobtail packages?

Actions #2

Updated by Jörg Blank about 11 years ago

root@cluster:~# ceph-osd
Segmentation fault
root@cluster:~# ceph-osd -h
Segmentation fault
root@cluster:~# ceph-osd
Segmentation fault

Unfortunately the bug is still present.

Actions #3

Updated by Greg Farnum about 11 years ago

I actually run these on Debian pretty often and don't have any issues, so I'm a bit confused. Can you grab a backtrace out of GDB for me?

Actions #4

Updated by Jörg Blank about 11 years ago

dpkg -s ceph

Version: 0.56.2-1~bpo60+1

dpkg -s libgoogle-perftools0

Version: 1.5-1

dpkg -s libtcmalloc-minimal0

Version: 1.5-1

gdb ceph-osd

(gdb) run
Starting program: /usr/bin/ceph-osd
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
base::VDSOSupport::ElfMemImage::GetNumSymbols (this=0x7fffffffeb40) at src/base/vdso_support.cc:139
139 src/base/vdso_support.cc: No such file or directory.
in src/base/vdso_support.cc
(gdb) bt
#0 base::VDSOSupport::ElfMemImage::GetNumSymbols (this=0x7fffffffeb40) at src/base/vdso_support.cc:139
#1 0x00007ffff6c37962 in base::VDSOSupport::SymbolIterator::Update (this=0x7fffffffead0, increment=0) at src/base/vdso_support.cc:496
#2 0x00007ffff6c37af5 in base::VDSOSupport::begin (this=<value optimized out>) at src/base/vdso_support.cc:481
#3 0x00007ffff6c381f4 in base::VDSOSupport::LookupSymbol (this=0x7fffffffeb40, name=0x7ffff6c40ca3 "__vdso_getcpu",
version=0xffffffffff700120 <Address 0xffffffffff700120 out of bounds>, type=2, info=0x7fffffffeb90) at src/base/vdso_support.cc:416
#4 0x00007ffff6c38313 in base::VDSOSupport::Init () at src/base/vdso_support.cc:390
#5 0x00007ffff6c39cb6 in __do_global_ctors_aux () from /usr/lib/libtcmalloc.so.0
#6 0x00007ffff6c1d013 in _init () from /usr/lib/libtcmalloc.so.0
#7 0x00007fffffffecb8 in ?? ()
#8 0x00007ffff7decc69 in call_init (l=0x7ffff7ff0980, argc=-152801168, argv=0x7fffffffeca8, env=0x7fffffffecb8) at dl-init.c:70
#9 0x00007ffff7decda7 in _dl_init (main_map=0x7ffff7ffe128, argc=1, argv=0x7fffffffeca8, env=0x7fffffffecb8) at dl-init.c:134
#10 0x00007ffff7ddfb2a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000001 in ?? ()
#12 0x00007fffffffee9f in ?? ()
#13 0x0000000000000000 in ?? ()

Actions #5

Updated by Greg Farnum about 11 years ago

How bizarre:

gregf@kai:~$ dpkg -s ceph | grep Version:
Version: 0.56.2-1~bpo60+1
gregf@kai:~$ dpkg -s libgoogle-perftools0 | grep Version:
Version: 1.5-1
gregf@kai:~$ dpkg -s libtcmalloc-minimal0 | grep Version:
Version: 1.5-1
gregf@kai:~$ lsb_release -d
Description:    Debian GNU/Linux 6.0.6 (n/a)
gregf@kai:~$ gdb ceph-osd
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ceph-osd...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/ceph-osd 
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff56ac710 (LWP 4102)]
  --conf/-c        Read configuration from the given configuration file
  -d               Run in foreground, log to stderr.
  -f               Run in foreground, log to usual location.
  --id/-i          set ID portion of my name
  --name/-n        set name (TYPE.ID)
  --version        show version and quit

  --debug_ms N
        set message debug level (e.g. 1)
2013-02-12 12:33:53.829837 7ffff7fe4780 -1 must specify '-i #' where # is the osd number
2013-02-12 12:33:53.829838 7ffff7fe4780 -1 usage: ceph-osd -i osdid [--osd-data=path] [--osd-journal=path] [--mkfs] [--mkjournal] [--convert-filestore]
2013-02-12 12:33:53.829839 7ffff7fe4780 -1    --debug_osd N   set debug level (e.g. 10)
[Thread 0x7ffff56ac710 (LWP 4102) exited]

Program exited with code 01.
(gdb) 

and
gregf@kai:~$ ldd /usr/bin/ceph-osd 
    linux-vdso.so.1 =>  (0x00007fff3a122000)
    libaio.so.1 => /lib/libaio.so.1 (0x00007f18d88c2000)
    libnss3.so.1d => /usr/lib/libnss3.so.1d (0x00007f18d85bf000)
    libnspr4.so.0d => /usr/lib/libnspr4.so.0d (0x00007f18d8380000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f18d8164000)
    libuuid.so.1 => /lib/libuuid.so.1 (0x00007f18d7f60000)
    librt.so.1 => /lib/librt.so.1 (0x00007f18d7d57000)
    libdl.so.2 => /lib/libdl.so.2 (0x00007f18d7b53000)
    libtcmalloc.so.0 => /usr/lib/libtcmalloc.so.0 (0x00007f18d78ec000)
    libboost_thread.so.1.42.0 => /usr/lib/libboost_thread.so.1.42.0 (0x00007f18d76d6000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f18d73c2000)
    libm.so.6 => /lib/libm.so.6 (0x00007f18d7140000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f18d6f29000)
    libc.so.6 => /lib/libc.so.6 (0x00007f18d6bc8000)
    libnssutil3.so.1d => /usr/lib/libnssutil3.so.1d (0x00007f18d69ac000)
    libplc4.so.0d => /usr/lib/libplc4.so.0d (0x00007f18d67a7000)
    libplds4.so.0d => /usr/lib/libplds4.so.0d (0x00007f18d65a4000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f18d8ad1000)
    libunwind.so.7 => /usr/lib/libunwind.so.7 (0x00007f18d638b000)

So I guess this isn't going to be a quick fix. In particular, we don't want to distribute without tcmalloc, as using the standard ptmalloc results in absolutely horrendous memory behavior.
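For anyone comparing setups, a quick way to confirm that a binary is linked against tcmalloc is to filter the `ldd` output, as in the listing above. The helper name `links_tcmalloc` is hypothetical, and the sample line is copied from that listing; this is a sketch, not part of the Ceph tooling.

```shell
#!/bin/sh
# links_tcmalloc: read `ldd` output on stdin and succeed if any line
# names a tcmalloc shared library (libtcmalloc or libtcmalloc_minimal).
links_tcmalloc() {
    grep -q 'libtcmalloc'
}

# Sample line taken from the ldd output in this ticket.
sample='    libtcmalloc.so.0 => /usr/lib/libtcmalloc.so.0 (0x00007f18d78ec000)'

if printf '%s\n' "$sample" | links_tcmalloc; then
    echo "binary is linked against tcmalloc"
fi
```

Against a real installation this would be used as `ldd /usr/bin/ceph-osd | links_tcmalloc`.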

Actions #6

Updated by Greg Farnum about 11 years ago

  • Priority changed from Normal to High
  • Source changed from Development to Community (user)

Actions #7

Updated by Sage Weil about 11 years ago

IIRC this was mostly a problem for ceph-mds. And probably working at all is better than inflated memory usage in the OSD.

Let's just disable tcmalloc for the squeeze build?

Actions #8

Updated by Greg Farnum about 11 years ago

It was most noticeable with the MDS, but it wasn't non-existent for the OSDs previously either, and I bet we'd see it a lot more now with our present OSD design.
Plus it's working for me, so it'd be nice to figure out the difference and see if we can fix it programmatically.

But if we don't have the time and we think it won't blow up in our faces we can just turn it off for Debian. :)

Actions #9

Updated by Sage Weil about 11 years ago

It's also working great for at least one customer on squeeze. But I don't think we can prioritize digging in, given that squeeze is an old release.

Actions #10

Updated by Greg Farnum about 11 years ago

Yeah, if it's working for some other people I don't think we want to change the current config without taking the time to dig in.

You can always build your own non-tcmalloc packages if you need them. :) Any chance this is non-x86 or something?

Actions #11

Updated by Ian Colle about 11 years ago

  • Status changed from Need More Info to Won't Fix

Won't fix, as other squeeze users are having success.
