Bug #3595
Closed
ceph-osd and ceph-mds crash on Debian Squeeze
Added by Jörg Blank over 11 years ago. Updated about 11 years ago.
Description
Using the official packages for 0.48 and 0.55 on Debian Squeeze always leads to a crash of the ceph-osd and ceph-mds daemons during basic setup.
This bug is described at https://code.google.com/p/gperftools/issues/detail?id=304 and was fixed with google-perftools 1.7.
As Debian Squeeze currently ships with version 1.5, perftools should be disabled on this platform.
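A rough way to check whether a given host has a perftools package older than the fixed release is to compare the `Version:` field from `dpkg -s` against 1.7. The sketch below is illustrative only: the helper names are hypothetical, the 1.7 threshold is taken from the statement above rather than verified independently, and the version parsing is a simplification of Debian's real version-ordering rules.

```python
import re

# Threshold taken from the report above (fix said to land in google-perftools 1.7).
FIXED_IN = (1, 7)


def parse_dpkg_version(dpkg_output):
    """Return the value of the 'Version:' field from `dpkg -s` output, or None."""
    for line in dpkg_output.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    return None


def upstream_tuple(deb_version):
    """Turn a Debian version such as '1.5-1' into a tuple like (1, 5).

    Simplified: drops an optional epoch and the Debian revision, then keeps
    only the numeric components of the upstream version.
    """
    upstream = deb_version.split(":", 1)[-1]  # drop an epoch such as '1:'
    upstream = upstream.rsplit("-", 1)[0]     # drop the Debian revision
    return tuple(int(n) for n in re.findall(r"\d+", upstream))


def predates_fix(deb_version, fixed_in=FIXED_IN):
    """True if the upstream version is older than the release carrying the fix."""
    return upstream_tuple(deb_version) < fixed_in
```

Feeding it the output of `dpkg -s libgoogle-perftools0` would flag the 1.5-1 package that Squeeze ships.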
Updated by Sage Weil over 11 years ago
- Status changed from New to Need More Info
Is this still a problem with the bobtail packages?
Updated by Jörg Blank over 11 years ago
root@cluster:~# ceph-osd
Segmentation fault
root@cluster:~# ceph-osd -h
Segmentation fault
root@cluster:~# ceph-osd
Segmentation fault
Unfortunately the bug is still present.
Updated by Greg Farnum about 11 years ago
I actually run these on Debian pretty often and don't have any issues, so I'm a bit confused. Can you grab a backtrace out of GDB for me?
Updated by Jörg Blank about 11 years ago
dpkg -s ceph
Version: 0.56.2-1~bpo60+1
dpkg -s libgoogle-perftools0
Version: 1.5-1
dpkg -s libtcmalloc-minimal0
Version: 1.5-1
gdb ceph-osd
(gdb) run
Starting program: /usr/bin/ceph-osd
[Thread debugging using libthread_db enabled]
Program received signal SIGSEGV, Segmentation fault.
base::VDSOSupport::ElfMemImage::GetNumSymbols (this=0x7fffffffeb40) at src/base/vdso_support.cc:139
139 src/base/vdso_support.cc: No such file or directory.
in src/base/vdso_support.cc
(gdb) bt
#0 base::VDSOSupport::ElfMemImage::GetNumSymbols (this=0x7fffffffeb40) at src/base/vdso_support.cc:139
#1 0x00007ffff6c37962 in base::VDSOSupport::SymbolIterator::Update (this=0x7fffffffead0, increment=0) at src/base/vdso_support.cc:496
#2 0x00007ffff6c37af5 in base::VDSOSupport::begin (this=<value optimized out>) at src/base/vdso_support.cc:481
#3 0x00007ffff6c381f4 in base::VDSOSupport::LookupSymbol (this=0x7fffffffeb40, name=0x7ffff6c40ca3 "__vdso_getcpu",
version=0xffffffffff700120 <Address 0xffffffffff700120 out of bounds>, type=2, info=0x7fffffffeb90) at src/base/vdso_support.cc:416
#4 0x00007ffff6c38313 in base::VDSOSupport::Init () at src/base/vdso_support.cc:390
#5 0x00007ffff6c39cb6 in __do_global_ctors_aux () from /usr/lib/libtcmalloc.so.0
#6 0x00007ffff6c1d013 in _init () from /usr/lib/libtcmalloc.so.0
#7 0x00007fffffffecb8 in ?? ()
#8 0x00007ffff7decc69 in call_init (l=0x7ffff7ff0980, argc=-152801168, argv=0x7fffffffeca8, env=0x7fffffffecb8) at dl-init.c:70
#9 0x00007ffff7decda7 in _dl_init (main_map=0x7ffff7ffe128, argc=1, argv=0x7fffffffeca8, env=0x7fffffffecb8) at dl-init.c:134
#10 0x00007ffff7ddfb2a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000001 in ?? ()
#12 0x00007fffffffee9f in ?? ()
#13 0x0000000000000000 in ?? ()
Updated by Greg Farnum about 11 years ago
How bizarre:
gregf@kai:~$ dpkg -s ceph | grep Version:
Version: 0.56.2-1~bpo60+1
gregf@kai:~$ dpkg -s libgoogle-perftools0 | grep Version:
Version: 1.5-1
gregf@kai:~$ dpkg -s libtcmalloc-minimal0 | grep Version:
Version: 1.5-1
gregf@kai:~$ lsb_release -d
Description: Debian GNU/Linux 6.0.6 (n/a)
gregf@kai:~$ gdb ceph-osd
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/ceph-osd...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/ceph-osd
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff56ac710 (LWP 4102)]
--conf/-c Read configuration from the given configuration file
-d Run in foreground, log to stderr.
-f Run in foreground, log to usual location.
--id/-i set ID portion of my name
--name/-n set name (TYPE.ID)
--version show version and quit
--debug_ms N set message debug level (e.g. 1)
2013-02-12 12:33:53.829837 7ffff7fe4780 -1 must specify '-i #' where # is the osd number
2013-02-12 12:33:53.829838 7ffff7fe4780 -1 usage: ceph-osd -i osdid [--osd-data=path] [--osd-journal=path] [--mkfs] [--mkjournal] [--convert-filestore]
2013-02-12 12:33:53.829839 7ffff7fe4780 -1 --debug_osd N set debug level (e.g. 10)
[Thread 0x7ffff56ac710 (LWP 4102) exited]
Program exited with code 01.
(gdb)
and
gregf@kai:~$ ldd /usr/bin/ceph-osd
linux-vdso.so.1 => (0x00007fff3a122000)
libaio.so.1 => /lib/libaio.so.1 (0x00007f18d88c2000)
libnss3.so.1d => /usr/lib/libnss3.so.1d (0x00007f18d85bf000)
libnspr4.so.0d => /usr/lib/libnspr4.so.0d (0x00007f18d8380000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f18d8164000)
libuuid.so.1 => /lib/libuuid.so.1 (0x00007f18d7f60000)
librt.so.1 => /lib/librt.so.1 (0x00007f18d7d57000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f18d7b53000)
libtcmalloc.so.0 => /usr/lib/libtcmalloc.so.0 (0x00007f18d78ec000)
libboost_thread.so.1.42.0 => /usr/lib/libboost_thread.so.1.42.0 (0x00007f18d76d6000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f18d73c2000)
libm.so.6 => /lib/libm.so.6 (0x00007f18d7140000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f18d6f29000)
libc.so.6 => /lib/libc.so.6 (0x00007f18d6bc8000)
libnssutil3.so.1d => /usr/lib/libnssutil3.so.1d (0x00007f18d69ac000)
libplc4.so.0d => /usr/lib/libplc4.so.0d (0x00007f18d67a7000)
libplds4.so.0d => /usr/lib/libplds4.so.0d (0x00007f18d65a4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f18d8ad1000)
libunwind.so.7 => /usr/lib/libunwind.so.7 (0x00007f18d638b000)
So I guess this isn't going to be a quick fix. In particular, we don't want to distribute without tcmalloc, as using the standard ptmalloc results in absolutely horrendous memory behavior.
Updated by Greg Farnum about 11 years ago
- Priority changed from Normal to High
- Source changed from Development to Community (user)
Updated by Sage Weil about 11 years ago
IIRC this was mostly a problem for ceph-mds. And probably working at all is better than inflated memory usage in the OSD.
Let's just disable tcmalloc for the squeeze build?
Updated by Greg Farnum about 11 years ago
It was most noticeable with the MDS, but I bet we'd see it a lot more with our present OSD design as well now, and it wasn't non-existent for the OSDs previously, either.
Plus it's working for me, so it'd be nice to figure out the difference and see if we can fix it programmatically.
But if we don't have the time and we think it won't blow up in our faces we can just turn it off for Debian. :)
Updated by Sage Weil about 11 years ago
It's also working great for at least one customer on squeeze. But I don't think we can prioritize digging in, given that squeeze is an old release.
Updated by Greg Farnum about 11 years ago
Yeah, if it's working for some other people I don't think we want to change the current config without taking the time to dig in.
You can always build your own non-tcmalloc packages if you need them. :) Any chance this is non-x86 or something?
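For anyone who does rebuild, one way to confirm that the resulting binary no longer pulls in tcmalloc is to inspect its `ldd` output. The following is a minimal sketch, not part of Ceph; the function names are made up for illustration, and it assumes the usual `name => path (address)` layout of `ldd` lines.

```python
def linked_libraries(ldd_output):
    """Map library names to resolved paths, parsed from `ldd` output.

    Lines without '=>' (such as the dynamic loader entry) are skipped.
    Entries with no path, such as linux-vdso, map to None.
    """
    libs = {}
    for line in ldd_output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        path = rest.split("(")[0].strip()  # drop the trailing load address
        libs[name.strip()] = path or None
    return libs


def uses_tcmalloc(ldd_output):
    """True if any linked library name starts with 'libtcmalloc'."""
    return any(name.startswith("libtcmalloc")
               for name in linked_libraries(ldd_output))
```

In practice the input could come from something like `subprocess.run(["ldd", "/usr/bin/ceph-osd"], capture_output=True, text=True).stdout`; a stock Squeeze package, as in the `ldd` listing earlier in this thread, would report tcmalloc as linked.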
Updated by Ian Colle about 11 years ago
- Status changed from Need More Info to Won't Fix
Won't fix, as other squeeze users are having success.