Bug #16322
Closed
ceph mds getting killed for no reason
Description
Hello,
My ceph mds daemons keep getting killed for no apparent reason (normally they do the active failover).
Log:
ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
1: (()+0x4ecd02) [0x55c747972d02]
2: (()+0x10340) [0x7f5826a05340]
3: (gsignal()+0x39) [0x7f5824e90cc9]
4: (abort()+0x148) [0x7f5824e940d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f582579b535]
6: (()+0x5e6d6) [0x7f58257996d6]
7: (()+0x5e703) [0x7f5825799703]
8: (()+0x5e922) [0x7f5825799922]
9: (Server::prepare_new_inode(std::shared_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0x200f) [0x55c7476e59af]
10: (Server::handle_client_openc(std::shared_ptr<MDRequestImpl>&)+0xe5b) [0x55c7476fbb0b]
11: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xaec) [0x55c74771917c]
12: (Server::handle_client_request(MClientRequest*)+0x47f) [0x55c74771968f]
13: (Server::dispatch(Message*)+0x3bb) [0x55c74771d8db]
14: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x55c7476a3f8c]
15: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x55c7476ad081]
16: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55c7476ae1d5]
17: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x55c747695f83]
18: (DispatchQueue::entry()+0x78b) [0x55c747b5b6cb]
19: (DispatchQueue::DispatchThread::entry()+0xd) [0x55c747a4a62d]
20: (()+0x8182) [0x7f58269fd182]
21: (clone()+0x6d) [0x7f5824f5447d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
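As a rough illustration of how the raw addresses in a trace like this can be mapped back to the binary, the sketch below converts a runtime address into an offset inside ceph-mds. It assumes that an unnamed frame of the form `(()+0xOFF) [0xADDR]` reports OFF relative to the load base of the module, and that frame 1 belongs to ceph-mds (its address is in the same 0x55c7... range as the named Server:: frames). The helper name `frame_offset` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: assumes "(()+0xOFF) [0xADDR]" gives OFF relative to the
# module's load base, so that base = ADDR - OFF.

# frame_offset ADDR ANCHOR_ADDR ANCHOR_OFF   (hypothetical helper)
# Prints ADDR as an offset from the load base derived from the anchor frame.
frame_offset() {
    local addr=$1 anchor_addr=$2 anchor_off=$3
    local base=$(( anchor_addr - anchor_off ))
    printf '0x%x\n' $(( addr - base ))
}

# Frame 1 anchors the load base; frame 9 is Server::prepare_new_inode.
frame_offset 0x55c7476e59af 0x55c747972d02 0x4ecd02
```

With a ceph-mds binary that carries debug information, the resulting offset could then be passed to something like `addr2line -C -f -e <executable> <offset>` to get a source file and line, which is roughly what the NOTE above is asking for.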
Updated by Joao Castro almost 8 years ago
add:
2016-06-15 03:15:51.017714 7f582103f700 -1 *** Caught signal (Aborted) **
in thread 7f582103f700 thread_name:ms_dispatch
Updated by Zheng Yan almost 8 years ago
Could you enable core dumps and use gdb to check which line causes the crash?
Updated by Joao Castro almost 8 years ago
I am not very experienced with gdb, sorry. Should I run it on ceph-mds?
I will paste the whole log (it has a lot of info).
Please let me know if it helps.
Thanks in advance.
Updated by Joao Castro almost 8 years ago
(...)
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
Reading symbols from /usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so
Reading symbols from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
Reading symbols from /usr/lib/x86_64-linux-gnu/nss/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/nss/libfreebl3.so
warning: File "/lib/x86_64-linux-gnu/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007f07cbfd066b in pthread_join (threadid=139671352723200, thread_return=0x0) at pthread_join.c:99
99 pthread_join.c: No such file or directory.
Updated by Joao Castro almost 8 years ago
Updated by Zheng Yan almost 8 years ago
Your ceph-mds does not contain debuginfo. Please install the debuginfo package first, then start ceph-mds manually with core dumps enabled:
$ ulimit -c unlimited
$ /usr/local/bin/ceph-mds -i <mdsid> -c /usr/local/etc/ceph/ceph.conf --cluster ceph -f
The kernel will create a coredump.xxx file in the current directory when ceph-mds crashes. Use gdb to inspect the coredump file:
$ gdb /usr/local/bin/ceph-mds ./coredump.xxx
Inside gdb, type 'bt'. gdb will then print the backtrace of the crash.
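For anyone who would rather not drive gdb interactively, the same backtrace can probably be captured in one shot. This is only a sketch: the helper name `dump_backtrace` is made up, and the binary and core paths are whatever applies on your system. Note also that the name and location of the core file depend on the kernel's core_pattern setting (see /proc/sys/kernel/core_pattern), so it may not end up in the current directory.

```shell
#!/usr/bin/env bash
# Sketch only: dump_backtrace is a hypothetical helper name.

# dump_backtrace BINARY CORE
# Runs gdb non-interactively and prints the backtrace of the crashing
# thread, followed by the backtraces of all threads.
dump_backtrace() {
    local binary=$1 core=$2
    gdb -batch -ex 'bt' -ex 'thread apply all bt' "$binary" "$core"
}

# Example (paths taken from the commands above, adjust as needed):
#   dump_backtrace /usr/local/bin/ceph-mds ./coredump.xxx > mds-backtrace.txt
```

Redirecting the output to a file makes it easy to attach the full backtrace to the ticket.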
Updated by Joao Castro almost 8 years ago
Zheng Yan wrote:
Your ceph-mds does not contain debuginfo. Please install the debuginfo package first, then start ceph-mds manually with core dumps enabled:
$ ulimit -c unlimited
$ /usr/local/bin/ceph-mds -i <mdsid> -c /usr/local/etc/ceph/ceph.conf --cluster ceph -f
The kernel will create a coredump.xxx file in the current directory when ceph-mds crashes. Use gdb to inspect the coredump file:
$ gdb /usr/local/bin/ceph-mds ./coredump.xxx
Inside gdb, type 'bt'. gdb will then print the backtrace of the crash.
Ok, I installed it.
Any way to check if it was properly installed?
Both daemons are now running, I will wait for them to crash and let you know.
Thanks
Updated by Zheng Yan almost 8 years ago
$gdb /usr/local/bin/ceph-mds
If gdb does not say "no debugging symbols found", the debug package is properly installed.
Updated by Greg Farnum almost 8 years ago
- Status changed from New to Need More Info
Updated by Patrick Donnelly about 5 years ago
- Status changed from Need More Info to Can't reproduce