Bug #16322

closed

ceph mds getting killed for no reason

Added by Joao Castro almost 8 years ago. Updated about 5 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hello,
my ceph-mds daemons keep getting killed for no apparent reason (normally they do the active failover).

Log:

ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
1: (()+0x4ecd02) [0x55c747972d02]
2: (()+0x10340) [0x7f5826a05340]
3: (gsignal()+0x39) [0x7f5824e90cc9]
4: (abort()+0x148) [0x7f5824e940d8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f582579b535]
6: (()+0x5e6d6) [0x7f58257996d6]
7: (()+0x5e703) [0x7f5825799703]
8: (()+0x5e922) [0x7f5825799922]
9: (Server::prepare_new_inode(std::shared_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0x200f) [0x55c7476e59af]
10: (Server::handle_client_openc(std::shared_ptr<MDRequestImpl>&)+0xe5b) [0x55c7476fbb0b]
11: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xaec) [0x55c74771917c]
12: (Server::handle_client_request(MClientRequest*)+0x47f) [0x55c74771968f]
13: (Server::dispatch(Message*)+0x3bb) [0x55c74771d8db]
14: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x55c7476a3f8c]
15: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x55c7476ad081]
16: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55c7476ae1d5]
17: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x55c747695f83]
18: (DispatchQueue::entry()+0x78b) [0x55c747b5b6cb]
19: (DispatchQueue::DispatchThread::entry()+0xd) [0x55c747a4a62d]
20: (()+0x8182) [0x7f58269fd182]
21: (clone()+0x6d) [0x7f5824f5447d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #1

Updated by Joao Castro almost 8 years ago

Additional log lines:

2016-06-15 03:15:51.017714 7f582103f700 -1 *** Caught signal (Aborted) **
 in thread 7f582103f700 thread_name:ms_dispatch

Actions #2

Updated by Zheng Yan almost 8 years ago

Could you enable core dumps and use gdb to check which line causes the crash?

Actions #3

Updated by Joao Castro almost 8 years ago

I am not very experienced with gdb, sorry. Should I run it against ceph-mds?
I will paste the whole log (it has a lot of info).
Please let me know if it helps.
Thanks in advance.

Actions #4

Updated by Joao Castro almost 8 years ago

(...)
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
Reading symbols from /usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so
Reading symbols from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libsqlite3.so.0
Reading symbols from /usr/lib/x86_64-linux-gnu/nss/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/nss/libfreebl3.so

warning: File "/lib/x86_64-linux-gnu/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007f07cbfd066b in pthread_join (threadid=139671352723200, thread_return=0x0) at pthread_join.c:99
99 pthread_join.c: No such file or directory.

Actions #5

Updated by Joao Castro almost 8 years ago

kernel: 4.2.0-36-generic

Actions #7

Updated by Zheng Yan almost 8 years ago

Your ceph-mds binary does not contain debug info. Please install the debuginfo package first, then start ceph-mds manually with core dumps enabled:

$ ulimit -c unlimited
$ /usr/local/bin/ceph-mds -i <mdsid> -c /usr/local/etc/ceph/ceph.conf --cluster ceph -f

The kernel will create a coredump.xxx file in the current directory when ceph-mds crashes. Use gdb to inspect the core dump file:
$ gdb /usr/local/bin/ceph-mds ./coredump.xxx

Inside gdb, type 'bt'. gdb will then print the backtrace of the crash.
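
Whether the core file really ends up in the current directory depends on the kernel's core_pattern setting and on the core size limit of the shell that starts the daemon. The following is a minimal sketch for checking both before waiting for the next crash. The helper name core_destination is hypothetical (it is not part of Ceph or gdb), and it only classifies the pattern string; it does not change any setting.

```shell
# Sketch: report where the kernel would write a core dump.
# core_destination is a hypothetical helper used for illustration only.

core_destination() {
    # $1 is the content of /proc/sys/kernel/core_pattern
    case "$1" in
        '|'*) echo "pipe" ;;      # core is piped to a user-space crash handler
        /*)   echo "absolute" ;;  # core is written to the given absolute path
        *)    echo "cwd" ;;       # core is written relative to the process's working directory
    esac
}

# Core size limit for processes started from this shell (0 disables core files).
echo "core size limit: $(ulimit -c)"

# Only inspect the pattern if the file is readable on this system.
if [ -r /proc/sys/kernel/core_pattern ]; then
    pattern=$(cat /proc/sys/kernel/core_pattern)
    echo "core_pattern: $pattern -> $(core_destination "$pattern")"
fi
```

If the result is "pipe" (for example when a crash handler such as apport is configured), the core file will probably not appear in the current directory and has to be collected from that handler instead.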

Actions #8

Updated by Loïc Dachary almost 8 years ago

  • Target version deleted (v10.2.2)

Actions #9

Updated by Joao Castro almost 8 years ago

Zheng Yan wrote:

Your ceph-mds binary does not contain debug info. Please install the debuginfo package first, then start ceph-mds manually with core dumps enabled:

$ ulimit -c unlimited
$ /usr/local/bin/ceph-mds -i <mdsid> -c /usr/local/etc/ceph/ceph.conf --cluster ceph -f

The kernel will create a coredump.xxx file in the current directory when ceph-mds crashes. Use gdb to inspect the core dump file:
$ gdb /usr/local/bin/ceph-mds ./coredump.xxx

Inside gdb, type 'bt'. gdb will then print the backtrace of the crash.

OK, I installed it.
Is there any way to check that it was installed properly?
Both daemons are running now; I will wait for them to crash and let you know.
Thanks

Actions #10

Updated by Zheng Yan almost 8 years ago

$ gdb /usr/local/bin/ceph-mds

If gdb does not say "no debugging symbols found", the debug package is properly installed.
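
For a non-interactive check, the ELF section headers can be inspected as well. The sketch below classifies the output of `readelf -S -W`, read from standard input. The function name classify_debug_sections is hypothetical, and the rule is only a heuristic: a .debug_info section means the symbols are embedded in the binary, while a .gnu_debuglink section only says that a separate debug file is expected, not that it is actually installed. The gdb message above remains the more reliable check.

```shell
# Sketch: classify a binary's debug info from its section headers.
# classify_debug_sections is a hypothetical helper; it reads the output
# of `readelf -S -W <binary>` on standard input.

classify_debug_sections() {
    sections=$(cat)
    case "$sections" in
        *.debug_info*)    echo "embedded" ;;  # DWARF data is inside the binary itself
        *.gnu_debuglink*) echo "separate" ;;  # binary points at an external debug file
        *)                echo "none" ;;      # no sign of debug info in the headers
    esac
}
```

It could be used as `readelf -S -W /usr/local/bin/ceph-mds | classify_debug_sections`, assuming readelf from binutils is installed.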

Actions #11

Updated by Greg Farnum almost 8 years ago

  • Status changed from New to Need More Info

Actions #12

Updated by Patrick Donnelly about 5 years ago

  • Status changed from Need More Info to Can't reproduce