Bug #1022
Status: closed
every mds crash: Program terminated with signal 11, Segmentation fault.
% Done: 0%
Description
A few seconds after startup, all my MDSes crash with the following message:
ceph version .commit: . process: cmds. pid: 6614
2011-04-21 14:53:04.928342 7fad87f50700 mds-1.0 ms_handle_connect on 192.168.20.9:6789/0
2011-04-21 14:53:09.040809 7fad87f50700 mds-1.0 handle_mds_map standby
2011-04-21 14:53:23.122717 7fad87f50700 mds-1.0 handle_mds_map standby
2011-04-21 14:53:38.194972 7fad87f50700 mds0.342 handle_mds_map i am now mds0.342
2011-04-21 14:53:38.194999 7fad87f50700 mds0.342 handle_mds_map state change up:standby --> up:replay
2011-04-21 14:53:38.195008 7fad87f50700 mds0.342 replay_start
2011-04-21 14:53:38.195026 7fad87f50700 mds0.342 recovery set is
2011-04-21 14:53:38.195035 7fad87f50700 mds0.342 need osdmap epoch 2013, have 2010
2011-04-21 14:53:38.195077 7fad87f50700 mds0.cache handle_mds_failure mds0 : recovery peers are
2011-04-21 14:53:38.330797 7fad87f50700 mds0.342 ms_handle_connect on 192.168.20.9:6801/3628
2011-04-21 14:53:38.331087 7fad87f50700 mds0.342 ms_handle_connect on 192.168.20.10:6801/3918
2011-04-21 14:53:38.331204 7fad87f50700 mds0.342 ms_handle_connect on 192.168.20.11:6801/3803
2011-04-21 14:53:38.373929 7fad87f50700 mds0.cache creating system inode with ino:100
2011-04-21 14:53:38.374091 7fad87f50700 mds0.cache creating system inode with ino:1
*** Caught signal (Segmentation fault) **
 in thread 0x7f93dabdb700
 ceph version (commit:)
 1: /usr/bin/cmds() [0x73b691]
 2: (()+0xfc60) [0x7f93df9a3c60]
 3: (ESession::replay(MDS*)+0x6c6) [0x4ecdd6]
 4: (MDLog::_replay_thread()+0x10a1) [0x6935e1]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4d86cd]
 6: (()+0x6d8c) [0x7f93df99ad8c]
 7: (clone()+0x6d) [0x7f93de5e804d]
I just upgraded to ceph v0.26 from ceph 0.24.3.
All OSDs and MONs are running fine.
Here is what gdb has to say:
(gdb) bt
#0  0x00007fad8a60db3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000073b8a4 in handle_fatal_signal (signum=11) at common/signal.cc:78
#2  <signal handler called>
#3  ESession::replay (this=0x2a54900, mds=0x2a57a00) at mds/journal.cc:711
#4  0x00000000006935e1 in MDLog::_replay_thread (this=0x2a5a300) at mds/MDLog.cc:556
#5  0x00000000004d86cd in MDLog::ReplayThread::entry (this=<value optimized out>) at mds/MDLog.h:86
#6  0x00007fad8a604d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007fad8925204d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x0000000000000000 in ?? ()