Bug #1022

closed

Every MDS crashes: Program terminated with signal 11, Segmentation fault.

Added by ar Fred about 13 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

A few seconds after startup, all my MDSes crash with the following message:

ceph version .commit: . process: cmds. pid: 6614
2011-04-21 14:53:04.928342 7fad87f50700 mds-1.0 ms_handle_connect on 192.168.20.9:6789/0
2011-04-21 14:53:09.040809 7fad87f50700 mds-1.0 handle_mds_map standby
2011-04-21 14:53:23.122717 7fad87f50700 mds-1.0 handle_mds_map standby
2011-04-21 14:53:38.194972 7fad87f50700 mds0.342 handle_mds_map i am now mds0.342
2011-04-21 14:53:38.194999 7fad87f50700 mds0.342 handle_mds_map state change up:standby --> up:replay
2011-04-21 14:53:38.195008 7fad87f50700 mds0.342 replay_start
2011-04-21 14:53:38.195026 7fad87f50700 mds0.342  recovery set is 
2011-04-21 14:53:38.195035 7fad87f50700 mds0.342  need osdmap epoch 2013, have 2010
2011-04-21 14:53:38.195077 7fad87f50700 mds0.cache handle_mds_failure mds0 : recovery peers are 
2011-04-21 14:53:38.330797 7fad87f50700 mds0.342 ms_handle_connect on 192.168.20.9:6801/3628
2011-04-21 14:53:38.331087 7fad87f50700 mds0.342 ms_handle_connect on 192.168.20.10:6801/3918
2011-04-21 14:53:38.331204 7fad87f50700 mds0.342 ms_handle_connect on 192.168.20.11:6801/3803
2011-04-21 14:53:38.373929 7fad87f50700 mds0.cache creating system inode with ino:100
2011-04-21 14:53:38.374091 7fad87f50700 mds0.cache creating system inode with ino:1
*** Caught signal (Segmentation fault) **
 in thread 0x7f93dabdb700
 ceph version  (commit:)
 1: /usr/bin/cmds() [0x73b691]
 2: (()+0xfc60) [0x7f93df9a3c60]
 3: (ESession::replay(MDS*)+0x6c6) [0x4ecdd6]
 4: (MDLog::_replay_thread()+0x10a1) [0x6935e1]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4d86cd]
 6: (()+0x6d8c) [0x7f93df99ad8c]
 7: (clone()+0x6d) [0x7f93de5e804d]

I just upgraded from Ceph 0.24.3 to Ceph v0.26.
All OSDs and all MONs are running fine.

Here is what gdb has to say:

(gdb) bt
#0  0x00007fad8a60db3b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000073b8a4 in handle_fatal_signal (signum=11) at common/signal.cc:78
#2  <signal handler called>
#3  ESession::replay (this=0x2a54900, mds=0x2a57a00) at mds/journal.cc:711
#4  0x00000000006935e1 in MDLog::_replay_thread (this=0x2a5a300) at mds/MDLog.cc:556
#5  0x00000000004d86cd in MDLog::ReplayThread::entry (this=<value optimized out>) at mds/MDLog.h:86
#6  0x00007fad8a604d8c in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007fad8925204d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x0000000000000000 in ?? ()
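
For anyone triaging similar dumps, here is a minimal sketch that pulls the faulting frame out of a gdb `bt` listing like the one above. It assumes gdb's default frame format and treats the first frame after `<signal handler called>` as the crash site. The names `FRAME_RE`, `crash_frame`, and `BT` are made up for this illustration and are not part of Ceph or gdb.

```python
import re

# Matches gdb "bt" frames such as:
#   #3  ESession::replay (this=0x2a54900, mds=0x2a57a00) at mds/journal.cc:711
#   #4  0x00000000006935e1 in MDLog::_replay_thread (this=0x2a5a300) at mds/MDLog.cc:556
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"
    r"(?:0x[0-9a-f]+ in )?"                      # optional return address
    r"(?P<func>\S+) \(.*?\)"                     # function name and argument list
    r"(?: at (?P<file>\S+):(?P<line>\d+))?"      # optional source location
)


def crash_frame(backtrace):
    """Return (function, file, line) for the first frame after the
    '<signal handler called>' marker, or None if no such frame exists."""
    seen_handler = False
    for raw in backtrace.splitlines():
        text = raw.strip()
        if "<signal handler called>" in text:
            seen_handler = True
            continue
        if not seen_handler:
            continue
        match = FRAME_RE.match(text)
        if match:
            line = match.group("line")
            return (match.group("func"),
                    match.group("file"),
                    int(line) if line else None)
    return None


# A few frames copied from the backtrace in this report.
BT = """\
#1  0x000000000073b8a4 in handle_fatal_signal (signum=11) at common/signal.cc:78
#2  <signal handler called>
#3  ESession::replay (this=0x2a54900, mds=0x2a57a00) at mds/journal.cc:711
#4  0x00000000006935e1 in MDLog::_replay_thread (this=0x2a5a300) at mds/MDLog.cc:556
"""

if __name__ == "__main__":
    print(crash_frame(BT))
```

Run against the frames above, this points at `ESession::replay` in `mds/journal.cc`, line 711, which is the frame where the segmentation fault was raised during journal replay.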
