Bug #1366 (closed): mds segfault

Added by Sam Lang almost 13 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have 4 MDS daemons running in the following setup:

[mds.alpha]
host = 192.168.101.12

[mds.bravo]
host = 192.168.101.13

[mds.charlie]
host = 192.168.101.14
mds standby replay = true
mds standby for name = alpha

[mds.delta]
host = 192.168.101.15
mds standby replay = true
mds standby for name = bravo
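
In case it helps with reproduction, one way to confirm that charlie and delta actually came up as standby-replay followers (just a suggestion; assuming the ceph admin tool in this build supports these subcommands) is to inspect the MDS map:

$ ceph mds stat
$ ceph mds dump

Standby-replay daemons should be reported in the up:standby-replay state, along with the name of the daemon they are following.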

After running for a while, I see a segfault on mds.charlie. I think some of this may be due to network connections getting reset on my system (I'm still trying to track that down), but it looks like ceph handles these resets up to a point. Here's the end of the charlie log. Let me know if more info or debugging output is needed.

2011-08-05 09:15:21.013070 7f79fbbd2700 mds0.objecter FULL, paused modify 0x929b480 tid 75279
2011-08-05 09:15:21.013118 7f79fbbd2700 mds0.objecter FULL, paused modify 0x929b000 tid 75280
2011-08-05 09:15:21.117920 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.114:6824/31082
2011-08-05 09:15:21.118549 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.114:6824/31082
2011-08-05 09:15:42.321481 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.114:6830/31258
2011-08-05 09:15:42.322121 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.114:6830/31258
2011-08-05 09:16:07.017921 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.113:6804/19439
2011-08-05 09:16:07.018804 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.113:6804/19439
2011-08-05 09:16:11.077979 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.11:6829/8222
2011-08-05 09:16:11.078865 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.11:6829/8222
2011-08-05 09:16:36.679306 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.113:6816/19485
2011-08-05 09:16:36.680255 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.113:6816/19485
2011-08-05 09:17:22.481311 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.11:6832/9489
2011-08-05 09:17:22.482177 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.11:6832/9489
2011-08-05 09:17:47.747949 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.113:6825/19531
2011-08-05 09:17:47.748818 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.113:6825/19531
2011-08-05 09:18:13.458029 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.114:6805/30973
2011-08-05 09:18:13.458654 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.114:6805/30973
*** Caught signal (Segmentation fault) **
 in thread 0x7f79fccd5700
 ceph version (commit:)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0xac0379]
 2: /usr/ceph/bin/cmds() [0xb383d3]
 3: (()+0xfc60) [0x7f79ffb93c60]
 4: (CInode::pop_and_dirty_projected_inode(LogSegment*)+0x199) [0x9eb1ed]
 5: (Mutation::pop_and_dirty_projected_inodes()+0x50) [0x897fee]
 6: (Mutation::apply()+0x1b) [0x8980e9]
 7: (C_MDS_mknod_finish::finish(int)+0x134) [0x89b446]
 8: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x1b5) [0x7f5311]
 9: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x528) [0xa85416]
 10: (Journaler::C_Flush::finish(int)+0x32) [0xa8c224]
 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x12b4) [0xa477dc]
 12: (MDS::handle_core_message(Message*)+0x936) [0x7ef8ce]
 13: (MDS::_dispatch(Message*)+0x6ac) [0x7f1400]
 14: (MDS::ms_dispatch(Message*)+0x38) [0x7eedbe]
 15: (Messenger::ms_deliver_dispatch(Message*)+0x70) [0xadfe2a]
 16: (SimpleMessenger::dispatch_entry()+0x810) [0xac9e7c]
 17: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x7c5468]
 18: (Thread::_entry_func(void*)+0x23) [0xa8f141]
 19: (()+0x6d8c) [0x7f79ffb8ad8c]
 20: (clone()+0x6d) [0x7f79fe7d804d]

mds config:
