Bug #1366
mds segfault
Status:
closed
% Done:
0%
Description
I have 4 mds's running in the following setup:
[mds.alpha]
host = 192.168.101.12
[mds.bravo]
host = 192.168.101.13
[mds.charlie]
host = 192.168.101.14
mds standby replay = true
mds standby for name = alpha
[mds.delta]
host = 192.168.101.15
mds standby replay = true
mds standby for name = bravo
After running for a while, I see a segfault on mds.charlie. I think some of this may be due to network connections getting reset on my system, which I'm still trying to figure out, but it looks like ceph handles these resets up to a point. Here's the end of the charlie log. Let me know if more info/debugging is needed.
2011-08-05 09:15:21.013070 7f79fbbd2700 mds0.objecter FULL, paused modify 0x929b480 tid 75279
2011-08-05 09:15:21.013118 7f79fbbd2700 mds0.objecter FULL, paused modify 0x929b000 tid 75280
2011-08-05 09:15:21.117920 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.114:6824/31082
2011-08-05 09:15:21.118549 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.114:6824/31082
2011-08-05 09:15:42.321481 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.114:6830/31258
2011-08-05 09:15:42.322121 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.114:6830/31258
2011-08-05 09:16:07.017921 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.113:6804/19439
2011-08-05 09:16:07.018804 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.113:6804/19439
2011-08-05 09:16:11.077979 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.11:6829/8222
2011-08-05 09:16:11.078865 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.11:6829/8222
2011-08-05 09:16:36.679306 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.113:6816/19485
2011-08-05 09:16:36.680255 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.113:6816/19485
2011-08-05 09:17:22.481311 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.11:6832/9489
2011-08-05 09:17:22.482177 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.11:6832/9489
2011-08-05 09:17:47.747949 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.113:6825/19531
2011-08-05 09:17:47.748818 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.113:6825/19531
2011-08-05 09:18:13.458029 7f79fccd5700 mds0.3 ms_handle_reset on 192.168.101.114:6805/30973
2011-08-05 09:18:13.458654 7f79fccd5700 mds0.3 ms_handle_connect on 192.168.101.114:6805/30973
*** Caught signal (Segmentation fault) **
in thread 0x7f79fccd5700
ceph version (commit:)
1: (ceph::BackTrace::BackTrace(int)+0x2d) [0xac0379]
2: /usr/ceph/bin/cmds() [0xb383d3]
3: (()+0xfc60) [0x7f79ffb93c60]
4: (CInode::pop_and_dirty_projected_inode(LogSegment)+0x199) [0x9eb1ed]
5: (Mutation::pop_and_dirty_projected_inodes()+0x50) [0x897fee]
6: (Mutation::apply()+0x1b) [0x8980e9]
7: (C_MDS_mknod_finish::finish(int)+0x134) [0x89b446]
8: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x1b5) [0x7f5311]
9: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x528) [0xa85416]
10: (Journaler::C_Flush::finish(int)+0x32) [0xa8c224]
11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x12b4) [0xa477dc]
12: (MDS::handle_core_message(Message*)+0x936) [0x7ef8ce]
13: (MDS::_dispatch(Message*)+0x6ac) [0x7f1400]
14: (MDS::ms_dispatch(Message*)+0x38) [0x7eedbe]
15: (Messenger::ms_deliver_dispatch(Message*)+0x70) [0xadfe2a]
16: (SimpleMessenger::dispatch_entry()+0x810) [0xac9e7c]
17: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x7c5468]
18: (Thread::_entry_func(void*)+0x23) [0xa8f141]
19: (()+0x6d8c) [0x7f79ffb8ad8c]
20: (clone()+0x6d) [0x7f79fe7d804d]
mds config: