Actions
Bug #5333
closedmds: segfault in MDLog::standby_trim_segments
Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-32> 2013-06-12 04:59:07.867788 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 4 ==== osd_op_reply(12 200.00000007 [read 669~4193635] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (3130234908 0 0) 0x22d7c00 con 0x22f2dc0 -31> 2013-06-12 04:59:07.867908 7fe040c47700 0 mds.4881.journaler(ro) _finish_read got error -2 -30> 2013-06-12 04:59:07.867943 7fe03db39700 0 mds.0.log _replay journaler got error -2, aborting -29> 2013-06-12 04:59:07.867967 7fe03db39700 10 mds.0.log _replay_thread kicking waiters -28> 2013-06-12 04:59:07.867977 7fe03db39700 1 mds.0.0 replay_done (as standby) -27> 2013-06-12 04:59:07.867981 7fe03db39700 10 mds.0.0 setting replay timer -26> 2013-06-12 04:59:07.867989 7fe03db39700 10 mds.0.log _replay_thread finish -25> 2013-06-12 04:59:07.868003 7fe040c47700 5 mds.0.0 ms_handle_connect on 10.214.133.30:6800/29467 -24> 2013-06-12 04:59:07.868021 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.1 10.214.133.30:6801/29468 3 ==== osd_op_reply(14 200.00000009 [read 0~4194304] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (1127041475 0 0) 0x2329000 con 0x22f29a0 -23> 2013-06-12 04:59:07.868058 7fe040c47700 0 mds.4881.journaler(ro) _finish_read got error -2 -22> 2013-06-12 04:59:07.868097 7fe040c47700 5 mds.0.0 ms_handle_connect on 10.214.132.11:6808/21771 -21> 2013-06-12 04:59:07.868106 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 5 ==== osd_op_reply(13 200.00000008 [read 0~4194304] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (1039982437 0 0) 0x22d7600 con 0x22f2dc0 -20> 2013-06-12 04:59:07.868127 7fe040c47700 0 mds.4881.journaler(ro) _finish_read got error -2 -19> 2013-06-12 04:59:07.981597 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 6 ==== osd_op_reply(21 200.00000010 [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (2157896658 0 1258194911) 0x22d7800 con 0x22f2dc0 -18> 2013-06-12 04:59:08.020616 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.0 10.214.133.30:6800/29467 1 ==== osd_op_reply(16 200.0000000b [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (3748363280 0 3189137344) 0x22d7a00 con 0x2356000 -17> 2013-06-12 04:59:08.022190 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.1 10.214.133.30:6801/29468 4 ==== osd_op_reply(17 200.0000000c [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (2692552363 0 400696248) 0x2329600 con 0x22f29a0 -16> 2013-06-12 04:59:08.024007 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.5 10.214.132.11:6808/21771 1 ==== osd_op_reply(19 200.0000000e [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (1209428004 0 2062669488) 0x22d7e00 con 0x23562c0 -15> 2013-06-12 04:59:08.090907 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.0 10.214.133.30:6800/29467 2 ==== osd_op_reply(22 200.00000011 [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (2740054922 0 854091393) 0x22d7600 con 0x2356000 -14> 2013-06-12 04:59:08.096874 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.5 10.214.132.11:6808/21771 2 ==== osd_op_reply(20 200.0000000f [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (632048689 0 3489315198) 0x22d7a00 con 0x23562c0 -13> 2013-06-12 04:59:08.497470 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.4 10.214.132.11:6804/21768 4 ==== osd_op_reply(15 200.0000000a [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (4189159031 0 2064788653) 0x2329800 con 0x22f2b00 -12> 2013-06-12 04:59:08.628129 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.4 10.214.132.11:6804/21768 5 ==== osd_op_reply(18 200.0000000d [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (1493057275 0 456732573) 0x2329600 con 0x22f2b00 -11> 2013-06-12 04:59:08.868071 7fe03eb42700 1 mds.0.0 standby_replay_restart (as standby) -10> 2013-06-12 04:59:08.868137 7fe03eb42700 1 -- 10.214.133.35:6801/22779 --> 10.214.133.30:6808/29475 -- osd_op(mds.4881.0:23 200.00000000 [read 0~0] 1.844f3494 e27) v4 -- ?+0 0x230f480 con 0x22f2dc0 -9> 2013-06-12 04:59:08.869189 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 7 ==== osd_op_reply(23 200.00000000 [read 0~84] ondisk = 0) v4 ==== 111+0+84 (821130163 0 3960199946) 0x2329e00 con 0x22f2dc0 -8> 2013-06-12 04:59:08.869248 7fe040c47700 1 mds.4881.journaler(ro) probing for end of the log -7> 2013-06-12 04:59:08.869300 7fe040c47700 1 -- 10.214.133.35:6801/22779 --> 10.214.132.11:6804/21768 -- osd_op(mds.4881.0:24 200.00000029 [stat] 1.8563bd25 rwordered e27) v4 -- ?+0 0x230f900 con 0x22f2b00 -6> 2013-06-12 04:59:08.869381 7fe040c47700 1 -- 10.214.133.35:6801/22779 --> 10.214.132.11:6808/21771 -- osd_op(mds.4881.0:25 200.0000002a [stat] 1.5bf49b38 rwordered e27) v4 -- ?+0 0x234ab40 con 0x23562c0 -5> 2013-06-12 04:59:08.870424 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.5 10.214.132.11:6808/21771 3 ==== osd_op_reply(25 200.0000002a [stat] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (3305829669 0 0) 0x22d7800 con 0x23562c0 -4> 2013-06-12 04:59:09.311965 7fe040c47700 1 -- 10.214.133.35:6801/22779 <== osd.4 10.214.132.11:6804/21768 6 ==== osd_op_reply(24 200.00000029 [stat] ondisk = 0) v4 ==== 111+0+16 (3142556516 0 2765576968) 0x2329000 con 0x22f2b00 -3> 2013-06-12 04:59:09.312020 7fe040c47700 1 mds.4881.journaler(ro) _finish_reprobe new_end = 172696594 (header had 172385236). -2> 2013-06-12 04:59:09.312047 7fe040c47700 10 mds.0.log standby_trim_segments -1> 2013-06-12 04:59:09.312049 7fe040c47700 10 mds.0.log expire_pos=41944712 0> 2013-06-12 04:59:09.315002 7fe040c47700 -1 *** Caught signal (Segmentation fault) ** in thread 7fe040c47700 ceph version 0.63-580-g256afa0 (256afa072dc3af0bf8d7495ed8a62e8b2e84adb0) 1: ceph-mds() [0x97443a] 2: (()+0xfcb0) [0x7fe0451c4cb0] 3: (MDLog::standby_trim_segments()+0x192) [0x78a7d2] 4: (MDS::C_MDS_StandbyReplayRestartFinish::finish(int)+0x39) [0x595e09] 5: (Journaler::_finish_reprobe(int, unsigned long, Context*)+0x190) [0x791650] 6: (Filer::_probed(Filer::Probe*, object_t const&, unsigned long, utime_t)+0x558) [0x7c63e8] 7: (Objecter::C_Stat::finish(int)+0xc0) [0x7c71e0] 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe48) [0x7b2658] 9: (MDS::handle_core_message(Message*)+0xae8) [0x5896f8] 10: (MDS::_dispatch(Message*)+0x2f) [0x5898bf] 11: (MDS::ms_dispatch(Message*)+0x1d3) [0x58b343] 12: (DispatchQueue::entry()+0x3f1) [0x941ff1] 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x86c95d] 14: (()+0x7e9a) [0x7fe0451bce9a] 15: (clone()+0x6d) [0x7fe043759ccd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels ---
job was
ubuntu@teuthology:/a/teuthology-2013-06-12_01:01:55-fs-master-testing-basic/34852$ cat orig.config.yaml kernel: kdb: true sha1: 7335df8644be6b42fb84e9854edef6501cbbd9c1 machine_type: plana nuke-on-error: true overrides: ceph: conf: client: debug client: 10 global: mds inject delay type: osd mds ms inject delay max: 1 ms inject delay probability: 0.005 ms inject socket failures: 2500 mds: debug mds: 20 debug ms: 1 mon: debug mon: 20 debug ms: 20 debug paxos: 20 log-whitelist: - slow request - wrongly marked me down sha1: 256afa072dc3af0bf8d7495ed8a62e8b2e84adb0 install: ceph: sha1: 256afa072dc3af0bf8d7495ed8a62e8b2e84adb0 s3tests: branch: master workunit: sha1: 256afa072dc3af0bf8d7495ed8a62e8b2e84adb0 roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - - client.0 - mds.b-s-a tasks: - chef: null - clock.check: null - install: null - ceph: null - mds_thrash: null - ceph-fuse: null - workunit: clients: all: - suites/fsstress.sh
archived the logs in the 'logs' subdir
Actions