Project

General

Profile

Actions

Bug #5333

closed

mds: segfault in MDLog::standby_trim_segments

Added by Sage Weil almost 11 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -32> 2013-06-12 04:59:07.867788 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 4 ==== osd_op_reply(12 200.00000007 [read 669~4193635] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (3130234908 0 0) 0x22d7c00 con 0x22f2dc0
   -31> 2013-06-12 04:59:07.867908 7fe040c47700  0 mds.4881.journaler(ro) _finish_read got error -2
   -30> 2013-06-12 04:59:07.867943 7fe03db39700  0 mds.0.log _replay journaler got error -2, aborting
   -29> 2013-06-12 04:59:07.867967 7fe03db39700 10 mds.0.log _replay_thread kicking waiters
   -28> 2013-06-12 04:59:07.867977 7fe03db39700  1 mds.0.0 replay_done (as standby)
   -27> 2013-06-12 04:59:07.867981 7fe03db39700 10 mds.0.0 setting replay timer
   -26> 2013-06-12 04:59:07.867989 7fe03db39700 10 mds.0.log _replay_thread finish
   -25> 2013-06-12 04:59:07.868003 7fe040c47700  5 mds.0.0 ms_handle_connect on 10.214.133.30:6800/29467
   -24> 2013-06-12 04:59:07.868021 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.1 10.214.133.30:6801/29468 3 ==== osd_op_reply(14 200.00000009 [read 0~4194304] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (1127041475 0 0) 0x2329000 con 0x22f29a0
   -23> 2013-06-12 04:59:07.868058 7fe040c47700  0 mds.4881.journaler(ro) _finish_read got error -2
   -22> 2013-06-12 04:59:07.868097 7fe040c47700  5 mds.0.0 ms_handle_connect on 10.214.132.11:6808/21771
   -21> 2013-06-12 04:59:07.868106 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 5 ==== osd_op_reply(13 200.00000008 [read 0~4194304] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (1039982437 0 0) 0x22d7600 con 0x22f2dc0
   -20> 2013-06-12 04:59:07.868127 7fe040c47700  0 mds.4881.journaler(ro) _finish_read got error -2
   -19> 2013-06-12 04:59:07.981597 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 6 ==== osd_op_reply(21 200.00000010 [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (2157896658 0 1258194911) 0x22d7800 con 0x22f2dc0
   -18> 2013-06-12 04:59:08.020616 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.0 10.214.133.30:6800/29467 1 ==== osd_op_reply(16 200.0000000b [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (3748363280 0 3189137344) 0x22d7a00 con 0x2356000
   -17> 2013-06-12 04:59:08.022190 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.1 10.214.133.30:6801/29468 4 ==== osd_op_reply(17 200.0000000c [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (2692552363 0 400696248) 0x2329600 con 0x22f29a0
   -16> 2013-06-12 04:59:08.024007 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.5 10.214.132.11:6808/21771 1 ==== osd_op_reply(19 200.0000000e [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (1209428004 0 2062669488) 0x22d7e00 con 0x23562c0
   -15> 2013-06-12 04:59:08.090907 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.0 10.214.133.30:6800/29467 2 ==== osd_op_reply(22 200.00000011 [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (2740054922 0 854091393) 0x22d7600 con 0x2356000
   -14> 2013-06-12 04:59:08.096874 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.5 10.214.132.11:6808/21771 2 ==== osd_op_reply(20 200.0000000f [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (632048689 0 3489315198) 0x22d7a00 con 0x23562c0
   -13> 2013-06-12 04:59:08.497470 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.4 10.214.132.11:6804/21768 4 ==== osd_op_reply(15 200.0000000a [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (4189159031 0 2064788653) 0x2329800 con 0x22f2b00
   -12> 2013-06-12 04:59:08.628129 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.4 10.214.132.11:6804/21768 5 ==== osd_op_reply(18 200.0000000d [read 0~4194304] ondisk = 0) v4 ==== 111+0+4194304 (1493057275 0 456732573) 0x2329600 con 0x22f2b00
   -11> 2013-06-12 04:59:08.868071 7fe03eb42700  1 mds.0.0 standby_replay_restart (as standby)
   -10> 2013-06-12 04:59:08.868137 7fe03eb42700  1 -- 10.214.133.35:6801/22779 --> 10.214.133.30:6808/29475 -- osd_op(mds.4881.0:23 200.00000000 [read 0~0] 1.844f3494 e27) v4 -- ?+0 0x230f480 con 0x22f2dc0
    -9> 2013-06-12 04:59:08.869189 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.2 10.214.133.30:6808/29475 7 ==== osd_op_reply(23 200.00000000 [read 0~84] ondisk = 0) v4 ==== 111+0+84 (821130163 0 3960199946) 0x2329e00 con 0x22f2dc0
    -8> 2013-06-12 04:59:08.869248 7fe040c47700  1 mds.4881.journaler(ro) probing for end of the log
    -7> 2013-06-12 04:59:08.869300 7fe040c47700  1 -- 10.214.133.35:6801/22779 --> 10.214.132.11:6804/21768 -- osd_op(mds.4881.0:24 200.00000029 [stat] 1.8563bd25 rwordered e27) v4 -- ?+0 0x230f900 con 0x22f2b00
    -6> 2013-06-12 04:59:08.869381 7fe040c47700  1 -- 10.214.133.35:6801/22779 --> 10.214.132.11:6808/21771 -- osd_op(mds.4881.0:25 200.0000002a [stat] 1.5bf49b38 rwordered e27) v4 -- ?+0 0x234ab40 con 0x23562c0
    -5> 2013-06-12 04:59:08.870424 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.5 10.214.132.11:6808/21771 3 ==== osd_op_reply(25 200.0000002a [stat] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (3305829669 0 0) 0x22d7800 con 0x23562c0
    -4> 2013-06-12 04:59:09.311965 7fe040c47700  1 -- 10.214.133.35:6801/22779 <== osd.4 10.214.132.11:6804/21768 6 ==== osd_op_reply(24 200.00000029 [stat] ondisk = 0) v4 ==== 111+0+16 (3142556516 0 2765576968) 0x2329000 con 0x22f2b00
    -3> 2013-06-12 04:59:09.312020 7fe040c47700  1 mds.4881.journaler(ro) _finish_reprobe new_end = 172696594 (header had 172385236).
    -2> 2013-06-12 04:59:09.312047 7fe040c47700 10 mds.0.log standby_trim_segments
    -1> 2013-06-12 04:59:09.312049 7fe040c47700 10 mds.0.log  expire_pos=41944712
     0> 2013-06-12 04:59:09.315002 7fe040c47700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fe040c47700

 ceph version 0.63-580-g256afa0 (256afa072dc3af0bf8d7495ed8a62e8b2e84adb0)
 1: ceph-mds() [0x97443a]
 2: (()+0xfcb0) [0x7fe0451c4cb0]
 3: (MDLog::standby_trim_segments()+0x192) [0x78a7d2]
 4: (MDS::C_MDS_StandbyReplayRestartFinish::finish(int)+0x39) [0x595e09]
 5: (Journaler::_finish_reprobe(int, unsigned long, Context*)+0x190) [0x791650]
 6: (Filer::_probed(Filer::Probe*, object_t const&, unsigned long, utime_t)+0x558) [0x7c63e8]
 7: (Objecter::C_Stat::finish(int)+0xc0) [0x7c71e0]
 8: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe48) [0x7b2658]
 9: (MDS::handle_core_message(Message*)+0xae8) [0x5896f8]
 10: (MDS::_dispatch(Message*)+0x2f) [0x5898bf]
 11: (MDS::ms_dispatch(Message*)+0x1d3) [0x58b343]
 12: (DispatchQueue::entry()+0x3f1) [0x941ff1]
 13: (DispatchQueue::DispatchThread::entry()+0xd) [0x86c95d]
 14: (()+0x7e9a) [0x7fe0451bce9a]
 15: (clone()+0x6d) [0x7fe043759ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---

job was
ubuntu@teuthology:/a/teuthology-2013-06-12_01:01:55-fs-master-testing-basic/34852$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 7335df8644be6b42fb84e9854edef6501cbbd9c1
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        debug client: 10
      global:
        mds inject delay type: osd mds
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject socket failures: 2500
      mds:
        debug mds: 20
        debug ms: 1
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    sha1: 256afa072dc3af0bf8d7495ed8a62e8b2e84adb0
  install:
    ceph:
      sha1: 256afa072dc3af0bf8d7495ed8a62e8b2e84adb0
  s3tests:
    branch: master
  workunit:
    sha1: 256afa072dc3af0bf8d7495ed8a62e8b2e84adb0
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh

archived the logs in the 'logs' subdir

Actions

Also available in: Atom PDF