Bug #5458
mds: standby-replay -> replay takeover does not handle racing expire/trim
Status: Duplicate
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Not sure this is the right diagnosis since I only looked at this briefly, but:
2013-06-25 10:13:32.334002 7f7d12b3c700 10 mds.0.log standby_trim_segments
2013-06-25 10:13:32.334004 7f7d12b3c700 10 mds.0.log  expire_pos=100666064
2013-06-25 10:13:32.334005 7f7d12b3c700 20 mds.0.log  removed no segments!
2013-06-25 10:13:32.334006 7f7d12b3c700  2 mds.0.18 boot_start 3: replaying mds log
2013-06-25 10:13:32.334008 7f7d12b3c700 10 mds.0.log replay start, from 83886495 to 240903679
2013-06-25 10:13:32.334097 7f7d0f82c700 10 mds.0.log _replay_thread start
2013-06-25 10:13:32.334106 7f7d0f82c700  0 mds.0.log _replay journaler got error -2, aborting
2013-06-25 10:13:32.334108 7f7d0f82c700 10 mds.0.log _replay_thread kicking waiters
2013-06-25 10:13:32.334111 7f7d0f82c700  1 mds.0.18 replay_done (as standby)
2013-06-25 10:13:32.334113 7f7d0f82c700 10 mds.0.18 last replay pass was as a standby; making final pass
2013-06-25 10:13:32.334114 7f7d0f82c700  1 mds.0.18 standby_replay_restart (final takeover pass)
2013-06-25 10:13:32.334116 7f7d0f82c700  1 mds.0.18  waiting for osdmap 22 (which blacklists prior instance)
2013-06-25 10:13:32.334142 7f7d0f82c700  1 -- 10.214.132.3:6801/22420 --> 10.214.133.37:6789/0 -- mon_subscribe({mdsmap=101+,monmap=2+,osdmap=22}) v2 -- ?+0 0x2378700 con 0x2371580
2013-06-25 10:13:32.334166 7f7d0f82c700 10 mds.0.log _replay_thread finish
2013-06-25 10:13:32.335071 7f7d12b3c700  1 -- 10.214.132.3:6801/22420 <== mon.1 10.214.133.37:6789/0 20 ==== osd_map(22..22 src has 1..22) v3 ==== 312+0+0 (3338200242 0 0) 0x238f240 con 0x2371580
2013-06-25 10:13:32.335127 7f7d12b3c700  2 mds.0.18 boot_start 3: replaying mds log
2013-06-25 10:13:32.335132 7f7d12b3c700 10 mds.0.log replay start, from 83886495 to 240903679
2013-06-25 10:13:32.335235 7f7d0f82c700 10 mds.0.log _replay_thread start
2013-06-25 10:13:32.335247 7f7d0f82c700  0 mds.0.log _replay journaler got error -2, aborting
2013-06-25 10:13:32.335239 7f7d12b3c700  1 -- 10.214.132.3:6801/22420 <== mon.1 10.214.133.37:6789/0 21 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (311286367 0 0) 0x2378a80 con 0x2371580
2013-06-25 10:13:32.335250 7f7d0f82c700 10 mds.0.log _replay_thread kicking waiters
2013-06-25 10:13:32.335253 7f7d0f82c700  1 mds.0.18 replay_done
2013-06-25 10:13:32.335255 7f7d0f82c700  1 mds.0.18 making mds journal writeable
2013-06-25 10:13:32.335327 7f7d0f82c700  1 -- 10.214.132.3:6801/22420 --> 10.214.133.37:6804/4449 -- osd_op(mds.0.18:97 200.00000039 [zero 1828351~2365953] 1.dcd4e79a e22) v4 -- ?+0 0x2407000 con 0x2371b00
2013-06-25 10:13:32.335373 7f7d0f82c700  1 -- 10.214.132.3:6801/22420 --> 10.214.133.37:6808/4456 -- osd_op(mds.0.18:98 200.0000003a [delete] 1.548da8f4 e22) v4 -- ?+0 0x2407480 con 0x23719a0
2013-06-25 10:13:32.335455 7f7d0f82c700  1 -- 10.214.132.3:6801/22420 --> 10.214.133.37:6804/4449 -- osd_op(mds.0.18:99 200.0000003b [delete] 1.6fa831ae e22) v4 -- ?+0 0x2407240 con 0x2371b00
2013-06-25 10:13:32.335499 7f7d0f82c700  1 -- 10.214.132.3:6801/22420 --> 10.214.133.37:6808/4456 -- osd_op(mds.0.18:100 200.0000003c [delete] 1.2c189cad e22) v4 -- ?+0 0x2407b40 con 0x23719a0
2013-06-25 10:13:32.335550 7f7d0f82c700  1 -- 10.214.132.3:6801/22420 --> 10.214.133.37:6804/4449 -- osd_op(mds.0.18:101 200.0000003d [delete] 1.14c3ce6b e22) v4 -- ?+0 0x2407900 con 0x2371b00
2013-06-25 10:13:32.335589 7f7d0f82c700  1 -- 10.214.132.3:6801/22420 --> 10.214.131.7:6800/4748 -- osd_op(mds.0.18:102 200.0000003e [delete] 1.48a2d1d8 e22) v4 -- ?+0 0x24076c0 con 0x2371dc0
2013-06-25 10:13:32.335606 7f7d0f82c700  2 mds.0.18 i am alone, moving to state reconnect
2013-06-25 10:13:32.335609 7f7d0f82c700  3 mds.0.18 request_state up:reconnect
That ignored ENOENT looks fishy to me! Then a bit later:
2013-06-25 10:13:33.967955 7f7d12b3c700 10 mds.0.cache.dir(1000000069e) _fetched 0 bytes for [dir 1000000069e /client.0/tmp/fsstress-plana753925/p6/d4/d7/df/ded/ [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x52842d8] want_dn=
2013-06-25 10:13:33.967962 7f7d12b3c700  0 mds.0.cache.dir(1000000069e) _fetched missing object for [dir 1000000069e /client.0/tmp/fsstress-plana753925/p6/d4/d7/df/ded/ [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x52842d8]
2013-06-25 10:13:33.967973 7f7d12b3c700  0 log [ERR] : dir 1000000069e.1000000069e object missing on disk; some files may be lost
2013-06-25 10:13:33.967976 7f7d12b3c700 10 mds.0.cache.dir(1000000069e) pre_dirty 1
2013-06-25 10:13:33.969363 7f7d12b3c700 -1 mds/MDLog.h: In function 'LogSegment* MDLog::get_current_segment()' thread 7f7d12b3c700 time 2013-06-25 10:13:33.967989
mds/MDLog.h: 170: FAILED assert(!segments.empty())
 ceph version 0.64-469-g9ae0ec8 (9ae0ec83dabe37ac15e5165559debdfef7a5f91d)
 1: ceph-mds() [0x71bee8]
 2: (CDir::_fetched(ceph::buffer::list&, std::string const&)+0x269) [0x72ed39]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe48) [0x7b2f48]
 4: (MDS::handle_core_message(Message*)+0xae8) [0x589af8]
 5: (MDS::_dispatch(Message*)+0x2f) [0x589cbf]
 6: (MDS::ms_dispatch(Message*)+0x1d3) [0x58b743]
 7: (DispatchQueue::entry()+0x3f1) [0x943c61]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x86e5fd]
 9: (()+0x7e9a) [0x7f7d170b1e9a]
 10: (clone()+0x6d) [0x7f7d1564eccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
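The suspected interaction can be sketched as a minimal Python model (all names hypothetical and greatly simplified; this is not Ceph's actual MDLog code): the final replay pass aborts with ENOENT because the journal objects it needs were already expired/trimmed, the takeover path carries on as if replay had succeeded, and the first operation that needs a current log segment then trips the assert shown in the backtrace above.

```python
# Hypothetical model of the suspected failure sequence; names and
# structure are illustrative only, not Ceph's actual implementation.

ENOENT = -2

class MDLogModel:
    def __init__(self):
        self.segments = []          # log segments populated by replay

    def replay(self, journal_objects):
        # Abort on the first missing journal object, mirroring
        # "_replay journaler got error -2, aborting" in the log above.
        for obj in journal_objects:
            if obj is None:         # object already trimmed/deleted
                return ENOENT
            self.segments.append(obj)
        return 0

    def get_current_segment(self):
        # Mirrors "FAILED assert(!segments.empty())" in mds/MDLog.h.
        assert self.segments, "FAILED assert(!segments.empty())"
        return self.segments[-1]

log = MDLogModel()
err = log.replay([None])            # expired journal object -> ENOENT
# Suspected bug: the standby-replay -> active takeover ignores err and
# proceeds as if replay succeeded, so the next journaled operation
# (here, CDir::_fetched marking a dir dirty) hits the empty-list assert.
try:
    log.get_current_segment()
except AssertionError as e:
    print("crash:", e)
```

If this model is right, the fix would need the takeover path to treat a replay error as fatal (or retry from a valid expire position) rather than continuing with an empty segment list.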
The run was:
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-06-25_01:00:37-fs-next-testing-basic/45578$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 3d740946b3b79d51f07d9a735a5fb77a849f57dd
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      client:
        debug client: 10
      mds:
        debug mds: 20
        debug ms: 1
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    sha1: 9ae0ec83dabe37ac15e5165559debdfef7a5f91d
  install:
    ceph:
      sha1: 9ae0ec83dabe37ac15e5165559debdfef7a5f91d
  s3tests:
    branch: next
  workunit:
    sha1: 9ae0ec83dabe37ac15e5165559debdfef7a5f91d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh
The log is in a logs subdir (hung, not failed, job).
Updated by Greg Farnum almost 11 years ago
Saw this backtrace again at /a/teuthology-2013-07-02_01:00:48-fs-next-testing-basic/52429/teuthology.log