Project

General

Profile

Actions

Bug #16924

closed

Crash replaying EExport

Added by John Spray over 7 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

   -34> 2016-08-04 17:25:40.600671 7f4c8b13c700 10 mds.0.log _replay_thread start
   -33> 2016-08-04 17:25:40.600750 7f4c8b13c700  1 -- 192.168.1.13:6813/6779 --> 192.168.1.13:6804/6255 -- osd_op(mds.4118.0:8637 2.1ba6bc1f 200.00000004 [read 1315022~2924 [fadvise_dontneed]] snapc 0=[] ack+read+known_if_redirected+full_force e11) v7 -- ?+0 0xc098840 con 0xba99b00
   -32> 2016-08-04 17:25:40.600838 7f4c8b33e700 10 _calc_signature seq 8635 front_crc_ = 346259589 middle_crc = 0 data_crc = 0 sig = 9525467307877532421
   -31> 2016-08-04 17:25:40.600845 7f4c8b33e700 20 Putting signature in client message(seq # 8635): sig = 9525467307877532421
   -30> 2016-08-04 17:25:40.600903 7f4c8b13c700  1 reusing last freed id 68
   -29> 2016-08-04 17:25:40.601893 7f4c8b23d700 10 _calc_signature seq 8635 front_crc_ = 2485487617 middle_crc = 0 data_crc = 504523221 sig = 7175534316790889072
   -28> 2016-08-04 17:25:40.601914 7f4c8b23d700  1 -- 192.168.1.13:6813/6779 <== osd.1 192.168.1.13:6804/6255 8635 ==== osd_op_reply(8637 200.00000004 [read 1315022~2924 [fadvise_dontneed]] v0'0 uv136 ondisk = 0) v7 ==== 132+0+2924 (2485487617 0 504523221) 0xc4118c0 con 0xba99b00
   -27> 2016-08-04 17:25:40.602105 7f4c8b13c700 10 mds.0.log _replay 18092238~1256 / 18095162 2016-08-04 17:25:39.576059: EExport 10000000000 [metablob 1, 2 dirs]
   -26> 2016-08-04 17:25:40.602120 7f4c8b13c700 10 mds.0.journal EExport.replay 10000000000
   -25> 2016-08-04 17:25:40.602122 7f4c8b13c700 10 mds.0.journal EMetaBlob.replay 2 dirlumps by unknown.0
   -24> 2016-08-04 17:25:40.602127 7f4c8b13c700 10 mds.0.journal EMetaBlob.replay dir 1
   -23> 2016-08-04 17:25:40.602133 7f4c8b13c700 10 mds.0.journal EMetaBlob.replay updated dir [dir 1 / [2,head] auth v=444 cv=0/0 dir_auth=0 state=1610612736 f(v0 m2016-08-04 17:24:31.430676 1=0+1) n(v6 rc2016-08-04 17:25:33.930024 b25257853 2556=2219+337) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0xbbb6000]
   -22> 2016-08-04 17:25:40.602179 7f4c8b13c700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #1/etc [2,head] auth (dversion lock) v=443 inode=0xbba2ba0 | inodepin=1 dirty=1 0xc392000]
   -21> 2016-08-04 17:25:40.602197 7f4c8b13c700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 10000000000 [...2,head] /etc/ auth v443 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2556=2219+337) (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirtyparent=0 dirty=1 0xbba2ba0]
   -20> 2016-08-04 17:25:40.602227 7f4c8b13c700 10 mds.0.journal EMetaBlob.replay dir 10000000000
   -19> 2016-08-04 17:25:40.602233 7f4c8b13c700 10 mds.0.journal EMetaBlob.replay updated dir [dir 10000000000 /etc/ [2,head] auth v=4853 cv=0/0 state=1610612736 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2555=2219+336) hs=122+0,ss=0+0 dirty=36 | child=1 dirty=1 0xbbb6610]
   -18> 2016-08-04 17:25:40.602252 7f4c8b13c700  7 mds.0.cache adjust_bounded_subtree_auth -1,-2 -> -2,-2 on [dir 10000000000 /etc/ [2,head] auth v=4853 cv=0/0 state=1610612736 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2555=2219+336) hs=122+0,ss=0+0 dirty=36 | child=1 dirty=1 0xbbb6610] bounds 
   -17> 2016-08-04 17:25:40.602265 7f4c8b13c700 15 mds.0.cache show_subtrees
   -16> 2016-08-04 17:25:40.602277 7f4c8b13c700 10 mds.0.cache |__ 0    auth [dir 100 ~mds0/ [2,head] auth v=1 cv=0/0 dir_auth=0 state=1073741824 f(v0 10=0+10) n(v0 10=0+10) hs=0+0,ss=0+0 | subtree=1 0xbbb6308]
   -15> 2016-08-04 17:25:40.602287 7f4c8b13c700 10 mds.0.cache |__ 0    auth [dir 1 / [2,head] auth v=444 cv=0/0 dir_auth=0 state=1610612736 f(v0 m2016-08-04 17:24:31.430676 1=0+1) n(v6 rc2016-08-04 17:25:33.930024 b25257853 2556=2219+337) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0xbbb6000]
   -14> 2016-08-04 17:25:40.602300 7f4c8b13c700  7 mds.0.cache  current root is [dir 1 / [2,head] auth v=444 cv=0/0 dir_auth=0 state=1610612736 f(v0 m2016-08-04 17:24:31.430676 1=0+1) n(v6 rc2016-08-04 17:25:33.930024 b25257853 2556=2219+337) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0xbbb6000]
   -13> 2016-08-04 17:25:40.602310 7f4c8b13c700 10 mds.0.cache   new subtree at [dir 10000000000 /etc/ [2,head] auth v=4853 cv=0/0 state=1610612736 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2555=2219+336) hs=122+0,ss=0+0 dirty=36 | child=1 dirty=1 0xbbb6610]
   -12> 2016-08-04 17:25:40.602336 7f4c8b13c700 10 mds.0.cache.dir(10000000000) setting dir_auth=-2,-2 from -1,-2 on [dir 10000000000 /etc/ [2,head] auth v=4853 cv=0/0 state=1610612736 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2555=2219+336) hs=122+0,ss=0+0 dirty=36 | child=1 subtree=1 dirty=1 0xbbb6610]
   -11> 2016-08-04 17:25:40.602362 7f4c8b13c700 10 mds.0.cache.dir(10000000000)  new subtree root, adjusting auth_pins
   -10> 2016-08-04 17:25:40.602382 7f4c8b13c700 15 mds.0.cache show_subtrees
    -9> 2016-08-04 17:25:40.602402 7f4c8b13c700 10 mds.0.cache |____ 0    auth [dir 100 ~mds0/ [2,head] auth v=1 cv=0/0 dir_auth=0 state=1073741824 f(v0 10=0+10) n(v0 10=0+10) hs=0+0,ss=0+0 | subtree=1 0xbbb6308]
    -8> 2016-08-04 17:25:40.602424 7f4c8b13c700 10 mds.0.cache |_.__ 0    auth [dir 1 / [2,head] auth v=444 cv=0/0 dir_auth=0 state=1610612736 f(v0 m2016-08-04 17:24:31.430676 1=0+1) n(v6 rc2016-08-04 17:25:33.930024 b25257853 2556=2219+337) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0xbbb6000]
    -7> 2016-08-04 17:25:40.602458 7f4c8b13c700 10 mds.0.cache   |__-2    auth [dir 10000000000 /etc/ [2,head] auth v=4853 cv=0/0 dir_auth=-2 state=1610612736 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2555=2219+336) hs=122+0,ss=0+0 dirty=36 | child=1 subtree=1 dirty=1 0xbbb6610]
    -6> 2016-08-04 17:25:40.602487 7f4c8b13c700 10 mds.0.cache try_trim_nonauth_subtree [dir 10000000000 /etc/ [2,head] auth v=4853 cv=0/0 dir_auth=-2 state=1610612736 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2555=2219+336) hs=122+0,ss=0+0 dirty=36 | child=1 subtree=1 dirty=1 0xbbb6610]
    -5> 2016-08-04 17:25:40.602505 7f4c8b13c700 10 mds.0.cache trim_non_auth_subtree(0xbbb6610) [dir 10000000000 /etc/ [2,head] auth v=4853 cv=0/0 dir_auth=-2 state=1610612736 f(v0 m2016-08-04 17:25:33.930024 162=34+128) n(v8 rc2016-08-04 17:25:33.930024 b25257853 2555=2219+336) hs=122+0,ss=0+0 dirty=36 | child=1 subtree=1 dirty=1 0xbbb6610]
    -4> 2016-08-04 17:25:40.602529 7f4c8b13c700 10 mds.0.cache trim_non_auth_subtree(0xbbb6610) Checking dentry 0xc71f690
    -3> 2016-08-04 17:25:40.602543 7f4c8b13c700 20 mds.0.cache trim_non_auth_subtree(0xbbb6610) removing inode 0xca3fa70 with dentry0xc71f690
    -2> 2016-08-04 17:25:40.602555 7f4c8b13c700 12 mds.0.cache.dir(10000000000) unlink_inode [dentry #1/etc/pcmcia [2,head] auth (dversion lock) v=1532 inode=0xca3fa70 0xc71f690] [inode 100000004c2 [...2,head] /etc/pcmcia/ auth v1532 f(v0 m2016-08-04 17:24:52.963509 1=1+0) n(v0 rc2016-08-04 17:24:52.963509 b1117 2=1+1) (iversion lock) 0xca3fa70]
    -1> 2016-08-04 17:25:40.602588 7f4c8b13c700 14 mds.0.cache remove_inode [inode 100000004c2 [...2,head] #100000004c2/ auth v1532 f(v0 m2016-08-04 17:24:52.963509 1=1+0) n(v0 rc2016-08-04 17:24:52.963509 b1117 2=1+1) (iversion lock) 0xca3fa70]
     0> 2016-08-04 17:25:40.778768 7f4c8b13c700 -1 /home/jspray/git/ceph/src/mds/MDCache.cc: In function 'bool MDCache::trim_non_auth_subtree(CDir*)' thread 7f4c8b13c700 time 2016-08-04 17:25:40.602607
/home/jspray/git/ceph/src/mds/MDCache.cc: 6869: FAILED assert(!dir->has_bloom())

 ceph version v11.0.0-1132-gfcdb1b1 (fcdb1b1bc1a20b2c9a7637af68fa55db5233994e)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x194d292]
 2: (MDCache::trim_non_auth_subtree(CDir*)+0x80b) [0x157b00b]
 3: (MDCache::try_trim_non_auth_subtree(CDir*)+0x3e8) [0x157b9da]
 4: (EExport::replay(MDSRank*)+0x365) [0x18558a9]
 5: (MDLog::_replay_thread()+0x17e0) [0x1802524]
 6: (MDLog::ReplayThread::entry()+0x1c) [0x1429a4e]
 7: (Thread::entry_wrapper()+0xc1) [0x1aee961]
 8: (Thread::_entry_func(void*)+0x18) [0x1aee896]
 9: (()+0x7555) [0x7f4c986af555]
 10: (clone()+0x6d) [0x7f4c97538ded]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Seeing this in a 2xactive + 2xstandby-replay configuration, while copying in a deep tree (my trusty ol' /etc dir) with the MDS cache size set to 1000.

Seems to reproduce easily.

Actions

Also available in: Atom PDF