Bug #542
mds journal corruption
Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Description
I saw this on the playground.
The last bit of the replay log:
2010-11-03 17:12:07.537284 7ff471efb910 mds0.log _replay 305779633766~1329 / 305779686331 : EUpdate cap update [metablob 1000068a5b8, 2 dirs]
2010-11-03 17:12:07.537308 7ff471efb910 mds0.journal EMetaBlob.replay 2 dirlumps
2010-11-03 17:12:07.537317 7ff471efb910 mds0.journal EMetaBlob.replay dir 1000068a5b8
2010-11-03 17:12:07.537329 7ff471efb910 mds0.cache.dir(1000068a5b8) mark_dirty (already dirty) [dir 1000068a5b8 /playground/grows/jbarratt/ [2,head] auth v=2163964 cv=0/0 state=1610612736 f(v19 m2010-10-13 12:45:01.374418 7=4+3) n(v30114 rc2010-11-03 16:05:32.174192 b1352196617 52076=52070+6)/n(v30114 rc2010-11-03 16:05:32.174192 b1346560217 52076=52070+6) hs=1+0,ss=0+0 dirty=1 | child dirty 0x1748840] version 2163964
2010-11-03 17:12:07.537352 7ff471efb910 mds0.journal EMetaBlob.replay dirty nestinfo on [dir 1000068a5b8 /playground/grows/jbarratt/ [2,head] auth v=2163964 cv=0/0 state=1610612736 f(v19 m2010-10-13 12:45:01.374418 7=4+3) n(v30114 rc2010-11-03 16:05:32.174192 b1352196617 52076=52070+6)/n(v30114 rc2010-11-03 16:05:32.174192 b1346560217 52076=52070+6) hs=1+0,ss=0+0 dirty=1 | child dirty 0x1748840]
2010-11-03 17:12:07.537374 7ff471efb910 mds0.locker mark_updated_scatterlock (inest sync dirty) - already on list since 2010-11-03 17:11:21.350505
2010-11-03 17:12:07.537384 7ff471efb910 mds0.journal EMetaBlob.replay updated dir [dir 1000068a5b8 /playground/grows/jbarratt/ [2,head] auth v=2163964 cv=0/0 state=1610612736 f(v19 m2010-10-13 12:45:01.374418 7=4+3) n(v30114 rc2010-11-03 16:05:32.174192 b1352196617 52076=52070+6)/n(v30114 rc2010-11-03 16:05:32.174192 b1346560217 52076=52070+6) hs=1+0,ss=0+0 dirty=1 | child dirty 0x1748840]
2010-11-03 17:12:07.537423 7ff471efb910 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/playground/grows/jbarratt/test_data [2,head] auth (dversion lock) v=2163963 inode=0x1720aa8 | inodepin dirty 0x1755570]
2010-11-03 17:12:07.537443 7ff471efb910 mds0.journal EMetaBlob.replay for [2,head] had [inode 100006e5fa0 [...2,head] /playground/grows/jbarratt/test_data/ auth v2163963 f(v656394 m2010-11-03 16:05:32.174192 52066=52066+0) n(v1069721 rc2010-11-03 16:05:32.174192 b1352195804 52067=52066+1) (iversion lock) | dirfrag dirty 0x1720aa8]
2010-11-03 17:12:07.537465 7ff471efb910 mds0.journal EMetaBlob.replay dir 100006e5fa0
2010-11-03 17:12:07.537477 7ff471efb910 mds0.journal EMetaBlob.replay updated dir [dir 100006e5fa0 /playground/grows/jbarratt/test_data/ [2,head] auth v=2388257 cv=0/0 state=1073741824 f(v0 m2010-11-03 16:05:32.174192 52066=52066+0)/f(v656394 m2010-11-03 16:05:32.174192 52066=52066+0) n(v1069721 rc2010-11-03 16:05:32.174192 b1352195804 52066=52066+0) hs=87419+4083,ss=0+0 dirty=91502 | child 0x1748e50]
2010-11-03 17:12:07.537515 7ff471efb910 mds0.journal EMetaBlob.replay for [2,head] had [dentry #1/playground/grows/jbarratt/test_data/largefile98323 [2,head] auth (dversion lock) v=2388256 inode=0x19c53c38 | inodepin dirty 0x2039e610]
2010-11-03 17:12:07.537537 7ff471efb910 mds0.journal EMetaBlob.replay for [2,head] had [inode 20001069b5c [2,head] /playground/grows/jbarratt/test_data/largefile98323 auth v2388256 s=6600 n(v0 b6600 1=1+0) (iversion lock) | dirty 0x19c53c38]
2010-11-03 17:12:07.537566 7ff471efb910 7ff471efb910 uh oh, unknown log event type 1953719668
mds/LogEvent.cc: In function 'static LogEvent* LogEvent::decode(ceph::bufferlist&)':
mds/LogEvent.cc:77: FAILED assert(0)
ceph version 0.23~rc (commit:87d59cd868bb6d469115571635ecf375ed845569)
1: (LogEvent::decode(ceph::buffer::list&)+0x514) [0x97b8fc]
2: (MDLog::_replay_thread()+0x3be) [0x9650f0]
3: (MDLog::ReplayThread::entry()+0x1c) [0x765a06]
4: (Thread::_entry_func(void*)+0x23) [0x74396d]
5: /lib/libpthread.so.0 [0x7ff47774873a]
6: (clone()+0x6d) [0x7ff47670a69d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Notes:
last valid: 305779633766 1329
offset 22ee66
0022ee60 00 00 00 00 00 00[31 05 00 00 14 00 00 00 01 0a |......1.........|
0022ee70 00 00 00 63 61 70 20 75 70 64 61 74 65 02 02 00 |...cap update...|
0022ee80 00 00 b8 a5 68 00 00 01 00 00 00 00 00 00 a0 5f |....h.........._|

next: 305779635099
offset 22f39b
0022f390 00 00 00 00 00 00 00 00 00 00 00[09 00 00 00 74 |...............t|
0022f3a0 65 73 74 5f 64 61 74 61 02 00 00 00 00 00 00 00 |est_data........|
0022f3b0 fe ff ff ff ff ff ff ff b5 05 21 00 00 00 00 00 |..........!.....|
0022f3c0 03 a0 5f 6e 00 00 01 00 00 00 00 00 00 8c c0 d1 |.._n............|
0022f3d0 4c 80 f5 61 0a ed 41 00 00 15 34 ae 00 30 83 0b |L..a..A...4..0..|
... gibberish!

next event we see that looks good: 305779636141
offset 22f7ad
0022f7a0 00 00 00 00 00 00 00 00 00 00 00 00 00[31 05 00 |.............1..|
0022f7b0 00 14 00 00 00 01 0a 00 00 00 63 61 70 20 75 70 |..........cap up|
0022f7c0 64 61 74 65 02 02 00 00 00 b8 a5 68 00 00 01 00 |date.......h....|
0022f7d0 00 00 00 00 00 a0 5f 6e 00 00 01 00 00 00 00 00 |......_n........|
0022f7e0 00 02 00 00 00 b8 a5 68 00 00 01 00 00 00 00 00 |.......h........|
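The bracketed bytes in the notes can be decoded by hand to see why replay failed. The sketch below is only an illustration, and it assumes each journal entry starts with a little-endian 32-bit length followed by a payload whose first 32-bit word is the event type. That layout is inferred from the numbers in this report (it reproduces both the 1329 length and the "next" position), not taken from the Ceph source. The function name `parse_header` is hypothetical.

```python
import struct

# Header bytes copied from the hexdump in the notes above.
last_valid = bytes.fromhex("31050000" "14000000")  # at object offset 0x22ee66
bogus_next = bytes.fromhex("09000000" "74657374")  # at object offset 0x22f39b

def parse_header(buf):
    """Return (length, event_type), assuming two little-endian u32 fields."""
    return struct.unpack_from("<II", buf, 0)

length, etype = parse_header(last_valid)
# 4 extra bytes account for the assumed length prefix itself.
next_pos = 305779633766 + 4 + length

bad_len, bad_type = parse_header(bogus_next)
# Re-pack the bogus type to see which raw bytes produced it.
bad_type_ascii = struct.pack("<I", bad_type).decode("ascii")

print(length, etype, next_pos)
print(bad_len, bad_type, repr(bad_type_ascii))
```

Under that assumption the last valid entry yields length 1329 and type 20, and the next position works out to 305779635099, matching the notes. At that position the "length" 9 is really the length prefix of the string "test_data", and the "type" 1953719668 is the ASCII bytes "test", which is exactly the unknown event type in the assert.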
The hexdump of this whole region:
* 0022ee60 00 00 00 00 00 00 31 05 00 00 14 00 00 00 01 0a |......1.........| 0022ee70 00 00 00 63 61 70 20 75 70 64 61 74 65 02 02 00 |...cap update...| 0022ee80 00 00 b8 a5 68 00 00 01 00 00 00 00 00 00 a0 5f |....h.........._| 0022ee90 6e 00 00 01 00 00 00 00 00 00 02 00 00 00 b8 a5 |n...............| 0022eea0 68 00 00 01 00 00 00 00 00 00 01 01 fc 04 21 00 |h.............!.| 0022eeb0 00 00 00 00 00 00 00 00 00 00 00 00 01 13 00 00 |................| 0022eec0 00 00 00 00 00 0d e2 b5 4c 50 2a 51 16 04 00 00 |........LP*Q....| 0022eed0 00 00 00 00 00 03 00 00 00 00 00 00 00 01 13 00 |................| 0022eee0 00 00 00 00 00 00 0d e2 b5 4c 50 2a 51 16 04 00 |.........LP*Q...| 0022eef0 00 00 00 00 00 00 03 00 00 00 00 00 00 00 01 a2 |................| 0022ef00 75 00 00 00 00 00 00 09 e2 98 50 00 00 00 00 66 |u.........P....f| 0022ef10 cb 00 00 00 00 00 00 06 00 00 00 00 00 00 00 00 |................| 0022ef20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8c |................| 0022ef30 c0 d1 4c 80 f5 61 0a 01 a2 75 00 00 00 00 00 00 |..L..a...u......| 0022ef40 d9 e0 42 50 00 00 00 00 66 cb 00 00 00 00 00 00 |..BP....f.......| 0022ef50 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022ef60 00 00 00 00 00 00 00 00 8c c0 d1 4c 80 f5 61 0a |...........L..a.| 0022ef70 04 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 |................| 0022ef80 69 01 00 00 01 00 00 00 02 09 00 00 00 74 65 73 |i............tes| 0022ef90 74 5f 64 61 74 61 02 00 00 00 00 00 00 00 fe ff |t_data..........| 0022efa0 ff ff ff ff ff ff fb 04 21 00 00 00 00 00 03 a0 |........!.......| 0022efb0 5f 6e 00 00 01 00 00 00 00 00 00 8c c0 d1 4c 80 |_n............L.| 0022efc0 f5 61 0a ed 41 00 00 15 34 ae 00 30 83 0b 00 01 |.a..A...4..0....| 0022efd0 00 00 00 00 00 00 40 00 01 00 00 00 00 00 40 00 |......@.......@.| 0022efe0 00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 |................| 0022eff0 00 00 00 00 00 00 00 00 01 00 00 00 ff ff ff ff |................| 0022f000 ff ff ff ff 
00 00 00 00 00 00 00 00 8c c0 d1 4c |...............L| 0022f010 80 f5 61 0a 0d e2 b5 4c 50 2a 51 16 00 00 00 00 |..a....LP*Q.....| 0022f020 00 00 00 00 01 0a 04 0a 00 00 00 00 00 8c c0 d1 |................| 0022f030 4c 80 f5 61 0a 62 cb 00 00 00 00 00 00 00 00 00 |L..a.b..........| 0022f040 00 00 00 00 00 01 99 52 10 00 00 00 00 00 dc de |.......R........| 0022f050 98 50 00 00 00 00 62 cb 00 00 00 00 00 00 01 00 |.P....b.........| 0022f060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f070 00 00 00 00 00 00 8c c0 d1 4c 80 f5 61 0a 01 99 |.........L..a...| 0022f080 52 10 00 00 00 00 00 dc de 98 50 00 00 00 00 62 |R.........P....b| 0022f090 cb 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 |................| 0022f0a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8c |................| 0022f0b0 c0 d1 4c 80 f5 61 0a fb 04 21 00 00 00 00 00 00 |..L..a...!......| 0022f0c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0022f0e0 00 00 00 00 01 00 00 00 00 00 00 00 00 a0 5f 6e |.............._n| 0022f0f0 00 00 01 00 00 00 00 00 00 01 01 21 71 24 00 00 |...........!q$..| 0022f100 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 |................| 0022f110 00 00 00 00 8c c0 d1 4c 80 f5 61 0a 62 cb 00 00 |.......L..a.b...| 0022f120 00 00 00 00 00 00 00 00 00 00 00 00 01 0a 04 0a |................| 0022f130 00 00 00 00 00 8c c0 d1 4c 80 f5 61 0a 62 cb 00 |........L..a.b..| 0022f140 00 00 00 00 00 00 00 00 00 00 00 00 00 01 99 52 |...............R| 0022f150 10 00 00 00 00 00 dc de 98 50 00 00 00 00 62 cb |.........P....b.| 0022f160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8c c0 |................| 0022f180 d1 4c 80 f5 61 0a 01 99 52 10 00 00 00 00 00 dc |.L..a...R.......| 0022f190 de 98 50 00 00 00 00 62 cb 00 00 00 00 00 00 00 |..P....b........| 0022f1a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f1b0 00 00 00 00 00 00 00 8c c0 d1 4c 
80 f5 61 0a 00 |..........L..a..| 0022f1c0 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 65 |...............e| 0022f1d0 01 00 00 01 00 00 00 02 0e 00 00 00 6c 61 72 67 |............larg| 0022f1e0 65 66 69 6c 65 39 38 33 32 33 02 00 00 00 00 00 |efile98323......| 0022f1f0 00 00 fe ff ff ff ff ff ff ff 20 71 24 00 00 00 |.......... q$...| 0022f200 00 00 03 5c 9b 06 01 00 02 00 00 00 00 00 00 54 |...\...........T| 0022f210 c0 d1 4c d0 37 31 35 a4 81 00 00 15 34 ae 00 30 |..L.715.....4..0| 0022f220 83 0b 00 01 00 00 00 00 00 00 40 00 01 00 00 00 |..........@.....| 0022f230 00 00 40 00 00 00 00 00 00 00 00 00 ff ff ff ff |..@.............| 0022f240 00 00 00 00 c8 19 00 00 00 00 00 00 01 00 00 00 |................| 0022f250 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 |................| 0022f260 54 c0 d1 4c 22 ee 05 36 54 c0 d1 4c d0 37 31 35 |T..L"..6T..L.715| 0022f270 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................| 0022f280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f290 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 |................| 0022f2a0 00 00 c8 19 00 00 00 00 00 00 01 00 00 00 00 00 |................| 0022f2b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0022f2d0 00 00 01 00 00 00 00 00 00 00 00 c8 19 00 00 00 |................| 0022f2e0 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f2f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f300 00 00 00 00 00 00 00 00 00 00 00 20 71 24 00 00 |........... 
q$..| 0022f310 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................| 0022f330 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0022f390 00 00 00 00 00 00 00 00 00 00 00 09 00 00 00 74 |...............t| 0022f3a0 65 73 74 5f 64 61 74 61 02 00 00 00 00 00 00 00 |est_data........| 0022f3b0 fe ff ff ff ff ff ff ff b5 05 21 00 00 00 00 00 |..........!.....| 0022f3c0 03 a0 5f 6e 00 00 01 00 00 00 00 00 00 8c c0 d1 |.._n............| 0022f3d0 4c 80 f5 61 0a ed 41 00 00 15 34 ae 00 30 83 0b |L..a..A...4..0..| 0022f3e0 00 01 00 00 00 00 00 00 40 00 01 00 00 00 00 00 |........@.......| 0022f3f0 40 00 00 00 00 00 00 00 00 00 ff ff ff ff 00 00 |@...............| 0022f400 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ff ff |................| 0022f410 ff ff ff ff ff ff 00 00 00 00 00 00 00 00 8c c0 |................| 0022f420 d1 4c 80 f5 61 0a 0d e2 b5 4c 50 2a 51 16 00 00 |.L..a....LP*Q...| 0022f430 00 00 00 00 00 00 01 0a 04 0a 00 00 00 00 00 8c |................| 0022f440 c0 d1 4c 80 f5 61 0a 62 cb 00 00 00 00 00 00 00 |..L..a.b........| 0022f450 00 00 00 00 00 00 00 01 f6 52 10 00 00 00 00 00 |.........R......| 0022f460 dc de 98 50 00 00 00 00 62 cb 00 00 00 00 00 00 |...P....b.......| 0022f470 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f480 00 00 00 00 00 00 00 00 8c c0 d1 4c 80 f5 61 0a |...........L..a.| 0022f490 01 f6 52 10 00 00 00 00 00 dc de 98 50 00 00 00 |..R.........P...| 0022f4a0 00 62 cb 00 00 00 00 00 00 01 00 00 00 00 00 00 |.b..............| 0022f4b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f4c0 00 8c c0 d1 4c 80 f5 61 0a b5 05 21 00 00 00 00 |....L..a...!....| 0022f4d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0022f4f0 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 a0 |................| 0022f500 5f 6e 00 00 01 00 00 00 00 00 00 01 01 39 71 24 |_n...........9q$| 0022f510 00 
00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 |................| 0022f520 00 00 00 00 00 00 8c c0 d1 4c 80 f5 61 0a 62 cb |.........L..a.b.| 0022f530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 0a |................| 0022f540 04 0a 00 00 00 00 00 8c c0 d1 4c 80 f5 61 0a 62 |..........L..a.b| 0022f550 cb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 |................| 0022f560 f6 52 10 00 00 00 00 00 dc de 98 50 00 00 00 00 |.R.........P....| 0022f570 62 cb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |b...............| 0022f580 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f590 8c c0 d1 4c 80 f5 61 0a 01 f6 52 10 00 00 00 00 |...L..a...R.....| 0022f5a0 00 dc de 98 50 00 00 00 00 62 cb 00 00 00 00 00 |....P....b......| 0022f5b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f5c0 00 00 00 00 00 00 00 00 00 8c c0 d1 4c 80 f5 61 |............L..a| 0022f5d0 0a 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 |................| 0022f5e0 00 65 01 00 00 01 00 00 00 02 0e 00 00 00 6c 61 |.e............la| 0022f5f0 72 67 65 66 69 6c 65 39 38 32 38 31 02 00 00 00 |rgefile98281....| 0022f600 00 00 00 00 fe ff ff ff ff ff ff ff 38 71 24 00 |............8q$.| 0022f610 00 00 00 00 03 32 9b 06 01 00 02 00 00 00 00 00 |.....2..........| 0022f620 00 53 c0 d1 4c 40 53 d7 35 a4 81 00 00 15 34 ae |.S..L@S.5.....4.| 0022f630 00 30 83 0b 00 01 00 00 00 00 00 00 40 00 01 00 |.0..........@...| 0022f640 00 00 00 00 40 00 00 00 00 00 00 00 00 00 ff ff |....@...........| 0022f650 ff ff 00 00 00 00 c8 19 00 00 00 00 00 00 01 00 |................| 0022f660 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 |................| 0022f670 00 00 53 c0 d1 4c 13 c2 fd 36 53 c0 d1 4c 40 53 |..S..L...6S..L@S| 0022f680 d7 35 00 00 00 00 00 00 00 00 01 00 00 00 00 00 |.5..............| 0022f690 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f6a0 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 |................| 0022f6b0 00 00 00 00 c8 19 00 00 00 
00 00 00 01 00 00 00 |................| 0022f6c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0022f6e0 00 00 00 00 01 00 00 00 00 00 00 00 00 c8 19 00 |................| 0022f6f0 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 |................| 0022f700 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f710 00 00 00 00 00 00 00 00 00 00 00 00 00 38 71 24 |.............8q$| 0022f720 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0022f740 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 0022f750 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 0022f7a0 00 00 00 00 00 00 00 00 00 00 00 00 00 31 05 00 |.............1..| 0022f7b0 00 14 00 00 00 01 0a 00 00 00 63 61 70 20 75 70 |..........cap up| 0022f7c0 64 61 74 65 02 02 00 00 00 b8 a5 68 00 00 01 00 |date.......h....| 0022f7d0 00 00 00 00 00 a0 5f 6e 00 00 01 00 00 00 00 00 |......_n........| 0022f7e0 00 02 00 00 00 b8 a5 68 00 00 01 00 00 00 00 00 |.......h........|
Interestingly, this was all originally written in one op:
2010-11-03 16:08:13.684537 7f84f5069910 -- 10.14.0.125:6803/13597 --> osd1 10.14.0.105:6800/6526 -- osd_op(mds0.106:11016 200.00011cc7 [write 708885~1625400] 1.c4c4) v1 -- ?+0 0x120bd80
which is object offset (in hex) ad115 to 239e4d, capturing this whole region. So the corruption is happening on the MDS side (MDLog, Journaler), in the msgr, or on the OSD. Probably not the OSD/FileStore: the object is identical on the primary and replica. Probably not the msgr, since we CRC check. That leaves either between the msgr and OSD, or the MDLog/Journaler/Objecter. My guess is the MDS.
See also #478; probably the same root cause?
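The offset arithmetic behind that claim can be checked with a few lines. This is a minimal sketch that assumes the journal is striped over 4 MiB objects with no other striping, so a journal position maps to (object number, offset within object) by simple division. The 4 MiB figure is an assumption, although it is consistent with the object name 200.00011cc7 and the offsets quoted in this report. `OBJECT_SIZE` and `locate` are illustrative names.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB journal objects

def locate(journal_off):
    """Map a journal position to (object number, offset inside that object)."""
    return divmod(journal_off, OBJECT_SIZE)

# The single write from the osd_op line: [write 708885~1625400]
write_off, write_len = 708885, 1625400
write_end = write_off + write_len

# Journal positions from the notes: last valid event and the next good-looking one.
objno, last_valid_off = locate(305779633766)
_, good_again_off = locate(305779636141)

print(hex(write_off), hex(write_end))
print(format(objno, "08x"), hex(last_valid_off), hex(good_again_off))

# The damaged span lies entirely inside the extent of that one write.
covered = write_off <= last_valid_off and good_again_off < write_end
print(covered)
```

With these inputs the write spans 0xad115 to 0x239e4d, the last valid event lands in object 00011cc7 at 0x22ee66, and the next good event sits at 0x22f7ad, so the whole damaged span falls inside the single write.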