Project

General

Profile

Bug #4248

mds: replay does not correctly update CInode::first and ::last members

Added by Greg Farnum about 11 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This came in over the mailing list.

2013-02-21 19:10:51.606584 7fc59f036700 10 mds.0.journal EMetaBlob.replay dir 600
2013-02-21 19:10:51.606586 7fc59f036700 10 mds.0.journal EMetaBlob.replay updated dir [dir 600 ~mds0/stray0/ [2,head] auth v=1843468 cv=0/0 state=1610612736 f(v2 m2013-02-14 14:32:31.832960 4=1+3) 
n(v2 rc2013-02-14 14:32:31.832960 b4289386496 4=1+3) hs=1+2735,ss=0+0 dirty=2736 | child dirty 0x1ca2fa0]
2013-02-21 19:10:51.606601 7fc59f036700 12 mds.0.cache.dir(600) add_null_dentry [dentry #100/stray0/10000070420 [4,head] auth NULL (dversion lock) pv=0 v=1843468 inode=0 0xd0a8168]
2013-02-21 19:10:51.606606 7fc59f036700 10 mds.0.journal EMetaBlob.replay added [dentry #100/stray0/10000070420 [4,head] auth NULL (dversion lock) v=1843467 inode=0 | dirty 0xd0a8168]
2013-02-21 19:10:51.606610 7fc59f036700 10 mds.0.journal EMetaBlob.replay unlinking [inode 10000070420 [2,head] /trompete/CentOS-6.3-x86_64-bin-DVD1.iso auth v1843466 s=4289386496 n(v0 b4289386496 
1=1+0) (iversion lock) | dirty 0x9245fa0]
2013-02-21 19:10:51.606616 7fc59f036700 12 mds.0.cache.dir(10000000000) unlink_inode [dentry #1/trompete/CentOS-6.3-x86_64-bin-DVD1.iso [2,head] auth (dversion lock) v=1843466 inode=0x9245fa0 | ino
depin dirty 0xd0a7db8] [inode 10000070420 [2,head] /trompete/CentOS-6.3-x86_64-bin-DVD1.iso auth v1843466 s=4289386496 n(v0 b4289386496 1=1+0) (iversion lock) | dirty 0x9245fa0]
2013-02-21 19:10:51.606624 7fc59f036700 12 mds.0.cache.dir(600) link_primary_inode [dentry #100/stray0/10000070420 [4,head] auth NULL (dversion lock) v=1843467 inode=0 | dirty 0xd0a8168] [inode 100
00070420 [2,head] #10000070420 auth v1843467 s=4289386496 nl=0 n(v0 b4289386496 1=1+0) (iversion lock) | dirty 0x9245fa0]
2013-02-21 19:10:51.606634 7fc59f036700 10 mds.0.journal EMetaBlob.replay linked [inode 10000070420 [2,head] ~mds0/stray0/10000070420 auth v1843467 s=4289386496 nl=0 n(v0 b4289386496 1=1+0) (iversion lock) | dirty 0x9245fa0]
2013-02-21 19:10:51.607281 7fc59f036700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7fc59f036700 time 2013-02-21 19:10:51.606653
mds/journal.cc: 597: FAILED assert(in->first == p->dnfirst || (in->is_multiversion() && in->first > p->dnfirst))

 ceph version 0.57 (9a7a9d06c0623ccc116a1d3b71c765c20a17e98e)
 1: (EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)+0x1e34) [0x4e0d94]
 2: (EUpdate::replay(MDS*)+0x3a) [0x4e8e1a]
 3: (MDLog::_replay_thread()+0x438) [0x6ad448]
 4: (MDLog::ReplayThread::entry()+0xd) [0x4cc0ad]
 5: (()+0x7e9a) [0x7fc5a7132e9a]
 6: (clone()+0x6d) [0x7fc5a5fd3cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

It turns out to be because the CInode already existed in the cache with a first of 2, and the replay function didn't update that. The fix is of course to update it — doing so based on the old_inodes values should be safe. I'm about to push a fix for testing.

History

#1 Updated by Greg Farnum about 11 years ago

  • Status changed from New to Fix Under Review

wip-4248-snapid-journaling

Also pushing out a v0.57-based branch to the user to try out.

#2 Updated by Greg Farnum about 11 years ago

  • Status changed from Fix Under Review to In Progress

Okay, that didn't fix the problem. The stray dir now was snapids [3, HEAD] instead of [4, HEAD], but the inode remains at [2, HEAD]. WTF?

2013-02-25 07:33:43.661408 7f887d096700 10 mds.0.journal EMetaBlob.replay dir 600
2013-02-25 07:33:43.661411 7f887d096700 10 mds.0.cache force_dir_fragment * on [inode 600 [...3,head] ~mds0/stray0/ auth v2 f(v0 m2013-02-25 00:16:56.185179 1=1+0) n(v0 rc2013-02-25 00:16:56.185179
 b1536634880 2=1+1)/n(v0 1=0+1) (iversion lock) | dirty 0x1cb39e0]
2013-02-25 07:33:43.661429 7f887d096700 10 mds.0.cache force_dir_fragment no frags under *
2013-02-25 07:33:43.661436 7f887d096700 10 mds.0.journal EMetaBlob.replay added dir [dir 600 ~mds0/stray0/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x1d22be8]
2013-02-25 07:33:43.661445 7f887d096700 10 mds.0.journal EMetaBlob.replay updated dir [dir 600 ~mds0/stray0/ [2,head] auth v=190 cv=0/0 state=1073741824 f(v0 m2013-02-25 00:16:56.185179 1=1+0) n(v0
 rc2013-02-25 00:16:56.185179 b1536634880 1=1+0) hs=0+0,ss=0+0 0x1d22be8]
2013-02-25 07:33:43.661468 7f887d096700 12 mds.0.cache.dir(600) add_null_dentry [dentry #100/stray0/10000000002 [3,head] auth NULL (dversion lock) pv=0 v=190 inode=0 0x1d2f760]
2013-02-25 07:33:43.661480 7f887d096700 10 mds.0.journal EMetaBlob.replay added [dentry #100/stray0/10000000002 [3,head] auth NULL (dversion lock) v=189 inode=0 | dirty 0x1d2f760]
2013-02-25 07:33:43.661487 7f887d096700 10 mds.0.journal EMetaBlob.replay unlinking [inode 10000000002 [2,head] /thorhaue/CentOS-6.3-x86_64-bin-DVD2.iso auth v188 s=1536634880 n(v0 b1536634880 1=1+
0) (iversion lock) cr={8700=0-4294967296@1} | dirty 0x1cb3180]
2013-02-25 07:33:43.661506 7f887d096700 12 mds.0.cache.dir(10000000000) unlink_inode [dentry #1/thorhaue/CentOS-6.3-x86_64-bin-DVD2.iso [2,head] auth (dversion lock) v=188 inode=0x1cb3180 | inodepi
n dirty 0x1d2f3b0] [inode 10000000002 [2,head] /thorhaue/CentOS-6.3-x86_64-bin-DVD2.iso auth v188 s=1536634880 n(v0 b1536634880 1=1+0) (iversion lock) cr={8700=0-4294967296@1} | dirty 0x1cb3180]
2013-02-25 07:33:43.661526 7f887d096700 12 mds.0.cache.dir(600) link_primary_inode [dentry #100/stray0/10000000002 [3,head] auth NULL (dversion lock) v=189 inode=0 | dirty 0x1d2f760] [inode 1000000
0002 [2,head] #10000000002 auth v189 s=1536634880 nl=0 n(v0 b1536634880 1=1+0) (iversion lock) cr={8700=0-4294967296@1} | dirty 0x1cb3180]
2013-02-25 07:33:43.661545 7f887d096700 10 mds.0.journal EMetaBlob.replay linked [inode 10000000002 [2,head] ~mds0/stray0/10000000002 auth v189 s=1536634880 nl=0 n(v0 b1536634880 1=1+0) (iversion l
ock) cr={8700=0-4294967296@1} | dirty 0x1cb3180]
2013-02-25 07:33:43.662822 7f887d096700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7f887d096700 time 2013-02-25 07:33:43.661570
mds/journal.cc: 602: FAILED assert(in->first == p->dnfirst || (in->is_multiversion() && in->first > p->dnfirst))

#3 Updated by Greg Farnum about 11 years ago

This isn't currently journaled correctly, in addition to not setting its snapid ranges correctly. There's nothing that updates old_inodes on an unlink.
I'm going to have to put this on hold for today, though

#4 Updated by Sage Weil about 11 years ago

  • Assignee deleted (Greg Farnum)

#5 Updated by Greg Farnum about 10 years ago

  • Status changed from In Progress to 12

I'm going to leave this at normal even though it's a snapshotting issue — the problem's diagnosed and it's a bug in the journaling that we need to fix.

#6 Updated by Zheng Yan almost 9 years ago

  • Status changed from 12 to Resolved

#7 Updated by Zheng Yan almost 9 years ago

by commit 1479300a68209688b97c60210b37f330f178819c

#8 Updated by Greg Farnum over 7 years ago

  • Component(FS) MDS added

Also available in: Atom PDF