Bug #994
closed
EOpen reply on non-auth MDS is busted
Added by Greg Farnum about 13 years ago.
Updated over 7 years ago.
Description
Saw this while trying to reproduce #966. After restarting the MDSes, one of them crashed with:
2011-04-07 16:26:09.480199 7fabab560710 mds1.log _replay 22018684~749 / 25241314 2011-04-07 16:23:52.460499: EOpen [metablob 20000000e9d, 1 dirs], 1 open files
2011-04-07 16:26:09.480205 7fabab560710 mds1.journal EOpen.replay
2011-04-07 16:26:09.480210 7fabab560710 mds1.journal EMetaBlob.replay 1 dirlumps by unknown0
2011-04-07 16:26:09.480214 7fabab560710 mds1.journal EMetaBlob.replay dir 20000000e9d
2011-04-07 16:26:09.480221 7fabab560710 mds1.journal EMetaBlob.replay missing dir ino 20000000e9d
mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)', in thread '0x7fabab560710'
mds/journal.cc: 407: FAILED assert(0)
Logs at kai:~gregf/logs/lost_dir_inode
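The log shows EMetaBlob.replay walking the event's dirlumps and hitting assert(0) when the directory's inode is not in cache. The sketch below is a simplified C++ model of that check and is meant only as a rough illustration. CacheModel, MetaBlobModel, first_missing_dir and replay are hypothetical names, not Ceph's classes. The real replay code in mds/journal.cc does considerably more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical toy model of journal replay; not Ceph's actual classes.
using inodeno_t = std::uint64_t;

struct CDirModel {
    inodeno_t ino = 0;
};

// Stand-in for the MDS cache: directories currently known, keyed by ino.
struct CacheModel {
    std::unordered_map<inodeno_t, CDirModel> dirs;

    const CDirModel* get_dir(inodeno_t ino) const {
        auto it = dirs.find(ino);
        return it == dirs.end() ? nullptr : &it->second;
    }
};

// Each "dirlump" is reduced to the ino of a directory the event touches.
struct MetaBlobModel {
    std::vector<inodeno_t> dirlumps;
};

// Returns the first directory ino that is not in cache, if any.
std::optional<inodeno_t> first_missing_dir(const CacheModel& cache,
                                           const MetaBlobModel& blob) {
    for (inodeno_t ino : blob.dirlumps) {
        if (cache.get_dir(ino) == nullptr) return ino;
    }
    return std::nullopt;
}

// Mirrors the crash: a missing directory is treated as fatal via assert(0).
// Returns how many dirlumps were replayed when nothing is missing.
std::size_t replay(const CacheModel& cache, const MetaBlobModel& blob) {
    if (auto missing = first_missing_dir(cache, blob)) {
        std::cerr << "EMetaBlob.replay missing dir ino " << std::hex
                  << *missing << "\n";
        assert(0);
    }
    return blob.dirlumps.size();
}
```

Suppose the cache holds only ino 0x1 and the blob names both 0x1 and 0x20000000e9d. In that case first_missing_dir returns 0x20000000e9d, and passing the same blob to replay would trip the assert, much like the crash in the log.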
- Status changed from New to In Progress
- Assignee set to Greg Farnum
Hopefully I can figure this out for v0.27; replay problems are more important than the horde of multi-MDS stuff uncovered by fsstress!
- Subject changed from mds not committing object to disk? to EOpen reply on non-auth MDS is busted
- Target version changed from v0.27 to 12
Ooof, not a simple thing at all:
1) The crashing MDS is non-auth.
2) That means the directory gets trimmed during replay by trim_non_auth_subtrees.
3) Sage thinks journalling EOPEN might be broken in other ways anyway.
So there's just not a simple fix, and any hacks would probably introduce more problems than they solve. Pushing this way, way back!
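The diagnosis in the list above, which is later revised in this thread, amounts to an ordering problem. A trim step partway through replay drops a directory the MDS is not auth for, so a later EOpen that names that directory finds nothing in cache. The C++ toy model below assumes exactly that ordering. Event, EventKind and replay_events are hypothetical names. The TrimNonAuth step only stands in for what trim_non_auth_subtrees is described as doing here, not for its real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical toy model; names do not correspond to Ceph's API.
using inodeno_t = std::uint64_t;

struct DirState {
    bool auth;  // is this MDS authoritative for the dir?
};

enum class EventKind { AddDir, TrimNonAuth, Open };

struct Event {
    EventKind kind;
    inodeno_t ino = 0;  // dir ino for AddDir/Open; unused for TrimNonAuth
    bool auth = false;  // only meaningful for AddDir
};

// Replays events in order. Returns the ino of the first Open whose dir is
// no longer cached (the analogue of "missing dir ino"), or nullopt.
std::optional<inodeno_t> replay_events(const std::vector<Event>& events) {
    std::unordered_map<inodeno_t, DirState> cache;
    for (const Event& ev : events) {
        switch (ev.kind) {
        case EventKind::AddDir:
            assert(ev.ino != 0 && "ino 0 is not a valid directory");
            cache[ev.ino] = DirState{ev.auth};
            break;
        case EventKind::TrimNonAuth:
            // Drop every cached dir this MDS is not auth for.
            for (auto it = cache.begin(); it != cache.end();) {
                if (!it->second.auth) it = cache.erase(it);
                else ++it;
            }
            break;
        case EventKind::Open:
            if (cache.find(ev.ino) == cache.end()) return ev.ino;
            break;
        }
    }
    return std::nullopt;
}
```

Consider the sequence AddDir(0x20000000e9d, non-auth), TrimNonAuth, Open(0x20000000e9d). Replaying it reports 0x20000000e9d as missing. The same sequence for an auth directory reports nothing.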
- Assignee deleted (Greg Farnum)
- Status changed from In Progress to Resolved
- Target version changed from 12 to v0.27
I was wrong about the diagnosis before. Pretty sure commit:777bcba0 fixes this.
As for the non-auth caps, I forgot that those are always migrated to the auth MDS to avoid situations like the above.
- Story points set to 2
- Position set to 1
- Position changed from 1 to 624
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.27)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.