Bug #994

closed

EOpen replay on non-auth MDS is busted

Added by Greg Farnum about 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

Saw this while trying to reproduce #966. After restarting the MDSes, one of them crashed with:

2011-04-07 16:26:09.480199 7fabab560710 mds1.log _replay 22018684~749 / 25241314 2011-04-07 16:23:52.460499: EOpen [metablob 20000000e9d, 1 dirs], 1 open files
2011-04-07 16:26:09.480205 7fabab560710 mds1.journal EOpen.replay 
2011-04-07 16:26:09.480210 7fabab560710 mds1.journal EMetaBlob.replay 1 dirlumps by unknown0
2011-04-07 16:26:09.480214 7fabab560710 mds1.journal EMetaBlob.replay dir 20000000e9d
2011-04-07 16:26:09.480221 7fabab560710 mds1.journal EMetaBlob.replay missing dir ino  20000000e9d
mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)', in thread '0x7fabab560710'
mds/journal.cc: 407: FAILED assert(0)

Logs at kai:~gregf/logs/lost_dir_inode

Actions #1

Updated by Greg Farnum about 13 years ago

  • Status changed from New to In Progress
  • Assignee set to Greg Farnum

Hopefully I can figure this out for .27 -- replay problems are more important than the horde of multi-MDS stuff uncovered by fsstress!

Actions #2

Updated by Greg Farnum about 13 years ago

  • Subject changed from mds not committing object to disk? to EOpen replay on non-auth MDS is busted
  • Target version changed from v0.27 to 12

Ooof, not a simple thing at all:
1) The crashing MDS is non-auth.
2) That means the directory gets trimmed during replay by trim_non_auth_subtrees.
3) Sage thinks journalling EOPEN might be broken in other ways anyway.

So there's just not a simple fix and any hacks would probably introduce more problems than they solve. Pushing this way, way back!
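
To make the failure mode hypothesized in points 1 and 2 concrete, here is a toy model in Python. It is only a sketch of the reasoning as it stood at this point in the thread, not Ceph code: the dictionary-based cache, the `MissingDirInode` exception, and `replay_eopen` are hypothetical names invented for illustration. Only `trim_non_auth_subtrees` and the inode number come from the report above, and the real function certainly does more than this one-liner.

```python
class MissingDirInode(Exception):
    """Stands in for the assert(0) hit in EMetaBlob::replay (hypothetical)."""


def trim_non_auth_subtrees(cache, my_rank):
    """Toy version: drop every cached dir whose authority is another rank."""
    for ino in [i for i, auth in cache.items() if auth != my_rank]:
        del cache[ino]


def replay_eopen(cache, dir_ino):
    """Replay an EOpen-like event that references a single directory inode."""
    if dir_ino not in cache:
        raise MissingDirInode(hex(dir_ino))
    return dir_ino


# Toy cache: directory inode -> authoritative MDS rank. The ranks and the
# second inode are made up; 0x20000000e9d is the inode from the crash log.
cache = {0x20000000e9d: 0, 0x1: 1}

# We play the role of mds1, which is not auth for 20000000e9d.
trim_non_auth_subtrees(cache, my_rank=1)

try:
    replay_eopen(cache, 0x20000000e9d)
    outcome = "replayed"
except MissingDirInode as exc:
    outcome = f"missing dir ino {exc}"

print(outcome)
```

In this model the event can only replay if the directory survives trimming, which is why no quick hack looked safe under this reading of the problem.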

Actions #3

Updated by Greg Farnum about 13 years ago

  • Assignee deleted (Greg Farnum)

Actions #4

Updated by Sage Weil about 13 years ago

  • Status changed from In Progress to Resolved
  • Target version changed from 12 to v0.27

I was wrong about the diagnosis before. Pretty sure commit:777bcba0 fixes this.

As for the non-auth caps, I forgot those are always migrated to the auth mds to avoid situations like the above.

Actions #5

Updated by Sage Weil about 13 years ago

  • Story points set to 2
  • Position set to 1
  • Position changed from 1 to 624

Actions #6

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.27)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
