Bug #329
closed
mds: mislinked dentry found during journal replay
Added by Sage Weil almost 14 years ago.
Updated over 7 years ago.
Description
A FIXME error is logged during replay when we encounter a dentry that is already linked and a journal entry tries to link it to something else. The question is how we got into that state in the first place.
To find the problem, we need full mds logs from when the entry was originally logged, all the way through the failed replay.
Wido has hit this a couple times now with an rsync of kernel.org. The mds needs to be restarted at some point to detect the replay issue.
- Target version set to v0.21.1
- Target version changed from v0.21.1 to v0.21.2
- Target version changed from v0.21.2 to v0.21.3
- Target version changed from v0.21.3 to v0.21.4
This can come up with multiple MDSs. (Wido saw it with one MDS; not sure how that happened.)
With multiple MDSs, the situation can be something like:
- mds0: /a/b -> ino1
- export /a from mds0 -> mds1
- mds1: /a/b relinked to ino2
- export /a from mds1 -> mds0
- crash
- replay journal
- mds0 replay sees /a/b link to ino1, then ino2
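The sequence above can be reproduced in a toy model. This is only a sketch: ToyCache and replay_link are hypothetical names for illustration, not Ceph's MDCache or journal API, and the journal is reduced to bare (path, inode) link events. It assumes mds0's journal contains no unlink of ino1, because the relink happened while mds1 was authoritative.

```python
# Toy model only: ToyCache and replay_link are hypothetical names,
# not Ceph's MDCache or journal API.

class ToyCache:
    def __init__(self):
        self.dentries = {}   # path -> inode the dentry currently links to
        self.warnings = []   # mislinks noticed during replay

    def replay_link(self, path, ino):
        """Apply a journaled 'link path to ino' event."""
        current = self.dentries.get(path)
        if current is not None and current != ino:
            # The dentry is already linked to a different inode: this is
            # the situation the FIXME message reports.
            self.warnings.append((path, current, ino))
        self.dentries[path] = ino


def replay(journal):
    """Replay every link event in order and return the resulting cache."""
    cache = ToyCache()
    for path, ino in journal:
        cache.replay_link(path, ino)
    return cache


# mds0's journal: the link made before the export, then the link seen after
# /a came back. The relink itself happened on mds1, so nothing in mds0's
# journal unlinks ino1 in between.
mds0_journal = [("/a/b", "ino1"), ("/a/b", "ino2")]
cache = replay(mds0_journal)
print(cache.warnings)
```

In this model the replay records one mislink for /a/b (ino1 replaced by ino2), which mirrors the last step of the list.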
- Target version changed from v0.21.4 to v0.22
I suspect the solution (for the clustered case) is something like:
- trim_non_auth on a subtree when we replay EExport, and when we disambiguate_imports and determine a subtree is non-auth.
- trim_non_auth() should now be a no-op, since any non-auth subtree has already been trimmed. Make it warn/assert if it finds any work to do.
- trim_unlinked_inodes() should also be a no-op (right?). warn/assert if it's not.
- this should make the current FIXME case not come up, since we won't have any stale subtree content from prior periods of auth-ness.
?
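A rough sketch of why trimming at export time might help, under the same toy assumptions: the event tuples ("link", "export", "import") and the helper names are hypothetical, not Ceph's journal format, and only the EExport half of the proposal is modelled (disambiguate_imports is left out). With trim_on_export set, the stale /a/b entry is dropped when the export is replayed, so the later link is not a mislink and the end-of-replay trim finds nothing to do.

```python
# Toy model only; event names and helpers are hypothetical stand-ins.

def under(path, subtree):
    """True if path is the subtree root or lies below it."""
    return path == subtree or path.startswith(subtree.rstrip("/") + "/")


def replay(journal, trim_on_export):
    dentries = {}     # path -> inode
    non_auth = set()  # subtree roots this MDS is no longer authoritative for
    mislinks = []     # (path, old inode, new inode) conflicts seen
    for event in journal:
        kind = event[0]
        if kind == "link":
            _, path, ino = event
            old = dentries.get(path)
            if old is not None and old != ino:
                mislinks.append((path, old, ino))
            dentries[path] = ino
        elif kind == "export":
            _, subtree = event
            non_auth.add(subtree)
            if trim_on_export:
                # Proposed behaviour: drop the subtree's content right away.
                for path in [p for p in dentries if under(p, subtree)]:
                    del dentries[path]
        elif kind == "import":
            _, subtree = event
            non_auth.discard(subtree)
    # What an end-of-replay trim would still have to remove.
    leftover = [p for p in dentries if any(under(p, s) for s in non_auth)]
    return dentries, mislinks, leftover


round_trip = [
    ("link", "/a/b", "ino1"),
    ("export", "/a"),
    ("import", "/a"),
    ("link", "/a/b", "ino2"),
]
export_only = [("link", "/a/b", "ino1"), ("export", "/a")]

print(replay(round_trip, trim_on_export=False)[1])   # mislink recorded
print(replay(round_trip, trim_on_export=True)[1])    # no mislink
print(replay(export_only, trim_on_export=False)[2])  # final trim has work
print(replay(export_only, trim_on_export=True)[2])   # final trim is a no-op
```

The export_only journal covers the second bullet: once the subtree is trimmed during replay of the export, a final pass over non-auth subtrees has nothing left to remove.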
- Target version changed from v0.22 to v0.23
- Assignee set to Greg Farnum
- Status changed from New to Resolved
The multi-mds fix has been pushed to the mds_journal branch (commit:aa83e11c67165878e1ca1b0fe66ff9b8c3a906c8) and then merged into unstable.
Closing for now. If we get a single-MDS occurrence of the original problem we should probably open a new ticket.
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.23)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.