Bug #1041
closed
standby-replay fails on multi-mds fsstress journals
Description
Things break, figure out why.
Updated by Greg Farnum almost 13 years ago
- Subject changed from standby-replay fails on mds journals to standby-replay fails on multi-mds fsstress journals
Updated by Greg Farnum almost 13 years ago
I've got a log in kai:~gregf/logs/fsstress/standby-replay
Updated by Greg Farnum almost 13 years ago
- Status changed from New to In Progress
The problem is that the journal (for mds0) refers to mds1's stray directory. It's replaying a rename operation, where the srci is in mds1's stray dir but the srcdn is not. The inode was kept in the stray dir because when it got moved there, srcdn was on mds0. But it got exported to mds1, which makes me think that the inode shouldn't live in the stray dir any longer and that's the bug?
On the other hand I'm not sure what would happen if the srcdn was still on mds0 and the srci was still in mds1's stray dir. Maybe the journal should just be able to handle stray dirs on other MDSes (though Sage says it shouldn't).
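To make the failure mode above concrete, here is a toy model in Python. It is not Ceph code. It assumes a journal can be reduced to a list of rename events that record which rank is auth for the source dentry and which rank's stray directory (if any) holds the source inode, and that a replayer following one rank's journal can only resolve that rank's own stray directory. The names `RenameEvent` and `find_foreign_stray_refs` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RenameEvent:
    """Hypothetical, heavily simplified stand-in for a journaled rename."""
    name: str
    srcdn_auth: int                   # rank that is auth for the source dentry
    srci_stray_owner: Optional[int]   # rank whose stray dir holds srci, or None


def find_foreign_stray_refs(journal_rank: int,
                            journal: List[RenameEvent]) -> List[RenameEvent]:
    """Return events whose srci sits in a stray dir owned by another rank.

    Under the assumption stated above, these are the events a replayer of
    journal_rank's journal could not resolve.
    """
    return [
        ev for ev in journal
        if ev.srci_stray_owner is not None and ev.srci_stray_owner != journal_rank
    ]


# mds0's journal: the second event points into mds1's stray dir.
journal_mds0 = [
    RenameEvent(name="a", srcdn_auth=0, srci_stray_owner=0),
    RenameEvent(name="b", srcdn_auth=0, srci_stray_owner=1),
    RenameEvent(name="c", srcdn_auth=0, srci_stray_owner=None),
]

bad = find_foreign_stray_refs(0, journal_mds0)
bad_names = [ev.name for ev in bad]
print(bad_names)
```

A check like this only detects the problem. Whether the right fix is to keep such references out of the journal or to let replay tolerate them is exactly the open question in the comment above.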
Updated by Sage Weil almost 13 years ago
- Position set to 379
Updated by Sage Weil almost 13 years ago
- Story points set to 3
- Position deleted (380)
- Position set to 380
Updated by Greg Farnum almost 13 years ago
Back from vacation, and I'm trying to remember what's still broken here. Looking through my logs:
1) MDS 1 gets request to rename, as it's auth on srcdn
2) srci is located on mds 0
3) mds 1 requests an auth pin from mds 0 for srci
4) mds 0 is now a slave for the op and journals extra crap that it's not auth for.
Similar but not identical to the previous cause, which we dealt with by fixing up some of our branching code.
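The sequence above can be sketched as a toy model in Python. This is not the actual `_rename_prepare` logic, only an illustration of the invariant being violated: each MDS should journal only the items it is auth for. The function name `items_to_journal`, the item labels, and the auth assignment for `destdn` are assumptions made for the example.

```python
from typing import Dict, List


def items_to_journal(rank: int, touched: List[str],
                     auth_of: Dict[str, int]) -> List[str]:
    """Keep only the touched items that this rank is auth for."""
    return [item for item in touched if auth_of.get(item) == rank]


# Scenario from the steps above: mds1 is auth for srcdn, mds0 for srci.
# The auth rank for destdn is an assumption for illustration.
auth_of = {"srcdn": 1, "srci": 0, "destdn": 1}
touched = ["srcdn", "srci", "destdn"]

slave_entries = items_to_journal(0, touched, auth_of)   # mds0 acts as slave
master_entries = items_to_journal(1, touched, auth_of)  # mds1 drives the rename
print(slave_entries, master_entries)
```

In this model the bug corresponds to the slave writing the full `touched` list instead of the filtered one, so its journal ends up describing metadata that a replayer of that journal cannot be expected to have.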
Updated by Sage Weil almost 13 years ago
- Target version changed from v0.29 to v0.30
Updated by Sage Weil almost 13 years ago
- Position deleted (390)
- Position set to 7
Updated by Greg Farnum almost 13 years ago
- Status changed from In Progress to 7
All right, I went over _rename_prepare pretty carefully and reworked a lot of the checks on journaling, and now I haven't seen a crash in a while. Running a few more tests with the next branch (and Sage's changes there) merged before I push.
Updated by Greg Farnum almost 13 years ago
- Status changed from 7 to Resolved
Okay, after 3 or 4 more runs I've only seen #1128.
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.30)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.