journaler: journal too short during replay
Got a report on irc from a user whose log was 611 bytes shorter than the header indicated it should be. His guess was that it had happened the day before when he restarted the MDS "a couple times" while some OSDs were down.
1) The header object indicated the log should have ended at an object boundary. The last object was 611 bytes short (as evidenced by the object reads in the log, and manual listings he pasted).
2) After the problem began, he ran a deep scrub which turned up clean — the issue was not filesystem corruption/lost writes on a single OSD.
3) The log ended cleanly (except for being shorter than it should have — the last entry was the correct length and there was no extra data.
4) Fixing the header fixed the problem.
I did not gather enough data to disprove it having been degraded to a single copy, having the OSD holding the data lose the last write, and having it recover elsewhere to a different node. That seems less likely to me than some coding issue, though I have been quite unable to find one.
- Status changed from New to Rejected
That is not what happened; the underlying objects were inconsistent in RADOS.
- Status changed from Rejected to New
Urgh, that last comment was mistaken.
- Subject changed from mds: apparently can commit too-new header if some OSDs are down to journaler: flush commits new header to disk without waiting for newest entries to be acked
- Description updated (diff)
- Status changed from New to In Progress
This is a bit more complicated than we described — we do not in fact blindly write the write_pos to our head object; we use the "safe pos", which should be maintained correctly by the flush code.
The flush could still be incorrect if separate flushes commit out-of-order, though, which is likely what happened here. Whipping up a patch.
- Status changed from In Progress to Need Review
- Assignee set to Sage Weil
Pushed a patch to wip-journaler-safety, commit:a0ba5c66162af720627fcf7ba63fdc76ac97f568. I'm setting up a basic functionality test now to make sure I didn't break anything.
Yep, it ran pjd successfully; plenty of journal commits there!
- Status changed from Need Review to New
- Assignee deleted (
Wow, that explanation of what was going on was so very wrong. Now I'm just not sure how this could have occurred.
- Subject changed from journaler: flush commits new header to disk without waiting for newest entries to be acked to journaler: journal too short during replay
- Description updated (diff)
maybe this is the same as #4708. Two MDS (one mds is supposed to be dead, but it's not ) modified the log at the same time, one zeroed out another MDS's log entries.
Hmmm, definitely possible. That ticket isn't exactly a certain diagnosis either, though. :(
- Status changed from New to Need More Info
I've bumped up #4708, so if that's the cause of this it'll be fixed when that is. If not, we need more info.
- Category changed from 47 to Correctness/Safety
- Status changed from Need More Info to Can't reproduce
- Component(FS) MDS, osdc added
The journal format is different now too; this is probably not useful any more.
Also available in: Atom