Bug #6458
closedjournaler: journal too short during replay
0%
Description
Got a report on irc from a user whose log was 611 bytes shorter than the header indicated it should be. His guess was that it had happened the day before when he restarted the MDS "a couple times" while some OSDs were down.
Checking details:
1) The header object indicated the log should have ended at an object boundary. The last object was 611 bytes short (as evidenced by the object reads in the log, and manual listings he pasted).
2) After the problem began, he ran a deep scrub which turned up clean — the issue was not filesystem corruption/lost writes on a single OSD.
3) The log ended cleanly (except for being shorter than it should have — the last entry was the correct length and there was no extra data.
4) Fixing the header fixed the problem.
I did not gather enough data to disprove it having been degraded to a single copy, having the OSD holding the data lose the last write, and having it recover elsewhere to a different node. That seems less likely to me than some coding issue, though I have been quite unable to find one.
Updated by Greg Farnum over 10 years ago
- Status changed from New to Rejected
That is not what happened; the underlying objects were inconsistent in RADOS.
Updated by Greg Farnum over 10 years ago
- Status changed from Rejected to New
Urgh, that last comment was mistaken.
Updated by Greg Farnum over 10 years ago
- Subject changed from mds: apparently can commit too-new header if some OSDs are down to journaler: flush commits new header to disk without waiting for newest entries to be acked
- Description updated (diff)
Updated by Greg Farnum over 10 years ago
- Status changed from New to In Progress
This is a bit more complicated than we described — we do not in fact blindly write the write_pos to our head object; we use the "safe pos", which should be maintained correctly by the flush code.
The flush could still be incorrect if separate flushes commit out-of-order, though, which is likely what happened here. Whipping up a patch.
Updated by Greg Farnum over 10 years ago
- Status changed from In Progress to Fix Under Review
- Assignee set to Sage Weil
Pushed a patch to wip-journaler-safety, commit:a0ba5c66162af720627fcf7ba63fdc76ac97f568. I'm setting up a basic functionality test now to make sure I didn't break anything.
Updated by Greg Farnum over 10 years ago
Yep, it ran pjd successfully; plenty of journal commits there!
Updated by Greg Farnum over 10 years ago
Made a pull request too: https://github.com/ceph/ceph/pull/683
Updated by Greg Farnum over 10 years ago
- Status changed from Fix Under Review to New
- Assignee deleted (
Sage Weil)
Wow, that explanation of what was going on was so very wrong. Now I'm just not sure how this could have occurred.
Updated by Greg Farnum over 10 years ago
- Subject changed from journaler: flush commits new header to disk without waiting for newest entries to be acked to journaler: journal too short during replay
- Description updated (diff)
Updated by Zheng Yan over 10 years ago
maybe this is the same as #4708. Two MDS (one mds is supposed to be dead, but it's not ) modified the log at the same time, one zeroed out another MDS's log entries.
Updated by Greg Farnum over 10 years ago
Hmmm, definitely possible. That ticket isn't exactly a certain diagnosis either, though. :(
Updated by Greg Farnum about 10 years ago
- Status changed from New to Need More Info
I've bumped up #4708, so if that's the cause of this it'll be fixed when that is. If not, we need more info.
Updated by Greg Farnum almost 8 years ago
- Category changed from 47 to Correctness/Safety
- Status changed from Need More Info to Can't reproduce
- Component(FS) MDS, osdc added
The journal format is different now too; this is probably not useful any more.