Bug #6458

closed

journaler: journal too short during replay

Added by Greg Farnum over 10 years ago. Updated almost 8 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, osdc
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Got a report on IRC from a user whose log was 611 bytes shorter than its header indicated. He guessed it had happened the day before, when he restarted the MDS "a couple times" while some OSDs were down.

Checking details:
1) The header object indicated the log should have ended at an object boundary. The last object was 611 bytes short, as shown by the object reads in the log and the manual listings he pasted.
2) After the problem began, he ran a deep scrub, which came back clean, so the issue was not filesystem corruption or a lost write on a single OSD.
3) The log ended cleanly apart from being shorter than it should have been: the last entry was the correct length and there was no extra data.
4) Fixing the header fixed the problem.
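The mismatch in point 1 amounts to a simple consistency check: where the header says the log ends versus where the stored object data actually ends. The sketch below is a minimal illustration, assuming the journal is striped sequentially over fixed-size objects (object N covers bytes [N * object_size, (N + 1) * object_size), i.e. no stripe interleaving) and that the header's expected end is a single write position. The names `data_end` and `shortfall` and the 4 MiB object size are hypothetical; this is not Ceph's Journaler code.

```python
from typing import Dict


def data_end(object_size: int, object_lengths: Dict[int, int]) -> int:
    """Byte offset just past the last byte actually stored.

    Hypothetical helper: object_lengths maps object number -> stored length,
    assuming sequential striping over fixed-size objects.
    """
    return max(
        (objno * object_size + length for objno, length in object_lengths.items()),
        default=0,
    )


def shortfall(header_write_pos: int, object_size: int,
              object_lengths: Dict[int, int]) -> int:
    """How many bytes the header claims beyond the data that really exists."""
    return header_write_pos - data_end(object_size, object_lengths)


# Hypothetical numbers mirroring the report: the header ends on an object
# boundary, but the last object is 611 bytes short.
OBJECT_SIZE = 4 * 1024 * 1024
lengths = {0: OBJECT_SIZE, 1: OBJECT_SIZE, 2: OBJECT_SIZE - 611}
header_pos = 3 * OBJECT_SIZE

print(shortfall(header_pos, OBJECT_SIZE, lengths))  # 611
```

In these terms, "fixing the header" corresponds to lowering the header's write position to `data_end(...)`. That is only safe when, as in point 3, the last entry in the data is complete; this sketch does not check entry boundaries.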

I did not gather enough data to rule out this sequence: the data was degraded to a single copy, the OSD holding it lost the last write, and it was then recovered to a different node. That seems less likely to me than a coding bug, though I have been quite unable to find one.
