Bug #6458: journaler: journal too short during replay - CephFS - Ceph

Bug #6458

closed

journaler: journal too short during replay

Added by Greg Farnum over 10 years ago. Updated almost 8 years ago.

Status:

Can't reproduce

Priority:

Normal

Assignee:

Category:

Correctness/Safety

Target version:

% Done:

Source:

Community (user)

Tags:

Backport:

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(FS):

MDS, osdc

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

Got a report on IRC from a user whose log was 611 bytes shorter than the header indicated it should be. His guess was that it had happened the day before, when he restarted the MDS "a couple times" while some OSDs were down.

Checking details:
1) The header object indicated the log should have ended at an object boundary. The last object was 611 bytes short (as evidenced by the object reads in the log, and manual listings he pasted).
2) After the problem began, he ran a deep scrub which turned up clean — the issue was not filesystem corruption/lost writes on a single OSD.
3) The log ended cleanly, except for being shorter than it should have been: the last entry was the correct length and there was no extra data.
4) Fixing the header fixed the problem.
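The check in (1) can be sketched as simple arithmetic. This is only an illustration, not Journaler code: the function name is hypothetical, and the 4 MiB object size is an assumption rather than something taken from the user's cluster.

```python
# Hypothetical helper: how many bytes is the last journal object short of
# what the header's write position implies?
OBJECT_SIZE = 4 * 1024 * 1024  # assumed journal object size (4 MiB)


def last_object_shortfall(write_pos, object_size, last_object_len):
    """Return the number of missing bytes in the final journal object.

    write_pos       -- end-of-journal offset recorded in the header
    object_size     -- size of one full journal object
    last_object_len -- bytes actually stored in the final object
    """
    if write_pos <= 0:
        return 0
    # A write_pos on an object boundary means the final object must be full.
    expected = write_pos % object_size or object_size
    return max(0, expected - last_object_len)


# Header ends on an object boundary, but the last object is 611 bytes short.
print(last_object_shortfall(3 * OBJECT_SIZE, OBJECT_SIZE, OBJECT_SIZE - 611))
```

A non-zero result means replay will hit end-of-data before reaching the position the header promises.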

I did not gather enough data to rule out the journal object having been degraded to a single copy, the OSD holding that copy losing the last write, and the object then being recovered to a different node. That seems less likely to me than a coding issue, though I have not been able to find one.

Updated by Greg Farnum over 10 years ago

Status changed from New to Rejected

That is not what happened; the underlying objects were inconsistent in RADOS.

Updated by Greg Farnum over 10 years ago

Status changed from Rejected to New

Urgh, that last comment was mistaken.

Updated by Greg Farnum over 10 years ago

Subject changed from "mds: apparently can commit too-new header if some OSDs are down" to "journaler: flush commits new header to disk without waiting for newest entries to be acked"
Description updated (diff)

Updated by Greg Farnum over 10 years ago

Status changed from New to In Progress

This is a bit more complicated than we described — we do not in fact blindly write the write_pos to our head object; we use the "safe pos", which should be maintained correctly by the flush code.

The flush could still be incorrect if separate flushes commit out-of-order, though, which is likely what happened here. Whipping up a patch.
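To illustrate the suspected failure mode, here is a minimal sketch of safe-position tracking that tolerates out-of-order acks. It is not the Journaler implementation or the patch: the class and method names are hypothetical, and it assumes flushes cover contiguous byte ranges and are issued in offset order.

```python
class SafePosTracker:
    """Hypothetical model: safe_pos never passes a byte that is still in flight."""

    def __init__(self, start=0):
        self.safe_pos = start
        self.acked_end = start  # highest end offset acked so far
        self.pending = {}       # flush start offset -> end offset, in flight

    def begin_flush(self, start, end):
        self.pending[start] = end

    def ack_flush(self, start):
        end = self.pending.pop(start)
        self.acked_end = max(self.acked_end, end)
        if self.pending:
            # An earlier flush is still outstanding: stop at its first byte.
            limit = min(min(self.pending), self.acked_end)
        else:
            limit = self.acked_end
        self.safe_pos = max(self.safe_pos, limit)
        return self.safe_pos


tracker = SafePosTracker()
tracker.begin_flush(0, 100)
tracker.begin_flush(100, 200)
# The later flush is acked first; a naive "safe_pos = end" would jump to 200.
print(tracker.ack_flush(100))
print(tracker.ack_flush(0))
```

If the header were written between the two acks using a naively advanced position, it would point past data that is not yet durable, which matches the "journal too short" symptom.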

Updated by Greg Farnum over 10 years ago

Status changed from In Progress to Fix Under Review
Assignee set to Sage Weil

Pushed a patch to wip-journaler-safety, commit:a0ba5c66162af720627fcf7ba63fdc76ac97f568. I'm setting up a basic functionality test now to make sure I didn't break anything.
