Project

General

Profile

Bug #6458

journaler: journal too short during replay

Added by Greg Farnum almost 6 years ago. Updated about 3 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
Correctness/Safety
Target version:
-
Start date:
10/02/2013
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, osdc
Labels (FS):
Pull request ID:

Description

Got a report on irc from a user whose log was 611 bytes shorter than the header indicated it should be. His guess was that it had happened the day before when he restarted the MDS "a couple times" while some OSDs were down.

Checking details:
1) The header object indicated the log should have ended at an object boundary. The last object was 611 bytes short (as evidenced by the object reads in the log, and manual listings he pasted).
2) After the problem began, he ran a deep scrub which turned up clean — the issue was not filesystem corruption/lost writes on a single OSD.
3) The log ended cleanly (except for being shorter than it should have — the last entry was the correct length and there was no extra data.
4) Fixing the header fixed the problem.

I did not gather enough data to disprove it having been degraded to a single copy, having the OSD holding the data lose the last write, and having it recover elsewhere to a different node. That seems less likely to me than some coding issue, though I have been quite unable to find one.

History

#1 Updated by Greg Farnum almost 6 years ago

  • Status changed from New to Rejected

That is not what happened; the underlying objects were inconsistent in RADOS.

#2 Updated by Greg Farnum almost 6 years ago

  • Status changed from Rejected to New

Urgh, that last comment was mistaken.

#3 Updated by Greg Farnum almost 6 years ago

  • Subject changed from mds: apparently can commit too-new header if some OSDs are down to journaler: flush commits new header to disk without waiting for newest entries to be acked
  • Description updated (diff)

#4 Updated by Greg Farnum almost 6 years ago

  • Status changed from New to In Progress

This is a bit more complicated than we described — we do not in fact blindly write the write_pos to our head object; we use the "safe pos", which should be maintained correctly by the flush code.

The flush could still be incorrect if separate flushes commit out-of-order, though, which is likely what happened here. Whipping up a patch.

#5 Updated by Greg Farnum almost 6 years ago

  • Status changed from In Progress to Need Review
  • Assignee set to Sage Weil

Pushed a patch to wip-journaler-safety, commit:a0ba5c66162af720627fcf7ba63fdc76ac97f568. I'm setting up a basic functionality test now to make sure I didn't break anything.

#6 Updated by Greg Farnum almost 6 years ago

Yep, it ran pjd successfully; plenty of journal commits there!

#7 Updated by Greg Farnum almost 6 years ago

Made a pull request too: https://github.com/ceph/ceph/pull/683

#8 Updated by Greg Farnum almost 6 years ago

  • Status changed from Need Review to New
  • Assignee deleted (Sage Weil)

Wow, that explanation of what was going on was so very wrong. Now I'm just not sure how this could have occurred.

#9 Updated by Greg Farnum almost 6 years ago

  • Subject changed from journaler: flush commits new header to disk without waiting for newest entries to be acked to journaler: journal too short during replay
  • Description updated (diff)

#10 Updated by Zheng Yan almost 6 years ago

maybe this is the same as #4708. Two MDS (one mds is supposed to be dead, but it's not ) modified the log at the same time, one zeroed out another MDS's log entries.

#11 Updated by Greg Farnum almost 6 years ago

Hmmm, definitely possible. That ticket isn't exactly a certain diagnosis either, though. :(

#12 Updated by Greg Farnum over 5 years ago

  • Status changed from New to Need More Info

I've bumped up #4708, so if that's the cause of this it'll be fixed when that is. If not, we need more info.

#13 Updated by Greg Farnum about 3 years ago

  • Category changed from 47 to Correctness/Safety
  • Status changed from Need More Info to Can't reproduce
  • Component(FS) MDS, osdc added

The journal format is different now too; this is probably not useful any more.

Also available in: Atom PDF