Bug #13167

mds: replay gets stuck (on out-of-order journal replies?)

Added by Greg Farnum almost 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 09/19/2015
Due date:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

ubuntu-2015-09-17_16:55:52-fs-greg-fs-testing---basic-multi/1061690/ceph-mds.a.log

This MDS went in and out of replay a few times, but it got stuck on the last attempt. It looks like it already has the data it needs to proceed, yet it is stuck at the wait condition in MDLog::_replay_thread. I do see that the last entry it processed is the last one before a log object boundary (and the next event seems to cross that boundary?). Also, the read of the second object completed before the read of the first.

Associated revisions

Revision f4b55f46 (diff)
Added by Yan, Zheng almost 4 years ago

journaler: detect unexpected holes in journal objects

Fixes: #13167
Signed-off-by: Yan, Zheng <>

History

#1 Updated by Zheng Yan almost 4 years ago

  • Status changed from Verified to Duplicate

The journal's write_pos seems to point somewhere in object 200.00000002, but the size of object 200.00000001 is only 3139442 bytes. This is likely another symptom of #13166.
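To make the arithmetic behind this observation concrete, here is a minimal Python sketch of how a journal offset maps onto objects and how a short object shows up as a hole. It is an illustration only, not Ceph code. It assumes a 4 MiB object size, no striping across objects, and an object naming scheme of hex inode plus an 8-digit hex index (consistent with the names quoted above); the function names and the read_pos/write_pos values in the usage lines are hypothetical.

```python
# Illustrative sketch only; not taken from the Ceph source tree.
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB journal objects, no striping


def object_name(ino, index):
    """Format an object name such as 200.00000001 (assumed naming scheme)."""
    return f"{ino:x}.{index:08x}"


def find_short_object(stored_sizes, read_pos, write_pos, object_size=OBJECT_SIZE):
    """Return (index, journal_offset_of_hole) for the first short object.

    stored_sizes maps object index -> bytes actually stored in that object.
    Every object that lies entirely before write_pos's object must be full;
    if one is shorter, the journal has a hole starting where its data ends.
    Returns None when no such object is found.
    """
    first = read_pos // object_size
    last = write_pos // object_size
    for index in range(first, last):
        size = stored_sizes.get(index, 0)
        if size < object_size:
            return index, index * object_size + size
    return None


# Values from the comment above: object 1 holds 3139442 bytes,
# while write_pos lies somewhere in object 2 (exact offset is made up here).
sizes = {1: 3139442, 2: 500}
hole = find_short_object(sizes, read_pos=OBJECT_SIZE, write_pos=2 * OBJECT_SIZE + 500)
print(object_name(0x200, hole[0]), hole[1])
```

Under these assumptions, object 200.00000001 would need to hold 4194304 bytes for write_pos to legitimately sit in 200.00000002, so the data ends at journal offset 7333746 and nothing is readable between there and the next object boundary.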

#2 Updated by Greg Farnum almost 4 years ago

  • Status changed from Duplicate to Verified
  • Priority changed from Urgent to Normal

We should be detecting holes in the journal and shutting down with a nice message or clear assert or something instead of just hanging forever.
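As a rough illustration of the behavior requested here, the Python sketch below fails fast when the contiguous readable data ends before write_pos and no reads remain outstanding, instead of waiting indefinitely. It is a simplified model, not the Journaler implementation and not the content of revision f4b55f46; the names JournalHoleError, readable_until and check_replay_progress are hypothetical, as are the offsets in the usage lines.

```python
# Simplified model of hole detection during replay; not Ceph's Journaler code.


class JournalHoleError(Exception):
    """Raised when replay can never reach write_pos (hypothetical name)."""


def readable_until(extents, read_pos):
    """Return the end of the contiguous data that starts at read_pos.

    extents is a list of (offset, length) pairs for completed reads,
    in any order (reads may complete out of order).
    """
    pos = read_pos
    for offset, length in sorted(extents):
        if offset > pos:
            break  # gap: data after this point is not yet contiguous
        pos = max(pos, offset + length)
    return pos


def check_replay_progress(extents, read_pos, write_pos, reads_in_flight):
    """Return how far replay can proceed, or raise if it is stuck for good."""
    end = readable_until(extents, read_pos)
    if end < write_pos and reads_in_flight == 0:
        # Nothing more will arrive, so waiting would hang forever.
        raise JournalHoleError(
            f"journal hole at offset {end}; write_pos is {write_pos}"
        )
    return end


# The second object's read completed first; the first object is short.
extents = [(8388608, 100), (4194304, 3139442)]
try:
    check_replay_progress(extents, read_pos=4194304, write_pos=8388708,
                          reads_in_flight=0)
except JournalHoleError as err:
    print("replay aborted:", err)
```

The key design choice in this sketch is to treat "no reads in flight and data still missing" as a terminal condition with a clear message, while still returning normally when reads are merely pending.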

#4 Updated by Zheng Yan almost 4 years ago

  • Status changed from Verified to Need Review

#5 Updated by Greg Farnum almost 4 years ago

  • Status changed from Need Review to Resolved

#6 Updated by Greg Farnum about 3 years ago

  • Component(FS) MDS added
