Bug #13167

closed

mds: replay gets stuck (on out-of-order journal replies?)

Added by Greg Farnum over 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu-2015-09-17_16:55:52-fs-greg-fs-testing---basic-multi/1061690/ceph-mds.a.log

This MDS went in and out of replay a few times, then got stuck on the last attempt. It appears to already have the data it needs to proceed, yet it is stuck at the wait condition in MDLog::_replay_thread. The last entry it processed is the last one before a log object boundary (and the next event seems to cross that boundary?), and the read of the second object completed before the read of the first.

Actions #1

Updated by Zheng Yan over 8 years ago

  • Status changed from 12 to Duplicate

The journal's write_pos seems to point somewhere in object 200.00000002, but the size of object 200.00000001 is 3139442. This is likely another symptom of #13166.
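
The reasoning in this comment can be sketched as a small check. This is a minimal illustration, not Ceph code. It assumes journal objects of 4 MiB (an assumed layout value; the ticket does not state it) and that object names are the journal inode in hex followed by an 8-digit hex object index (inferred from names such as 200.00000001). The function names and the write_pos used below are hypothetical.

```python
# Assumption: journal objects are 4 MiB; the ticket does not give the layout.
OBJECT_SIZE = 4 * 1024 * 1024


def locate(pos, object_size=OBJECT_SIZE):
    """Map a journal byte position to (object index, offset inside that object)."""
    return divmod(pos, object_size)


def object_name(ino, index):
    """Format an object name as <inode hex>.<index as 8 hex digits> (assumed scheme)."""
    return f"{ino:x}.{index:08x}"


def first_short_object(read_pos, write_pos, sizes, object_size=OBJECT_SIZE):
    """Return the index of the first object before the one holding write_pos
    that is smaller than a full object, or None if there is no such object.

    `sizes` maps object index -> size in bytes; a missing object counts as 0.
    """
    first_index, _ = locate(read_pos, object_size)
    last_index, _ = locate(write_pos, object_size)
    for index in range(first_index, last_index):
        if sizes.get(index, 0) < object_size:
            return index
    return None


# Hypothetical numbers shaped like the ticket: write_pos lands in object 2,
# while object 1 holds only 3139442 bytes.
write_pos = 2 * OBJECT_SIZE + 100
sizes = {0: OBJECT_SIZE, 1: 3139442, 2: 100}
short = first_short_object(0, write_pos, sizes)
if short is not None:
    print("short object:", object_name(0x200, short))
```

Under these assumptions, every object that lies entirely before write_pos should be full, so a short object there means bytes are missing from the middle of the journal.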

Actions #2

Updated by Greg Farnum over 8 years ago

  • Status changed from Duplicate to 12
  • Priority changed from Urgent to Normal

We should be detecting holes in the journal and shutting down with a nice message or clear assert or something instead of just hanging forever.
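
A rough sketch of the behaviour asked for here, under stated assumptions: it is not the fix that was merged and does not mirror the real Journaler or MDLog code. `FetchState`, `next_step` and `JournalHoleError` are hypothetical names. The idea is that replay only waits while a read is still outstanding; if nothing is in flight and no data is available before write_pos, it fails with a clear error instead of blocking.

```python
from dataclasses import dataclass


class JournalHoleError(RuntimeError):
    """Raised when the journal has a gap before write_pos (hypothetical type)."""


@dataclass
class FetchState:
    read_pos: int         # next byte the replay thread wants to consume
    write_pos: int        # end of the journal according to its header
    readable_end: int     # end of the contiguous data available from read_pos
    reads_in_flight: int  # object reads issued but not yet completed


def next_step(state):
    """Decide what the replay loop should do next.

    Returns "done", "replay" or "wait"; raises JournalHoleError when waiting
    could never succeed.
    """
    if state.read_pos >= state.write_pos:
        return "done"
    if state.readable_end > state.read_pos:
        return "replay"
    if state.reads_in_flight > 0:
        return "wait"
    # No data at read_pos and nothing left to arrive: waiting would hang forever.
    raise JournalHoleError(
        f"journal hole at offset {state.read_pos}: no data available and no "
        f"reads in flight, but write_pos is {state.write_pos}"
    )
```

A caller would block on its condition variable only when `next_step` returns "wait", and would let the exception surface as a shutdown message or assert.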

Actions #4

Updated by Zheng Yan over 8 years ago

  • Status changed from 12 to Fix Under Review

Actions #5

Updated by Greg Farnum over 8 years ago

  • Status changed from Fix Under Review to Resolved

Actions #6

Updated by Greg Farnum almost 8 years ago

  • Component(FS) MDS added