Project

General

Profile

Bug #15765

Imbalanced journal replay fails to start from correct position

Added by Jason Dillaman almost 8 years ago. Updated almost 8 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Jason Dillaman
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A journal that was heavily populated with individual CLI maintenance operations (two events per epoch) in addition to periods of bench-write resulted in a commit position of the following:

2016-05-06 18:19:21.526289 7fd8f145a700 20 librbd::Journal: client: [id=, commit_position=[positions=[[object_number=9, tag_tid=300, entry_tid=1], [object_number=8, tag_tid=300, entry_tid=0], [object_number=10, tag_tid=200, entry_tid=4334], [object_number=11, tag_tid=200, entry_tid=4331]]], state=connected], image meta: [tag_class=0]

Upon recovery startup of the journal, the state of the journal objects looked like the following:

2016-05-06 18:19:21.625258 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.9, r=0, len=9536911
2016-05-06 18:19:21.630408 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=300, entry_tid=1, data size=30]
2016-05-06 18:19:21.630410 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=1, data size=10270]

2016-05-06 18:19:21.726973 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.8, r=0, len=9578782
2016-05-06 18:19:21.731536 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=300, entry_tid=0, data size=39]
2016-05-06 18:19:21.731537 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=0, data size=10270]

2016-05-06 18:19:21.652091 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.10, r=0, len=6923812
2016-05-06 18:19:21.655132 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=200, entry_tid=4334, data size=30]
2016-05-06 18:19:21.655133 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=2, data size=10270]

2016-05-06 18:19:21.807730 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.11, r=0, len=9520105
2016-05-06 18:19:21.811569 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=200, entry_tid=4331, data size=30]
2016-05-06 18:19:21.811570 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=3, data size=10270]

Upon start of the replay, it incorrectly started from object 10 instead of object 8 which had the next available record:

2016-05-06 18:19:21.811578 7fd8f1c5b700 20 JournalPlayer: verify_playback_ready: new tag 301 detected, adjusting offset to 2
2016-05-06 18:19:21.811623 7fd8ca5fa700 20 librbd::Journal: 0x7fd8cc00ddc0 handle_replay_ready
2016-05-06 18:19:21.811634 7fd8ca5fa700 20 JournalPlayer: try_pop_front
2016-05-06 18:19:21.811640 7fd8ca5fa700 20 JournalPlayer: advance_splay_object: new offset 3
2016-05-06 18:19:21.811645 7fd8ca5fa700 20 JournalMetadata: allocated commit tid: commit_tid=1 [object_num=10, tag_tid=301, entry_tid=2]

2016-05-06 18:19:21.811809 7fd8f145a700 20 JournalPlayer: try_pop_front
2016-05-06 18:19:21.811813 7fd8f145a700 20 JournalPlayer: advance_splay_object: new offset 0
2016-05-06 18:19:21.811814 7fd8f145a700 20 JournalMetadata: allocated commit tid: commit_tid=2 [object_num=11, tag_tid=301, entry_tid=3]

2016-05-06 18:19:21.811881 7fd8f145a700 20 JournalPlayer: try_pop_front
2016-05-06 18:19:21.811884 7fd8f145a700 -1 JournalPlayer: missing prior journal entry: Entry[tag_tid=301, entry_tid=0, data size=10270]
2016-05-06 18:19:21.811887 7fd8f145a700 10 JournalPlayer: notify_complete: replay complete: r=-42

Related issues

Copied to rbd - Backport #15817: jewel: Imbalanced journal replay fails to start from correct position Resolved

History

#1 Updated by Jason Dillaman almost 8 years ago

  • Description updated (diff)

#2 Updated by Jason Dillaman almost 8 years ago

  • Subject changed from Imbalanced journal reply fails to start from correct position to Imbalanced journal replay fails to start from correct position

#3 Updated by Jason Dillaman almost 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman

#4 Updated by Jason Dillaman almost 8 years ago

  • Status changed from In Progress to Fix Under Review

#5 Updated by Jason Dillaman almost 8 years ago

  • Backport set to jewel

#6 Updated by Jason Dillaman almost 8 years ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Jason Dillaman almost 8 years ago

  • Copied to Backport #15817: jewel: Imbalanced journal replay fails to start from correct position added

#8 Updated by Jason Dillaman almost 8 years ago

  • Status changed from Pending Backport to Resolved

Also available in: Atom PDF