Bug #15765
Imbalanced journal replay fails to start from correct position
Status:
Resolved
Priority:
Normal
Assignee:
Jason Dillaman
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
A journal that was heavily populated with individual CLI maintenance operations (two events per epoch) in addition to periods of bench-write resulted in a commit position of the following:
2016-05-06 18:19:21.526289 7fd8f145a700 20 librbd::Journal: client: [id=, commit_position=[positions=[[object_number=9, tag_tid=300, entry_tid=1], [object_number=8, tag_tid=300, entry_tid=0], [object_number=10, tag_tid=200, entry_tid=4334], [object_number=11, tag_tid=200, entry_tid=4331]]], state=connected], image meta: [tag_class=0]
Upon recovery startup of the journal, the state of the journal objects looked like the following:
2016-05-06 18:19:21.625258 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.9, r=0, len=9536911 2016-05-06 18:19:21.630408 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=300, entry_tid=1, data size=30] 2016-05-06 18:19:21.630410 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=1, data size=10270] 2016-05-06 18:19:21.726973 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.8, r=0, len=9578782 2016-05-06 18:19:21.731536 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=300, entry_tid=0, data size=39] 2016-05-06 18:19:21.731537 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=0, data size=10270] 2016-05-06 18:19:21.652091 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.10, r=0, len=6923812 2016-05-06 18:19:21.655132 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=200, entry_tid=4334, data size=30] 2016-05-06 18:19:21.655133 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=2, data size=10270] 2016-05-06 18:19:21.807730 7fd8f1c5b700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.10af238e1f29.11, r=0, len=9520105 2016-05-06 18:19:21.811569 7fd8f1c5b700 20 JournalPlayer: skipping committed entry: Entry[tag_tid=200, entry_tid=4331, data size=30] 2016-05-06 18:19:21.811570 7fd8f1c5b700 10 JournalPlayer: located next uncommitted entry: Entry[tag_tid=301, entry_tid=3, data size=10270]
Upon start of the replay, it incorrectly started from object 10 instead of object 8 which had the next available record:
2016-05-06 18:19:21.811578 7fd8f1c5b700 20 JournalPlayer: verify_playback_ready: new tag 301 detected, adjusting offset to 2 2016-05-06 18:19:21.811623 7fd8ca5fa700 20 librbd::Journal: 0x7fd8cc00ddc0 handle_replay_ready 2016-05-06 18:19:21.811634 7fd8ca5fa700 20 JournalPlayer: try_pop_front 2016-05-06 18:19:21.811640 7fd8ca5fa700 20 JournalPlayer: advance_splay_object: new offset 3 2016-05-06 18:19:21.811645 7fd8ca5fa700 20 JournalMetadata: allocated commit tid: commit_tid=1 [object_num=10, tag_tid=301, entry_tid=2] 2016-05-06 18:19:21.811809 7fd8f145a700 20 JournalPlayer: try_pop_front 2016-05-06 18:19:21.811813 7fd8f145a700 20 JournalPlayer: advance_splay_object: new offset 0 2016-05-06 18:19:21.811814 7fd8f145a700 20 JournalMetadata: allocated commit tid: commit_tid=2 [object_num=11, tag_tid=301, entry_tid=3] 2016-05-06 18:19:21.811881 7fd8f145a700 20 JournalPlayer: try_pop_front 2016-05-06 18:19:21.811884 7fd8f145a700 -1 JournalPlayer: missing prior journal entry: Entry[tag_tid=301, entry_tid=0, data size=10270] 2016-05-06 18:19:21.811887 7fd8f145a700 10 JournalPlayer: notify_complete: replay complete: r=-42
Related issues
History
#1 Updated by Jason Dillaman almost 8 years ago
- Description updated (diff)
#2 Updated by Jason Dillaman almost 8 years ago
- Subject changed from Imbalanced journal reply fails to start from correct position to Imbalanced journal replay fails to start from correct position
#3 Updated by Jason Dillaman almost 8 years ago
- Status changed from New to In Progress
- Assignee set to Jason Dillaman
#4 Updated by Jason Dillaman almost 8 years ago
- Status changed from In Progress to Fix Under Review
#5 Updated by Jason Dillaman almost 8 years ago
- Backport set to jewel
#6 Updated by Jason Dillaman almost 8 years ago
- Status changed from Fix Under Review to Pending Backport
#7 Updated by Jason Dillaman almost 8 years ago
- Copied to Backport #15817: jewel: Imbalanced journal replay fails to start from correct position added
#8 Updated by Jason Dillaman almost 8 years ago
- Status changed from Pending Backport to Resolved