Bug #15993

rbd-mirror can become stuck during live-replay

Added by Jason Dillaman almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Target version: -
Start date: 05/23/2016
Due date:
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

2016-05-23 07:31:04.045929 7fa7251e1700 -1 JournalPlayer: missing prior journal entry: Entry[tag_tid=2, entry_tid=924, data size=16777183]
2016-05-23 07:31:04.045952 7fa7251e1700 -1 rbd-mirror: ImageReplayer[1/5e4e2ae8944a]::handle_replay_complete: replay encountered an error: (42) No message of desired type
2016-05-23 07:32:35.695730 7fa7251e1700 -1 JournalPlayer: missing prior journal entry: Entry[tag_tid=2, entry_tid=956, data size=1054]
2016-05-23 07:32:35.695751 7fa7251e1700 -1 rbd-mirror: ImageReplayer[1/5e4e2ae8944a]::handle_replay_complete: replay encountered an error: (42) No message of desired type

Mirror pool status on the slave cluster:

# rbd --cluster slave --pool pool1 mirror pool status --verbose
health: OK
images: 4 total
    4 replaying

data1:
  global_id:   037458d0-4516-4b68-908f-ab6fce7de7a7
  state:       up+replaying
  description: replaying, master_position=[object_number=287, tag_tid=2, entry_tid=1491], mirror_position=[object_number=150, tag_tid=2, entry_tid=966], entries_behind_master=525
  last_update: 2016-05-23 11:56:13
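
The replication lag in the status output above can be cross-checked from the two positions. The following is a minimal sketch, not part of rbd-mirror: the helper name `parse_position` is hypothetical, and it assumes that both positions share the same `tag_tid`, so that the lag is simply the difference of the two `entry_tid` values.

```python
import re


def parse_position(text, name):
    """Extract object_number, tag_tid and entry_tid for the named position.

    `parse_position` is a hypothetical helper written for this report;
    it only understands the status description format shown above.
    """
    match = re.search(
        rf"{name}=\[object_number=(\d+), tag_tid=(\d+), entry_tid=(\d+)\]",
        text,
    )
    if match is None:
        raise ValueError(f"{name} not found in status description")
    object_number, tag_tid, entry_tid = map(int, match.groups())
    return {
        "object_number": object_number,
        "tag_tid": tag_tid,
        "entry_tid": entry_tid,
    }


# Description line copied from the `mirror pool status --verbose` output.
description = (
    "replaying, master_position=[object_number=287, tag_tid=2, entry_tid=1491], "
    "mirror_position=[object_number=150, tag_tid=2, entry_tid=966], "
    "entries_behind_master=525"
)

master = parse_position(description, "master_position")
mirror = parse_position(description, "mirror_position")

# Assumption: the subtraction is only meaningful within a single tag.
if master["tag_tid"] != mirror["tag_tid"]:
    raise ValueError("positions belong to different tags")

lag = master["entry_tid"] - mirror["entry_tid"]
print(f"mirror is {lag} entries behind master")
```

With the values from this report the result matches the reported `entries_behind_master=525` (1491 - 966).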

After restarting rbd-mirror, replication proceeded until it became stuck again:

    [id=, commit_position=[positions=[[object_number=287, tag_tid=2, entry_tid=1491], [object_number=286, tag_tid=2, entry_tid=1490], [object_number=285, tag_tid=2, entry_tid=1489], [object_number=276, tag_tid=2, entry_tid=1488]]], state=connected]
    [id=041ffd0b-8910-47c6-9275-6858c551739d, commit_position=[positions=[[object_number=286, tag_tid=2, entry_tid=1490], [object_number=285, tag_tid=2, entry_tid=1489], [object_number=276, tag_tid=2, entry_tid=1488], [object_number=287, tag_tid=2, entry_tid=1487]]], state=connected]
2016-05-23 12:14:49.563020 7f8a8affd700 20 JournalPlayer: schedule_watch: scheduling watch on journal_data.1.5e4e2ae8944a.284
2016-05-23 12:14:49.563022 7f8a8affd700 20 ObjectPlayer: watch: journal_data.1.5e4e2ae8944a.284 watch
2016-05-23 12:14:49.563023 7f8a8affd700 20 ObjectPlayer: schedule_watch: journal_data.1.5e4e2ae8944a.284 scheduling watch
2016-05-23 12:14:50.563093 7f8ab5d7b700 10 ObjectPlayer: handle_watch_task: journal_data.1.5e4e2ae8944a.284 polling
2016-05-23 12:14:50.563100 7f8ab5d7b700 10 ObjectPlayer: fetch: journal_data.1.5e4e2ae8944a.284
2016-05-23 12:14:50.563797 7f8a8affd700 10 ObjectPlayer: handle_fetch_complete: journal_data.1.5e4e2ae8944a.284, r=-2, len=0
2016-05-23 12:14:50.563802 7f8a8affd700 10 ObjectPlayer: handle_watch_fetched: journal_data.1.5e4e2ae8944a.284 poll complete, r=-2
2016-05-23 12:14:50.564392 7f8a8affd700 20 JournalPlayer: schedule_watch: scheduling watch on journal_data.1.5e4e2ae8944a.284
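
The two client records and the fetch result above can be compared with a short script. This is only a sketch for reading the output: `newest_commit` is a hypothetical helper, and it assumes that the record with the empty `id` belongs to the primary image's own journal client while the record with a UUID belongs to the rbd-mirror peer.

```python
import errno
import re

# Matches one "[object_number=..., tag_tid=..., entry_tid=...]" element.
POSITION = re.compile(r"\[object_number=(\d+), tag_tid=(\d+), entry_tid=(\d+)\]")


def newest_commit(record):
    """Return (tag_tid, entry_tid, object_number) of the newest committed entry.

    `newest_commit` is a hypothetical helper for this report. Ordering by
    (tag_tid, entry_tid) is an assumption about how entries are sequenced.
    """
    positions = [
        (int(tag_tid), int(entry_tid), int(object_number))
        for object_number, tag_tid, entry_tid in POSITION.findall(record)
    ]
    if not positions:
        raise ValueError("no commit positions found")
    return max(positions)


# Client records copied from the journal client listing above.
master_client = (
    "[id=, commit_position=[positions=[[object_number=287, tag_tid=2, entry_tid=1491], "
    "[object_number=286, tag_tid=2, entry_tid=1490], "
    "[object_number=285, tag_tid=2, entry_tid=1489], "
    "[object_number=276, tag_tid=2, entry_tid=1488]]], state=connected]"
)
mirror_client = (
    "[id=041ffd0b-8910-47c6-9275-6858c551739d, commit_position=[positions=["
    "[object_number=286, tag_tid=2, entry_tid=1490], "
    "[object_number=285, tag_tid=2, entry_tid=1489], "
    "[object_number=276, tag_tid=2, entry_tid=1488], "
    "[object_number=287, tag_tid=2, entry_tid=1487]]], state=connected]"
)

master_newest = newest_commit(master_client)
mirror_newest = newest_commit(mirror_client)
print("master newest commit:", master_newest)
print("mirror newest commit:", mirror_newest)

# The ObjectPlayer log reports r=-2 for journal_data.1.5e4e2ae8944a.284;
# negated, that is the errno value for a missing object.
fetch_result = -2
print("fetch error:", errno.errorcode[-fetch_result])
```

Read this way, the mirror client's newest commit is `entry_tid=1490` while the master client's is `entry_tid=1491`, and the player keeps polling object 284, whose fetch returns `ENOENT`.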

Again, restarting rbd-mirror fixed the issue.


Related issues

Copied to rbd - Backport #16020: jewel: rbd-mirror can become stuck during live-replay (Resolved)

History

#1 Updated by Jason Dillaman almost 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman

#2 Updated by Jason Dillaman almost 3 years ago

  • Status changed from In Progress to Need Review
  • Backport set to jewel

#3 Updated by Jason Dillaman almost 3 years ago

  • Status changed from Need Review to Pending Backport

#4 Updated by Jason Dillaman almost 3 years ago

  • Copied to Backport #16020: jewel: rbd-mirror can become stuck during live-replay added

#5 Updated by Jason Dillaman almost 3 years ago

  • Status changed from Pending Backport to Resolved
