Bug #16708

closed

Sporadic failure in TestImageReplayer.StartReplayAndWrite

Added by Jason Dillaman over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Jason Dillaman
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From Mykola:

Analyzing the log, it looks like the following happens in the test:

1) ImageReplayer::process_entry is called -> Replay::process ->
handle_event: AIO write event -> create_aio_modify_completion:
stores on_safe(1) in m_aio_modify_unsafe_contexts.

2) ImageReplayer::flush is called -> Replay::create_aio_flush_completion:
moves m_aio_modify_unsafe_contexts (with on_safe(1)) to
C_AioFlushComplete::on_safe_ctxs.

3) Replay::handle_aio_flush_complete (for 2) is called, and in the
"strip out previously failed on_safe contexts" block, on_safe(1)
is removed from on_safe_ctxs because it is not found in
m_aio_modify_safe_contexts (handle_aio_modify_complete has not yet
been called to store on_safe(1) in this list).

4) Replay::handle_aio_modify_complete is called: on_safe(1) is
stored in m_aio_modify_safe_contexts, where it stays forever
because no flush references it any longer.
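
The ordering in steps 1 to 4 can be replayed deterministically with a small single-threaded model. This is only a sketch: ReplayModel, the integer context ids, and run_racy_sequence are hypothetical stand-ins and not the librbd code; only the handler and member names mirror those in the analysis.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for the librbd::journal::Replay state.
// Contexts are represented by plain integer ids instead of Context*.
struct ReplayModel {
  std::vector<int> unsafe_contexts;  // models m_aio_modify_unsafe_contexts
  std::set<int> safe_contexts;       // models m_aio_modify_safe_contexts
  std::vector<int> completed;        // on_safe contexts that were fired

  // Step 1: a write event registers its on_safe context.
  void create_aio_modify_completion(int on_safe) {
    unsafe_contexts.push_back(on_safe);
  }

  // Step 2: a flush takes ownership of every pending on_safe context.
  std::vector<int> create_aio_flush_completion() {
    std::vector<int> on_safe_ctxs;
    on_safe_ctxs.swap(unsafe_contexts);
    return on_safe_ctxs;
  }

  // Step 3: contexts not yet present in safe_contexts are stripped out;
  // only the remaining ones are completed.
  void handle_aio_flush_complete(std::vector<int> on_safe_ctxs) {
    for (int ctx : on_safe_ctxs) {
      if (safe_contexts.erase(ctx) > 0) {
        completed.push_back(ctx);
      }
    }
  }

  // Step 4: the write finishes and waits for the next flush.
  void handle_aio_modify_complete(int on_safe) {
    safe_contexts.insert(on_safe);
  }
};

// Runs the four steps in the racy order described in the analysis.
ReplayModel run_racy_sequence() {
  ReplayModel replay;
  replay.create_aio_modify_completion(1);                   // step 1
  auto on_safe_ctxs = replay.create_aio_flush_completion(); // step 2
  assert(replay.unsafe_contexts.empty());
  replay.handle_aio_flush_complete(on_safe_ctxs);           // step 3
  replay.handle_aio_modify_complete(1);                     // step 4
  return replay;
}
```

After run_racy_sequence(), completed is empty and safe_contexts still holds context 1: the context was never fired and nothing is left that would fire it.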

The AioCompletion needs to be "started" before the librbd::journal::Replay lock is released within create_aio_modify_completion. This would prevent an out-of-band flush event from racing with the start of the op and corrupting the internal state.
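
One way to picture why starting the op under the lock helps is the sketch below. It rests on an assumption that is not stated in this report: that a flush cannot complete while an already-started op is still in flight. StartedOpModel, in_flight, flush, and run_fixed_sequence are hypothetical names used for illustration only, not the actual librbd change.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical model. Assumption: a flush completes only once no started
// op remains in flight.
struct StartedOpModel {
  std::mutex lock;                   // models the Replay lock
  std::vector<int> unsafe_contexts;  // models m_aio_modify_unsafe_contexts
  std::set<int> safe_contexts;       // models m_aio_modify_safe_contexts
  std::set<int> in_flight;           // ops that have been "started"
  std::vector<int> flush_ctxs;       // contexts owned by the pending flush
  bool flush_pending = false;
  std::vector<int> completed;        // on_safe contexts that were fired

  // Registers on_safe and starts the op inside one critical section, so
  // no flush can be created between the two actions.
  void create_aio_modify_completion(int on_safe) {
    std::lock_guard<std::mutex> locker(lock);
    unsafe_contexts.push_back(on_safe);
    in_flight.insert(on_safe);  // "start" before the lock is released
  }

  // Out-of-band flush: takes the pending contexts, then waits if needed.
  void flush() {
    std::lock_guard<std::mutex> locker(lock);
    flush_ctxs.insert(flush_ctxs.end(), unsafe_contexts.begin(),
                      unsafe_contexts.end());
    unsafe_contexts.clear();
    flush_pending = true;
    maybe_complete_flush();
  }

  // The write finishes; a waiting flush may now complete.
  void handle_aio_modify_complete(int on_safe) {
    std::lock_guard<std::mutex> locker(lock);
    in_flight.erase(on_safe);
    safe_contexts.insert(on_safe);
    maybe_complete_flush();
  }

  // Must be called with lock held.
  void maybe_complete_flush() {
    if (!flush_pending || !in_flight.empty()) {
      return;  // flush stays queued behind the started ops
    }
    for (int ctx : flush_ctxs) {
      if (safe_contexts.erase(ctx) > 0) {
        completed.push_back(ctx);
      }
    }
    flush_ctxs.clear();
    flush_pending = false;
  }
};

// Same event order as the failing test, applied to the model above.
void run_fixed_sequence(StartedOpModel &replay) {
  replay.create_aio_modify_completion(1);
  replay.flush();
  assert(replay.completed.empty());  // flush is held back by the started op
  replay.handle_aio_modify_complete(1);
}
```

In this model the same event order that previously stranded on_safe(1) now fires it, because the flush is ordered behind the write it covers.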

This issue would only affect the unit tests and the asok flush command.


Related issues 1 (0 open, 1 closed)

Copied to rbd - Backport #17088: jewel: Sporadic failure in TestImageReplayer.StartReplayAndWrite (Resolved, Mykola Golub)

#2

Updated by Jason Dillaman over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman

#4

Updated by Jason Dillaman over 7 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to jewel

#5

Updated by Mykola Golub over 7 years ago

  • Status changed from Fix Under Review to Pending Backport

#6

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #17088: jewel: Sporadic failure in TestImageReplayer.StartReplayAndWrite added

#7

Updated by Loïc Dachary over 7 years ago

  • Status changed from Pending Backport to Resolved