Bug #16708
Sporadic failure in TestImageReplayer.StartReplayAndWrite
Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source: other
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
From Mykola:
Analyzing the log, it looks like the following happens in the test:
1) ImageReplayer::process_entry is called -> Replay::process -> handle_event: AIO write event -> create_aio_modify_completion: stores on_safe(1) in m_aio_modify_unsafe_contexts.
2) ImageReplayer::flush is called -> Replay::create_aio_flush_completion: moves m_aio_modify_unsafe_contexts (with on_safe(1)) to C_AioFlushComplete::on_safe_ctxs.
3) Replay::handle_aio_flush_complete (for 2) is called, and in the "strip out previously failed on_safe contexts" block, on_safe(1) is removed from on_safe_ctxs because it is not found in m_aio_modify_safe_contexts (handle_aio_modify_complete has not yet been called to store on_safe(1) in this list).
4) Replay::handle_aio_modify_complete is called: on_safe(1) is stored (forever) in m_aio_modify_safe_contexts.
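The four steps above can be replayed deterministically with a small model. This is only a sketch: ReplayModel and the two replay_* helpers are hypothetical names, contexts are reduced to integers, and the "strip out" logic is a simplified guess at the behaviour described, not the librbd::journal::Replay code.

```cpp
#include <cassert>
#include <list>
#include <set>

// Hypothetical stand-in for the bookkeeping in librbd::journal::Replay.
// Member names follow the description above; the logic is simplified.
struct ReplayModel {
  using Context = int;  // an on_safe context, identified by a number

  std::list<Context> unsafe_contexts;  // models m_aio_modify_unsafe_contexts
  std::set<Context> safe_contexts;     // models m_aio_modify_safe_contexts
  std::list<Context> completed;        // on_safe contexts that were fired

  // Step 1: a write event registers its on_safe context as "unsafe".
  void create_aio_modify_completion(Context on_safe) {
    unsafe_contexts.push_back(on_safe);
  }

  // Step 2: a flush takes ownership of every pending unsafe context.
  std::list<Context> create_aio_flush_completion() {
    std::list<Context> on_safe_ctxs;
    on_safe_ctxs.swap(unsafe_contexts);
    return on_safe_ctxs;
  }

  // Step 3: the flush completes. Contexts whose modify has not been
  // recorded as complete are stripped and never fired.
  void handle_aio_flush_complete(const std::list<Context>& on_safe_ctxs) {
    for (Context ctx : on_safe_ctxs) {
      if (safe_contexts.erase(ctx) == 0) {
        continue;  // not found: dropped from on_safe_ctxs
      }
      completed.push_back(ctx);
    }
  }

  // Step 4: the modify completes and records its context as "safe".
  void handle_aio_modify_complete(Context on_safe) {
    safe_contexts.insert(on_safe);
  }
};

// Order from the report: flush completion runs before modify completion.
// Returns true if on_safe(1) was never fired and is stuck in safe_contexts.
bool replay_racy_order() {
  ReplayModel r;
  r.create_aio_modify_completion(1);
  std::list<int> ctxs = r.create_aio_flush_completion();
  r.handle_aio_flush_complete(ctxs);
  r.handle_aio_modify_complete(1);
  return r.completed.empty() && r.safe_contexts.count(1) == 1;
}

// Expected order: modify completion runs before flush completion.
// Returns true if on_safe(1) was fired and nothing is left behind.
bool replay_expected_order() {
  ReplayModel r;
  r.create_aio_modify_completion(1);
  std::list<int> ctxs = r.create_aio_flush_completion();
  r.handle_aio_modify_complete(1);
  r.handle_aio_flush_complete(ctxs);
  return r.completed.size() == 1 && r.safe_contexts.empty();
}
```

In this model, calling replay_racy_order() reproduces the stuck on_safe(1) entry, while replay_expected_order() leaves both lists empty.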
The AioCompletion needs to be "started" before the librbd::journal::Replay lock is released within create_aio_modify_completion. This would prevent an out-of-band flush event from racing with the start of the op and corrupting the internal state.
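One way to picture the proposed ordering is the sketch below. It assumes that "starting" an op makes it visible to later flushes, so a flush is deferred until every started op has completed. StartedOpModel, its members, and the helper function are hypothetical and are not the librbd API.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <set>

// Hypothetical model, not the librbd code. The assumption is that an op
// "started" under the lock is tracked as in flight, and that a flush
// does not complete while any tracked op is still in flight.
class StartedOpModel {
public:
  // Register the on_safe context and start the op in one critical
  // section, so no flush can observe one without the other.
  void create_aio_modify_completion(int on_safe) {
    std::lock_guard<std::mutex> locker(m_lock);
    m_unsafe_contexts.push_back(on_safe);
    m_in_flight.insert(on_safe);  // the "start" step, before unlock
  }

  // A flush takes the pending contexts; it completes right away only
  // if no started op is still in flight.
  void flush() {
    std::lock_guard<std::mutex> locker(m_lock);
    m_flush_ctxs.splice(m_flush_ctxs.end(), m_unsafe_contexts);
    maybe_complete_flush();
  }

  // The modify completes: record it as safe and let a deferred flush run.
  void handle_aio_modify_complete(int on_safe) {
    std::lock_guard<std::mutex> locker(m_lock);
    m_in_flight.erase(on_safe);
    m_safe_contexts.insert(on_safe);
    maybe_complete_flush();
  }

  std::size_t completed_count() const { return m_completed.size(); }
  std::size_t safe_context_count() const { return m_safe_contexts.size(); }

private:
  // Caller must hold m_lock.
  void maybe_complete_flush() {
    if (!m_in_flight.empty()) {
      return;  // defer until all started ops have completed
    }
    for (int ctx : m_flush_ctxs) {
      if (m_safe_contexts.erase(ctx) > 0) {
        m_completed.push_back(ctx);
      }
    }
    m_flush_ctxs.clear();
  }

  std::mutex m_lock;
  std::list<int> m_unsafe_contexts;
  std::list<int> m_flush_ctxs;
  std::set<int> m_safe_contexts;
  std::set<int> m_in_flight;
  std::list<int> m_completed;
};

// Same interleaving as the failing test: the flush arrives before the
// modify completes. Returns true if the flush was deferred and on_safe(1)
// was fired afterwards with nothing left in the safe list.
bool flush_before_modify_completes() {
  StartedOpModel r;
  r.create_aio_modify_completion(1);
  r.flush();
  bool deferred = (r.completed_count() == 0);
  r.handle_aio_modify_complete(1);
  return deferred && r.completed_count() == 1 &&
         r.safe_context_count() == 0;
}
```

The key design point in this sketch is that the context is registered and the op is marked in flight under the same lock hold, which is what closes the window an out-of-band flush could otherwise slip into.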
This issue would only affect the unit tests and the asok flush command.
Updated by Kefu Chai over 7 years ago
Updated by Jason Dillaman over 7 years ago
- Status changed from New to In Progress
- Assignee set to Jason Dillaman
Updated by Jason Dillaman over 7 years ago
- Status changed from In Progress to Fix Under Review
- Backport set to jewel
Updated by Mykola Golub over 7 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Loïc Dachary over 7 years ago
- Copied to Backport #17088: jewel: Sporadic failure in TestImageReplayer.StartReplayAndWrite added
Updated by Loïc Dachary over 7 years ago
- Status changed from Pending Backport to Resolved