Bug #15947
Sporadic TestImageReplayer.NextTag failure
Status: Closed
Description
`ceph_test_rbd_mirror --gtest_filter=TestImageReplayer.NextTag` might fail with:
...
m_replayer->flush()
get_commit_positions: client: [id=, commit_position=[positions=[[object_number=2, tag_tid=11, entry_tid=10], [object_number=1, tag_tid=11, entry_tid=9], [object_number=0, tag_tid=11, entry_tid=8], [object_number=3, tag_tid=11, entry_tid=7]]], state=connected]
get_commit_positions: client: [id=local mirror uuid, commit_position=[positions=[[object_number=2, tag_tid=11, entry_tid=10], [object_number=1, tag_tid=11, entry_tid=9], [object_number=0, tag_tid=11, entry_tid=8], [object_number=3, tag_tid=11, entry_tid=7]]], state=connected]
read: 512
...read: 512
read:
expected: ...
test/rbd_mirror/test_ImageReplayer.cc:304: Failure
Value of: memcmp(result, expected, len)
Actual: -93
Expected: 0
Full test output and ceph client debug log are attached.
Updated by Mykola Golub almost 8 years ago
When running under valgrind it is much harder to reproduce, but I still observed one failure. No valgrind errors were reported in that run, but an assertion failure was triggered when the test finished:
read: 512
read:
expected: ...
test/rbd_mirror/test_ImageReplayer.cc:304: Failure
Value of: memcmp(result, expected, len)
Actual: -44
Expected: 0
read: 512
read:
expected: ...
test/rbd_mirror/test_ImageReplayer.cc:304: Failure
Value of: memcmp(result, expected, len)
Actual: -44
Expected: 0
...
read: 512
librbd/journal/Replay.cc: In function 'librbd::journal::Replay<ImageCtxT>::~Replay() [with ImageCtxT = librbd::ImageCtx]' thread 222a0700 time 2016-05-20 12:14:02.030028
librbd/journal/Replay.cc: 126: FAILED assert(m_aio_modify_safe_contexts.empty())
ceph version 10.2.0-1127-g99e7552 (99e7552c3a195f4e1128617fe8a904b30480293b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x606f02]
2: (librbd::journal::Replay<librbd::ImageCtx>::~Replay()+0x171) [0x4c5381]
3: (librbd::Journal<librbd::ImageCtx>::stop_external_replay()+0x32) [0x47aec2]
4: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_stop_journal_replay_shut_down_finish(int)+0xa7) [0x3e31c7]
5: (FunctionContext::finish(int)+0x2c) [0x3dec3c]
6: (Context::complete(int)+0x9) [0x3c9659]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x60dcdf]
8: (ThreadPool::WorkThread::entry()+0x10) [0x60ec00]
9: (()+0x80a4) [0xaba60a4]
10: (clone()+0x6d) [0xc89104d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
==26717==
==26717== HEAP SUMMARY:
==26717==   in use at exit: 1,363,771 bytes in 12,027 blocks
==26717==   total heap usage: 332,176 allocs, 320,149 frees, 48,169,981 bytes allocated
==26717==
==26717== LEAK SUMMARY:
==26717==   definitely lost: 1,850 bytes in 3 blocks
==26717==   indirectly lost: 1,212 bytes in 1 blocks
==26717==   possibly lost: 258,604 bytes in 2,579 blocks
==26717==   still reachable: 1,102,105 bytes in 9,444 blocks
==26717==        suppressed: 0 bytes in 0 blocks
==26717== Rerun with --leak-check=full to see details of leaked memory
==26717==
==26717== For counts of detected and suppressed errors, rerun with: -v
==26717== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Updated by Josh Durgin almost 8 years ago
Another instance: https://jenkins.ceph.com/job/ceph-pull-requests/9411/console
/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/journal/Replay.cc: In function 'librbd::journal::Replay<ImageCtxT>::~Replay() [with ImageCtxT = librbd::ImageCtx]' thread 7f5ceb252700 time 2016-07-21 05:20:05.569267
/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/journal/Replay.cc: 156: FAILED assert(m_aio_modify_safe_contexts.empty())
ceph version v11.0.0-813-gbdc800b (bdc800b23d62dc939a34a88bc84dea0ddaac4484)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f5cf8d4621b]
2: (librbd::journal::Replay<librbd::ImageCtx>::~Replay()+0x1b8) [0x8801a8]
3: (librbd::Journal<librbd::ImageCtx>::stop_external_replay()+0x68) [0x8375b8]
4: (boost::detail::function::void_function_obj_invoker1<rbd::mirror::ImageReplayer<librbd::ImageCtx>::replay_flush()::{lambda(int)#1}, void, int>::invoke(boost::detail::function::function_buffer&, int)+0x21) [0x650051]
5: (FunctionContext::finish(int)+0x1a) [0x651afa]
6: (Context::complete(int)+0x9) [0x64ede9]
get_commit_positions: client: [id=local mirror uuid, commit_position=[positions=[[object_number=3, tag_tid=2, entry_tid=7], [object_number=2, tag_tid=2, entry_tid=6], [object_number=1, tag_tid=2, entry_tid=5], [object_number=0, tag_tid=2, entry_tid=4]]], state=connected]
m_replayer->flush()
get_commit_positions: client: [id=, commit_position=[positions=[[object_number=2, tag_tid=11, entry_tid=10], [object_number=1, tag_tid=11, entry_tid=9], [object_number=0, tag_tid=11, entry_tid=8], [object_number=3, tag_tid=11, entry_tid=7]]], state=connected]
get_commit_positions: client: [id=local mirror uuid, commit_position=[positions=[[object_number=3, tag_tid=2, entry_tid=7], [object_number=2, tag_tid=2, entry_tid=6], [object_number=1, tag_tid=2, entry_tid=5], [object_number=0, tag_tid=2, entry_tid=4]]], state=connected]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb65) [0x94ac75]
8: (ThreadPool::WorkThread::entry()+0x10) [0x94bc30]
9: (()+0x8182) [0x7f5d014f3182]
10: (clone()+0x6d) [0x7f5cf75c44ad]
Updated by Jason Dillaman over 7 years ago
- Status changed from New to In Progress
- Assignee set to Jason Dillaman
- Backport set to jewel
Updated by Jason Dillaman over 7 years ago
- Status changed from In Progress to Resolved
Flagging as resolved by ticket #16708