Bug #15947

Sporadic TestImageReplayer.NextTag failure

Added by Mykola Golub almost 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Jason Dillaman
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

`ceph_test_rbd_mirror --gtest_filter=TestImageReplayer.NextTag` might fail with:

...
m_replayer->flush()
get_commit_positions: client: [id=, commit_position=[positions=[[object_number=2, tag_tid=11, entry_tid=10], [object_number=1, tag_tid=11, entry_tid=9], [object_number=0, tag_tid=11, entry_tid=8], [object_number=3, tag_tid=11, entry_tid=7]]], state=connected]
get_commit_positions: client: [id=local mirror uuid, commit_position=[positions=[[object_number=2, tag_tid=11, entry_tid=10], [object_number=1, tag_tid=11, entry_tid=9], [object_number=0, tag_tid=11, entry_tid=8], [object_number=3, tag_tid=11, entry_tid=7]]], state=connected]
read: 512
...read: 512
read:
expected: ...
test/rbd_mirror/test_ImageReplayer.cc:304: Failure
Value of: memcmp(result, expected, len)
Actual: -93
Expected: 0

Full test output and ceph client debug log are attached.


Files

TestImageReplayer.NextTag.log (15.9 KB) - failed test output (Mykola Golub, 05/20/2016 09:15 AM)
client.admin.25042.log.gz (152 KB) - ceph client debug log (Mykola Golub, 05/20/2016 09:18 AM)
#1

Updated by Mykola Golub almost 8 years ago

When running under valgrind it is much harder to reproduce, but I still observed one failure. No valgrind errors were reported that time, but an assertion failure was triggered when the test finished:

read: 512
read:
expected: ...
test/rbd_mirror/test_ImageReplayer.cc:304: Failure
Value of: memcmp(result, expected, len)
Actual: -44
Expected: 0
read: 512
read:
expected: ...
test/rbd_mirror/test_ImageReplayer.cc:304: Failure
Value of: memcmp(result, expected, len)
Actual: -44
Expected: 0
...
read: 512
librbd/journal/Replay.cc: In function 'librbd::journal::Replay<ImageCtxT>::~Replay() [with ImageCtxT = librbd::ImageCtx]' thread 222a0700 time 2016-05-20 12:14:02.030028
librbd/journal/Replay.cc: 126: FAILED assert(m_aio_modify_safe_contexts.empty())
ceph version 10.2.0-1127-g99e7552 (99e7552c3a195f4e1128617fe8a904b30480293b)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x606f02]
2: (librbd::journal::Replay<librbd::ImageCtx>::~Replay()+0x171) [0x4c5381]
3: (librbd::Journal<librbd::ImageCtx>::stop_external_replay()+0x32) [0x47aec2]
4: (rbd::mirror::ImageReplayer<librbd::ImageCtx>::on_stop_journal_replay_shut_down_finish(int)+0xa7) [0x3e31c7]
5: (FunctionContext::finish(int)+0x2c) [0x3dec3c]
6: (Context::complete(int)+0x9) [0x3c9659]
7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa9f) [0x60dcdf]
8: (ThreadPool::WorkThread::entry()+0x10) [0x60ec00]
9: (()+0x80a4) [0xaba60a4]
10: (clone()+0x6d) [0xc89104d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
==26717==
==26717== HEAP SUMMARY:
==26717==     in use at exit: 1,363,771 bytes in 12,027 blocks
==26717==   total heap usage: 332,176 allocs, 320,149 frees, 48,169,981 bytes allocated
==26717==
==26717== LEAK SUMMARY:
==26717==    definitely lost: 1,850 bytes in 3 blocks
==26717==    indirectly lost: 1,212 bytes in 1 blocks
==26717==      possibly lost: 258,604 bytes in 2,579 blocks
==26717==    still reachable: 1,102,105 bytes in 9,444 blocks
==26717==         suppressed: 0 bytes in 0 blocks
==26717== Rerun with --leak-check=full to see details of leaked memory
==26717==
==26717== For counts of detected and suppressed errors, rerun with: -v
==26717== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

#2

Updated by Josh Durgin almost 8 years ago

Another instance: https://jenkins.ceph.com/job/ceph-pull-requests/9411/console

/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/journal/Replay.cc: In function 'librbd::journal::Replay<ImageCtxT>::~Replay() [with ImageCtxT = librbd::ImageCtx]' thread 7f5ceb252700 time 2016-07-21 05:20:05.569267
/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/journal/Replay.cc: 156: FAILED assert(m_aio_modify_safe_contexts.empty())
 ceph version v11.0.0-813-gbdc800b (bdc800b23d62dc939a34a88bc84dea0ddaac4484)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f5cf8d4621b]
 2: (librbd::journal::Replay<librbd::ImageCtx>::~Replay()+0x1b8) [0x8801a8]
 3: (librbd::Journal<librbd::ImageCtx>::stop_external_replay()+0x68) [0x8375b8]
 4: (boost::detail::function::void_function_obj_invoker1<rbd::mirror::ImageReplayer<librbd::ImageCtx>::replay_flush()::{lambda(int)#1}, void, int>::invoke(boost::detail::function::function_buffer&, int)+0x21) [0x650051]
 5: (FunctionContext::finish(int)+0x1a) [0x651afa]
 6: (Context::complete(int)+0x9) [0x64ede9]
get_commit_positions: client: [id=local mirror uuid, commit_position=[positions=[[object_number=3, tag_tid=2, entry_tid=7], [object_number=2, tag_tid=2, entry_tid=6], [object_number=1, tag_tid=2, entry_tid=5], [object_number=0, tag_tid=2, entry_tid=4]]], state=connected]
m_replayer->flush()
get_commit_positions: client: [id=, commit_position=[positions=[[object_number=2, tag_tid=11, entry_tid=10], [object_number=1, tag_tid=11, entry_tid=9], [object_number=0, tag_tid=11, entry_tid=8], [object_number=3, tag_tid=11, entry_tid=7]]], state=connected]
get_commit_positions: client: [id=local mirror uuid, commit_position=[positions=[[object_number=3, tag_tid=2, entry_tid=7], [object_number=2, tag_tid=2, entry_tid=6], [object_number=1, tag_tid=2, entry_tid=5], [object_number=0, tag_tid=2, entry_tid=4]]], state=connected]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb65) [0x94ac75]
 8: (ThreadPool::WorkThread::entry()+0x10) [0x94bc30]
 9: (()+0x8182) [0x7f5d014f3182]
 10: (clone()+0x6d) [0x7f5cf75c44ad]
#3

Updated by Jason Dillaman over 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Backport set to jewel
#4

Updated by Jason Dillaman over 7 years ago

  • Status changed from In Progress to Resolved

Flagging as resolved: fixed by ticket #16708.
