Bug #14434

intermittent errors in make check tests from journaling code

Added by Greg Farnum almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Target version: -
Start date: 01/20/2016
Due date:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

One example: http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-tarball-trusty-amd64-basic/log.cgi?log=f3e58a054aed197bf4f872c0af7b003f80e0ea2f

[ RUN ] TestMockObjectMapSnapshotRollbackRequest.ReadMapError
./librbd/Journal.cc: In function 'void librbd::Journal<ImageCtxT>::handle_op_event_safe(int, uint64_t, const Future&, Context*) [with ImageCtxT = librbd::ImageCtx; uint64_t = long unsigned int; librbd::Journal<ImageCtxT>::Future = journal::Future]' thread 2af1f6066700 time 2016-01-20 03:54:27.306517
./librbd/Journal.cc: 806: FAILED assert(m_state == STATE_READY || m_state == STATE_STOPPING)
ceph version 10.0.2-961-gf3e58a0 (f3e58a054aed197bf4f872c0af7b003f80e0ea2f)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8d) [0x2af1d82932cd]
2: (librbd::Journal::handle_op_event_safe(int, unsigned long, journal::Future const&, Context*)+0x2d2) [0x2af1d7f6f5b2]
3: (Context::complete(int)+0x11) [0x2af1d7f01821]
4: (journal::FutureImpl::finish_unlock()+0x79) [0x2af1d81c2f69]
5: (journal::FutureImpl::C_ConsistentAck::complete(int)+0x1a) [0x2af1d81c3c8a]
6: (Finisher::finisher_thread_entry()+0x243) [0x2af1d82726f3]
7: (()+0x8182) [0x2af1eb08d182]
8: (clone()+0x6d) [0x2af1ec95a47d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
./test/run-rbd-unit-tests.sh: line 10: 31959 Aborted RBD_FEATURES=$i unittest_librbd

Associated revisions

Revision 14426213 (diff)
Added by Jason Dillaman almost 3 years ago

journal: avoid race between in-flight notifications and flush

If an async callback for a safely committed event was in-flight,
it could race with the flush of the journal. This would result
in the flush callback completing before the notifications for
safe events.

Fixes: #14434
Signed-off-by: Jason Dillaman <>
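
The ordering problem described in the commit message can be illustrated with a minimal, single-threaded sketch. It is not the actual Ceph change: the class InFlightTracker, its methods, and the vector standing in for the finisher queue are hypothetical names chosen for illustration. The idea shown is that a flush request is deferred until every queued safe-event notification has run, so the flush callback cannot complete first.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of "wait for in-flight notifications before
// completing a flush". Not taken from the librbd or journal sources.
class InFlightTracker {
public:
  // Record that a safe-event notification has been queued but not yet run.
  void start_op() { ++m_in_flight; }

  // Called after a queued notification has run. If a flush is waiting and
  // this was the last in-flight notification, the flush callback fires now.
  void finish_op() {
    assert(m_in_flight > 0);
    if (--m_in_flight == 0 && m_on_flush) {
      std::function<void()> on_flush = std::move(m_on_flush);
      m_on_flush = nullptr;
      on_flush();
    }
  }

  // Complete on_flush only once no notifications remain in flight.
  void flush(std::function<void()> on_flush) {
    if (m_in_flight == 0) {
      on_flush();
    } else {
      m_on_flush = std::move(on_flush);
    }
  }

private:
  int m_in_flight = 0;
  std::function<void()> m_on_flush;
};

// Replays the scenario from the commit message and returns the order in
// which the callbacks ran.
std::vector<std::string> run_flush_ordering_demo() {
  std::vector<std::string> order;
  InFlightTracker tracker;
  // Stand-in for callbacks queued on a finisher thread.
  std::vector<std::function<void()>> finisher_queue;

  // A safe-event notification is queued before the flush is requested.
  tracker.start_op();
  finisher_queue.push_back([&order, &tracker] {
    order.push_back("safe notification");
    tracker.finish_op();
  });

  // The flush arrives while the notification is still queued.
  tracker.flush([&order] { order.push_back("flush complete"); });

  // Drain the queue, as the finisher thread eventually would.
  for (auto& callback : finisher_queue) {
    callback();
  }
  return order;
}
```

In this sketch the flush callback runs only after the queued notification has been delivered. A real multi-threaded implementation would also need locking around the counter and the stored callback, which is omitted here.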

History

#1 Updated by Jason Dillaman almost 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman

#2 Updated by Jason Dillaman almost 3 years ago

  • Status changed from In Progress to Need Review

#3 Updated by Jason Dillaman almost 3 years ago

  • Status changed from Need Review to Resolved
