Bug #47880

closed

[journal] object recorder can race while lock is temporarily released for callbacks

Added by Jason Dillaman over 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The assertion "ceph_assert(m_in_flight_callbacks)" in "notify_handler_unlock" can fail if two callbacks race. A flush request can arrive after "handle_append_flushed" has dropped the lock to invoke callbacks but before it notifies the handler.

#2  0x00007f46cf3b5877 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, 
    func=0x7f46dd5e3980 <journal::ObjectRecorder::notify_handler_unlock(std::unique_lock<std::mutex>&, bool)::__PRETTY_FUNCTION__> "void journal::ObjectRecorder::notify_handler_unlock(std::unique_lock<std::mutex>&, bool)")
    at /usr/src/debug/ceph-16.0.0-6312.gc1613349.el8.x86_64/src/common/assert.cc:75
#3  0x00007f46cf3b5a40 in ceph::__ceph_assert_fail (ctx=...) at /usr/src/debug/ceph-16.0.0-6312.gc1613349.el8.x86_64/src/common/assert.cc:80
#4  0x00007f46dd4acf7e in journal::ObjectRecorder::notify_handler_unlock (this=<optimized out>, locker=..., notify_overflowed=<optimized out>) at /usr/src/debug/ceph-16.0.0-6312.gc1613349.el8.x86_64/src/log/Entry.h:35
#5  0x00007f46dd4b2b69 in journal::ObjectRecorder::handle_append_flushed (this=0x7f4570002b10, tid=<optimized out>, r=<optimized out>) at /usr/src/debug/ceph-16.0.0-6312.gc1613349.el8.x86_64/src/journal/ObjectRecorder.cc:239
#6  0x00007f46dd4b3628 in Context::complete (r=<optimized out>, this=0x7f46883902b0) at /usr/src/debug/ceph-16.0.0-6312.gc1613349.el8.x86_64/src/include/Context.h:99

m_overflowed = true
m_object_closed = true
m_object_closed_notify = true <---- implies 'ObjectRecorder::closed()' was invoked while IO in-flight or callback in-flight
m_in_flight_callbacks = false
m_in_flight_tids = std::set with 1 element = {[0] = 231}
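
For illustration only, the interleaving described above can be replayed deterministically on a single thread with a toy model. This is a sketch under assumptions, not the Ceph code: `ToyRecorder`, its `flush()` path, the `defer_flush` guard, and `race_trips_assertion` are all hypothetical names, and the guard is just one conceivable mitigation, not necessarily what the fix does. The "flush" is injected from inside the callback window so the outcome does not depend on thread timing.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Toy model of the race. Every name here is hypothetical and only loosely
// mirrors journal::ObjectRecorder; this is NOT the Ceph implementation.
class ToyRecorder {
public:
  explicit ToyRecorder(bool defer_flush) : m_defer_flush(defer_flush) {}

  // Models handle_append_flushed(): mark callbacks in flight, drop the
  // lock while they run, then re-take it and notify the handler.
  void handle_append_flushed(const std::function<void()>& callbacks) {
    std::unique_lock<std::mutex> locker(m_lock);
    m_in_flight_callbacks = true;
    locker.unlock();
    callbacks();  // race window: the lock is not held here
    locker.lock();
    notify_handler_unlock(locker);
  }

  // Models a flush request that reaches the same notification path.
  void flush() {
    std::unique_lock<std::mutex> locker(m_lock);
    if (m_defer_flush && m_in_flight_callbacks) {
      // Hypothetical guard: leave the notification to the path that
      // already owns the in-flight flag.
      return;
    }
    m_in_flight_callbacks = true;
    notify_handler_unlock(locker);
  }

  bool assert_tripped() const { return m_assert_tripped; }

private:
  // Models notify_handler_unlock(): expects the flag to still be set.
  void notify_handler_unlock(std::unique_lock<std::mutex>& locker) {
    assert(locker.owns_lock());
    if (!m_in_flight_callbacks) {
      // Stands in for the failing ceph_assert(m_in_flight_callbacks).
      m_assert_tripped = true;
    }
    m_in_flight_callbacks = false;
    locker.unlock();
  }

  std::mutex m_lock;
  bool m_defer_flush;
  bool m_in_flight_callbacks = false;
  bool m_assert_tripped = false;
};

// Replays the interleaving: the flush arrives inside the callback window.
bool race_trips_assertion(bool defer_flush) {
  ToyRecorder recorder(defer_flush);
  recorder.handle_append_flushed([&recorder] { recorder.flush(); });
  return recorder.assert_tripped();
}
```

In this model, `race_trips_assertion(false)` reports the tripped check because the flush path clears the flag before the original path notifies, which matches the "m_in_flight_callbacks = false" state shown in the dump. With the hypothetical guard enabled, the flush backs off and the check holds.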


Related issues 3 (0 open, 3 closed)

Related to rbd - Bug #51100: object_recorder->is_closed() assert failure in JournalRecorder::open_object_set() in nautilus (Won't Fix - EOL)
Copied to rbd - Backport #47886: octopus: [journal] object recorder can race while lock is temporarily released for callbacks (Resolved, Nathan Cutler)
Copied to rbd - Backport #47887: nautilus: [journal] object recorder can race while lock is temporarily released for callbacks (Rejected, Mykola Golub)

#1

Updated by Jason Dillaman over 3 years ago

  • Pull request ID set to 37699

#2

Updated by Jason Dillaman over 3 years ago

  • Status changed from In Progress to Fix Under Review

#3

Updated by Mykola Golub over 3 years ago

  • Status changed from Fix Under Review to Pending Backport

#4

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47886: octopus: [journal] object recorder can race while lock is temporarily released for callbacks added

#5

Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #47887: nautilus: [journal] object recorder can race while lock is temporarily released for callbacks added

#6

Updated by Ilya Dryomov almost 3 years ago

  • Related to Bug #51100: object_recorder->is_closed() assert failure in JournalRecorder::open_object_set() in nautilus added

#7

Updated by Ilya Dryomov about 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
