Bug #63422

closed

librbd crash in journal discard wait_event

Added by Joshua Baergen 7 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: reef,quincy,pacific
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When enabling journal-based mirroring for 4 volumes, 2 VMs crashed in librbd (mix of original backtrace and translated backtrace below):

/home/ceph-build/ceph-build/Ubuntu/WORKDIR/ceph-16.2.13-133-ga5a112d/src/librbd/Journal.cc: In function 'librbd::Journal<ImageCtxT>::Future librbd::Journal<ImageCtxT>::wait_event(ceph::mutex&, uint64_t, Context*) [with ImageCtxT = librbd::ImageCtx; librbd::Journal<ImageCtxT>::Future = journal::Future; ceph::mutex = std::mutex; uint64_t = long unsigned int]' thread 7f9f2effd700 time 2023-11-02T15:52:25.213408+0000
/home/ceph-build/ceph-build/Ubuntu/WORKDIR/ceph-16.2.13-133-ga5a112d/src/librbd/Journal.cc: 1019: FAILED ceph_assert(it != m_events.end())
 ceph version 16.2.13-133-ga5a112d (a5a112dc153167aadfd9bda69fde6cb6531dfb63) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f9f37650d85]
 2: /usr/lib/ceph/libceph-common.so.2(+0x267f8d) [0x7f9f37650f8d]

librbd::Journal<librbd::ImageCtx>::wait_event(std::mutex&, unsigned long, Context*)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/Journal.cc:1019 (discriminator 1)

librbd::Journal<librbd::ImageCtx>::wait_event(unsigned long, Context*)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/Journal.cc:1009 (discriminator 1)

librbd::journal::ObjectDispatch<librbd::ImageCtx>::wait_or_flush_event(unsigned long, int, Context*)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/journal/ObjectDispatch.cc:253

librbd::journal::ObjectDispatch<librbd::ImageCtx>::discard(unsigned long, unsigned long, unsigned long, std::shared_ptr<neorados::IOContext>, int, ZTracer::Trace const&, int*, unsigned long*, librbd::io::DispatchResult*, Context**, Context*)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/journal/ObjectDispatch.cc:105

librbd::io::ObjectDispatcher<librbd::ImageCtx>::SendVisitor::operator()(librbd::io::ObjectDispatchSpec::DiscardRequest&) const
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/ObjectDispatcher.cc:58

librbd::io::Dispatcher<librbd::ImageCtx, librbd::io::ObjectDispatcherInterface>::send(librbd::io::ObjectDispatchSpec*)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/Dispatcher.h:130

librbd::io::ObjectDispatchSpec::C_Dispatcher::complete(int)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/ObjectDispatchSpec.cc:21

librbd::cache::WriteAroundObjectDispatch<librbd::ImageCtx>::handle_in_flight_io_complete(int, unsigned long, unsigned long, unsigned long, unsigned long)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/cache/WriteAroundObjectDispatch.cc:416

Context::complete(int)
./obj-x86_64-linux-gnu/src/librbd/./src/include/Context.h:100

librbd::journal::(anonymous namespace)::C_CommitIOEvent<librbd::ImageCtx>::finish(int)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/journal/ObjectDispatch.cc:66

Context::complete(int)
./obj-x86_64-linux-gnu/src/librbd/./src/include/Context.h:100

Context::complete(int)
./obj-x86_64-linux-gnu/src/librbd/./src/include/Context.h:100

librbd::io::SimpleSchedulerObjectDispatch<librbd::ImageCtx>::register_in_flight_request(unsigned long, utime_t const&, Context**)::{lambda(int)#1}::operator()(int) const
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/SimpleSchedulerObjectDispatch.cc:445

Context::complete(int)
./obj-x86_64-linux-gnu/src/librbd/./src/include/Context.h:100

librbd::io::ObjectDispatchSpec::C_Dispatcher::finish(int)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/ObjectDispatchSpec.cc:34

librbd::io::ObjectDispatchSpec::C_Dispatcher::complete(int)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/ObjectDispatchSpec.cc:30

librbd::io::ObjectRequest<librbd::ImageCtx>::finish(int)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/ObjectRequest.cc:195

librbd::io::AbstractObjectWriteRequest<librbd::ImageCtx>::post_write_object_map_update()
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/ObjectRequest.cc:634

librbd::io::AbstractObjectWriteRequest<librbd::ImageCtx>::handle_write_object(int)
./obj-x86_64-linux-gnu/src/librbd/./src/librbd/io/ObjectRequest.cc:554

_ZN5boost4asio6detail11executor_opIN4ceph5async17ForwardingHandlerINS4_17CompletionHandlerIZN6librbd4asio4util20get_callback_adapterIZNS7_2io26AbstractObjectWriteRequestINS7_8ImageCtxEE12write_objectEvEUliE2_EEDaOT_EUlNS_6system10error_codeEDpOT_E_St5tupleIJSJ_EEEEEESaINS4_6detail14CompletionImplINS0_10io_context13executor_typeESN_vJSJ_EEEENS1_19scheduler_operationEE11do_completeEPvPSY_RKSJ_m
./obj-x86_64-linux-gnu/src/librbd/./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/executor_op.hpp:73

This is a custom build of 16.2.13 with some minor patches of our own and some backports from 16.2.14. We don't have much logging enabled on the VM side, so there is nothing of interest to report from there.
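
For readers unfamiliar with the failing check: the assertion at Journal.cc:1019, `ceph_assert(it != m_events.end())`, fires when `wait_event` is handed a tid that is not present in `m_events`. The following is a minimal sketch of that kind of lookup, not the librbd code. `EventTable`, `Event`, `add`, `remove` and `is_tracked` are hypothetical names invented for illustration, and the sketch does not claim to show why the tid was missing in this crash.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for a journal event; the real librbd type
// carries far more state than this.
struct Event {
  bool safe = false;
};

// Hypothetical, simplified table of in-flight events keyed by tid.
class EventTable {
public:
  // Start tracking an event under the given tid.
  void add(uint64_t tid) {
    std::lock_guard<std::mutex> locker(m_lock);
    m_events.emplace(tid, Event{});
  }

  // Stop tracking the event with the given tid (no-op if absent).
  void remove(uint64_t tid) {
    std::lock_guard<std::mutex> locker(m_lock);
    m_events.erase(tid);
  }

  // Same shape of check as the failed assertion: the lookup must not
  // return end(). Here the result is returned instead of asserted.
  bool is_tracked(uint64_t tid) const {
    std::lock_guard<std::mutex> locker(m_lock);
    auto it = m_events.find(tid);
    return it != m_events.end();
  }

private:
  mutable std::mutex m_lock;
  std::map<uint64_t, Event> m_events;
};
```

In this sketch a caller that waits on a tid after it has been removed, or on a tid that was never added, gets `false` from `is_tracked`. In the reported build the equivalent condition aborts the process through `ceph_assert`.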


Files

librbd.log (14.9 KB) Joshua Baergen, 11/03/2023 08:35 PM

Related issues 3 (0 open, 3 closed)

Copied to rbd - Backport #63745: pacific: librbd crash in journal discard wait_event - Resolved - Joshua Baergen
Copied to rbd - Backport #63746: reef: librbd crash in journal discard wait_event - Resolved - Joshua Baergen
Copied to rbd - Backport #63747: quincy: librbd crash in journal discard wait_event - Resolved - Joshua Baergen