Bug #11743

Possible crash while concurrently writing and shrinking an image

Added by Jason Dillaman about 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
High
Target version:
-
Start date:
05/22/2015
Due date:
% Done:
0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:

Description

Shrinking an image triggers a flush followed by a cache invalidation. If a write is still pending in the cache, the flush completes inside a cache callback, and all cache callbacks are invoked with the cache lock held. The subsequent attempt to invalidate the cache then tries to take the cache lock recursively, which trips the assertion in Mutex::Lock (frames #10 to #12 of the backtrace):

#0  0x00007fd9f734affb in raise () from /lib64/libpthread.so.0
#1  0x00000000005a07bd in reraise_fatal (signum=6) at global/signal_handler.cc:59
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:109
#3  <signal handler called>
#4  0x00007fd9f63845d7 in raise () from /lib64/libc.so.6
#5  0x00007fd9f6385cc8 in abort () from /lib64/libc.so.6
#6  0x00007fd9f6c889b5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#7  0x00007fd9f6c86926 in ?? () from /lib64/libstdc++.so.6
#8  0x00007fd9f6c86953 in std::terminate() () from /lib64/libstdc++.so.6
#9  0x00007fd9f6c86b73 in __cxa_throw () from /lib64/libstdc++.so.6
#10 0x00007fd9fa5af80a in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=95, 
    func=0x7fd9fa866420 <Mutex::Lock(bool)::__PRETTY_FUNCTION__> "void Mutex::Lock(bool)") at common/assert.cc:77
#11 0x00007fd9fa561d15 in Mutex::Lock (this=0x2a005d8, no_lockdep=<optimized out>) at common/Mutex.cc:95
#12 0x00007fd9fa4a4bad in librbd::ImageCtx::invalidate_cache (this=0x2a00380, on_finish=0x7fd9ac000bb0) at librbd/ImageCtx.cc:684
#13 0x00007fd9fa49a7cc in librbd::AsyncResizeRequest::send_invalidate_cache (this=this@entry=0x7fd9c4018d40)
    at librbd/AsyncResizeRequest.cc:173
#14 0x00007fd9fa49c1ac in librbd::AsyncResizeRequest::should_complete (this=0x7fd9c4018d40, r=<optimized out>)
    at librbd/AsyncResizeRequest.cc:79
#15 0x00007fd9fa49830c in librbd::AsyncRequest::complete (this=0x7fd9c4018d40, r=0) at librbd/AsyncRequest.h:25
#16 0x00007fd9fa495bfa in operator() (a0=<optimized out>, this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#17 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at include/Context.h:461
#18 0x00007fd9fa490149 in Context::complete (this=0x7fd9c4007ad0, r=<optimized out>) at include/Context.h:65
#19 0x00007fd9fa4998dc in librbd::AsyncOperation::finish_op (this=this@entry=0x3c3af40) at librbd/AsyncOperation.cc:47
#20 0x00007fd9fa48ec69 in librbd::AioCompletion::complete (this=this@entry=0x3c3ae10) at librbd/AioCompletion.cc:86
#21 0x00007fd9fa48fafa in librbd::AioCompletion::complete_request (this=0x3c3ae10, cct=0x29d3c00, r=0) at librbd/AioCompletion.cc:112
#22 0x00007fd9fa490149 in Context::complete (this=0x35f85d0, r=<optimized out>) at include/Context.h:65
#23 0x00007fd9fa7f0006 in ObjectCacher::C_WaitForWrite::finish (this=0x3c727c0, r=0) at osdc/ObjectCacher.cc:1423
#24 0x00007fd9fa490149 in Context::complete (this=0x3c727c0, r=<optimized out>) at include/Context.h:65
#25 0x00007fd9fa5040e8 in Finisher::finisher_thread_entry (this=0x2a04ad8) at common/Finisher.cc:59
#26 0x00007fd9f7343df5 in start_thread () from /lib64/libpthread.so.0
#27 0x00007fd9f64451ad in clone () from /lib64/libc.so.6
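
The failure mode described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not librbd code: CheckedMutex, FakeCache and shrink_relocks_cache are hypothetical names, and the mutex throws where the real Mutex::Lock asserts, so that the outcome can be observed without aborting the process.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical non-recursive mutex that detects a re-lock by its owner.
// (Assumption: throws instead of asserting, purely for demonstration.)
class CheckedMutex {
public:
  void lock() {
    if (owner_.load() == std::this_thread::get_id()) {
      throw std::logic_error("recursive lock attempt");
    }
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    // Only the owning thread may unlock.
    assert(owner_.load() == std::this_thread::get_id());
    owner_.store(std::thread::id());
    m_.unlock();
  }
private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical stand-in for the object cache.
struct FakeCache {
  CheckedMutex lock;

  // Completes a pending write; the callback runs with the cache lock held.
  void flush(const std::function<void()>& on_flush) {
    std::lock_guard<CheckedMutex> l(lock);
    on_flush();
  }

  // Needs the cache lock as well.
  void invalidate() {
    std::lock_guard<CheckedMutex> l(lock);
  }
};

// Returns true if invalidating from within the flush callback
// attempts to re-lock the cache lock.
bool shrink_relocks_cache() {
  FakeCache cache;
  try {
    cache.flush([&] { cache.invalidate(); });
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

Calling shrink_relocks_cache() exercises the same call shape as frames #23 down to #11: a cache completion callback that ends up asking for the cache lock again on the same thread.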

Related issues

Copied to rbd - Backport #12236: Possible crash while concurrently writing and shrinking an image Resolved 05/22/2015

Associated revisions

Revision 726d699b (diff)
Added by Jason Dillaman about 3 years ago

librbd: invalidate cache outside cache callback context

When shrinking an image, it's possible that the op flush callback
will be from within the cache callback context. This would result
in a deadlock when attempting to re-lock the cache lock in order to
invalidate the cache.

Fixes: #11743
Backport: hammer
Signed-off-by: Jason Dillaman <>

Revision d4eb7bd6 (diff)
Added by Jason Dillaman about 3 years ago

librbd: invalidate cache outside cache callback context

When shrinking an image, it's possible that the op flush callback
will be from within the cache callback context. This would result
in a deadlock when attempting to re-lock the cache lock in order to
invalidate the cache.

Fixes: #11743
Backport: hammer
Signed-off-by: Jason Dillaman <>
(cherry picked from commit 726d699b7790c7e371279281ab32cd3aeb8ece7b)
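
One general way to run work "outside cache callback context" is to queue it and execute it later, once the callback has returned and the cache lock has been released. The sketch below illustrates that idea only; it is an assumption and may differ from what revision 726d699b actually does. DeferredQueue, FakeCache and shrink_with_deferred_invalidate are hypothetical names, and the demo is single-threaded.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical work queue: functions are queued now and run later by
// drain(), outside whichever callback queued them.
class DeferredQueue {
public:
  void queue(std::function<void()> fn) {
    std::lock_guard<std::mutex> l(m_);
    work_.push_back(std::move(fn));
  }
  void drain() {
    for (;;) {
      std::function<void()> fn;
      {
        std::lock_guard<std::mutex> l(m_);
        if (work_.empty()) return;
        fn = std::move(work_.front());
        work_.pop_front();
      }
      fn();  // run without holding the queue lock
    }
  }
private:
  std::mutex m_;
  std::deque<std::function<void()>> work_;
};

// Hypothetical stand-in for the object cache (single-threaded demo).
struct FakeCache {
  std::mutex lock;
  bool in_callback = false;  // true while a callback runs under the lock
  bool invalidated = false;

  // The flush callback is invoked with the cache lock held.
  void flush(const std::function<void()>& on_flush) {
    std::lock_guard<std::mutex> l(lock);
    in_callback = true;
    on_flush();
    in_callback = false;
  }

  void invalidate() {
    assert(!in_callback);  // must not be called from a cache callback
    std::lock_guard<std::mutex> l(lock);
    invalidated = true;
  }
};

// The flush callback only queues the invalidation; it runs after
// flush() has returned and released the cache lock.
bool shrink_with_deferred_invalidate() {
  FakeCache cache;
  DeferredQueue queue;
  cache.flush([&] { queue.queue([&] { cache.invalidate(); }); });
  bool invalidated_early = cache.invalidated;
  queue.drain();
  return !invalidated_early && cache.invalidated;
}
```

The point of the design is that the callback itself never touches the cache lock; it only hands the invalidation to another context.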

History

#1 Updated by Josh Durgin about 3 years ago

  • Priority changed from Normal to High

#2 Updated by Jason Dillaman about 3 years ago

  • Status changed from New to In Progress

#3 Updated by Jason Dillaman about 3 years ago

  • Assignee set to Jason Dillaman

#4 Updated by Jason Dillaman about 3 years ago

  • Status changed from In Progress to Need Review

#5 Updated by Jason Dillaman about 3 years ago

  • Status changed from Need Review to Pending Backport

#6 Updated by Loic Dachary almost 3 years ago

  • Status changed from Pending Backport to Resolved
