Bug #14780
closed
librbd: TaskFinisher lifetime no longer matches ImageWatcher
Description
Since notify handling was made asynchronous with respect to the librados threads in commit:d898995b0e3ea301b1325f68a0532d57afa3c816, tests can crash during image close when exclusive locking is enabled.
This occurs because flushing the watches no longer guarantees that all notifies have been completely handled. Since the notifies are run from the TaskFinisher attached to the CephContext, a notify added to the TaskFinisher can run after the ImageCtx it refers to has been destroyed. The notify for exclusive lock release hits exactly this case.
Looking into this also made me notice that sharing a single TaskFinisher is currently not safe, since ImageWatcher::unregister_watch() cancels all events, not just those scheduled by that image.
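To make the lifetime problem concrete, here is a minimal standalone sketch of a task queue whose lifetime matches its owner. It is not librbd code and not necessarily the fix that was applied: OwnedTaskFinisher, Image, handle_notify and close_image_with_pending_notifies are hypothetical names used only for illustration. The point is that when each image owns its finisher, and the finisher drains and joins in its destructor, a queued notify cannot run after the image is gone, and one image cannot cancel another image's events.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a per-image task finisher (not librbd's class).
class OwnedTaskFinisher {
public:
  OwnedTaskFinisher() : worker_([this] { run(); }) {}

  // Run every task that is still queued, then stop the worker thread.
  ~OwnedTaskFinisher() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
    }
    cond_.notify_one();
    worker_.join();
  }

  void queue(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> l(lock_);
      tasks_.push_back(std::move(task));
    }
    cond_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    for (;;) {
      cond_.wait(l, [this] { return stopping_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // only reached once stopping_ is set
      auto task = std::move(tasks_.front());
      tasks_.pop_front();
      l.unlock();  // do not hold the queue lock while running the task
      task();
      l.lock();
    }
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last: starts after the other members exist
};

// Hypothetical image object that owns its finisher.
struct Image {
  explicit Image(int *handled) : handled_(handled) {}
  void handle_notify() { ++*handled_; }

  int *handled_;
  OwnedTaskFinisher finisher;  // declared last: destroyed first, so queued
                               // tasks finish while the Image is still valid
};

// Queue three notifies and close the image immediately.
int close_image_with_pending_notifies() {
  int handled = 0;
  {
    Image image(&handled);
    for (int i = 0; i < 3; ++i)
      image.finisher.queue([&image] { image.handle_notify(); });
  }  // ~Image: the finisher drains and joins before the Image goes away
  return handled;
}
```

Calling close_image_with_pending_notifies() returns only after all three queued notifies have run. With a finisher shared through a longer-lived context, the same tasks could still be pending after the image is destroyed, which is the situation the backtrace below shows.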
Example crash backtrace from test_rbd.py:
#0 ceph::log::SubsystemMap::should_gather (this=0x90, sub=15, level=10) at ./log/SubsystemMap.h:62
#1 0x00007fa1c523e2b8 in librados::IoCtxImpl::notify (this=0x2c3bec0, oid=..., bl=..., timeout_ms=<optimized out>, preply_bl=<optimized out>, preply_buf=<optimized out>, preply_buf_len=0x0) at librados/IoCtxImpl.cc:1332
#2 0x00007fa1c51f8447 in librados::IoCtx::notify2 (this=0x2c68360, oid=..., bl=..., timeout_ms=timeout_ms@entry=5000, preplybl=preplybl@entry=0x0) at librados/librados.cc:1827
#3 0x00007fa1ba2d0d6f in librbd::ImageWatcher::execute_released_lock (this=0x7fa16c010170) at librbd/ImageWatcher.cc:324
#4 0x00007fa1ba2d959a in operator() (a0=<optimized out>, this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#5 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at ./include/Context.h:460
#6 0x00007fa1ba296ae7 in Context::complete (this=0x7fa16c003990, r=0) at ./include/Context.h:64
#7 0x00007fa1ba296ae7 in Context::complete (this=0x7fa16c0096c0, r=0) at ./include/Context.h:64
#8 0x00007fa1ba3a70a6 in Finisher::finisher_thread_entry (this=0x7fa16c0026c0) at common/Finisher.cc:68
#9 0x00000031a7a07f33 in start_thread (arg=0x7fa18dffb700) at pthread_create.c:309
#10 0x00000031a76f4ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111