Bug #49226
librbd: refuse to release exclusive lock when removing
Status: closed
Description
Commit [1] changed PreRemoveRequest to request the exclusive lock from the peer instead of giving up and proceeding without it. This caused one of the test cases, which sometimes runs concurrent "rbd rm" commands against the same image, to fail intermittently, most often on this assert (marked with <----):
    template <typename I>
    class C_RemoveObject : public C_AsyncObjectThrottle<I> {
    public:
      C_RemoveObject(AsyncObjectThrottle<I> &throttle, I *image_ctx,
                     uint64_t object_no)
        : C_AsyncObjectThrottle<I>(throttle, *image_ctx),
          m_object_no(object_no) {
      }

      int send() override {
        I &image_ctx = this->m_image_ctx;
        ceph_assert(ceph_mutex_is_locked(image_ctx.owner_lock));
        ceph_assert(image_ctx.exclusive_lock == nullptr ||
                    image_ctx.exclusive_lock->is_lock_owner());  <----

        {
          std::shared_lock image_locker{image_ctx.image_lock};
          if (image_ctx.object_map != nullptr &&
              !image_ctx.object_map->object_may_exist(m_object_no)) {
            return 1;
          }
        }
because the exclusive lock is now automatically transitioned to another "rbd rm" at its request.
The root cause is older and probably goes back to when the synchronous librbd::remove(), which held owner_lock across all operations including trim_image(), was converted to a set of state machines, starting in 2017 [2]. Since then, any peer that requests the exclusive lock (instead of trying once and backing off) is able to interfere with image removal.
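The race and the behavior suggested by the issue title can be illustrated with a toy model. This is only a sketch and not librbd code: ToyExclusiveLock, set_accept_requests(), handle_peer_request() and remove_objects() are hypothetical names invented for illustration, and the real state machines are asynchronous. The loop stops at the point where the ownership assert above would fire.

```cpp
#include <cassert>

// Toy model only; none of these names exist in librbd.
enum class LockOwner { Remover, Peer };

class ToyExclusiveLock {
public:
  // When false, peer requests to take over the lock are refused.
  void set_accept_requests(bool accept) { m_accept_requests = accept; }

  // Models a peer "rbd rm" asking the current owner to release the lock.
  // Returns true if ownership was handed over to the peer.
  bool handle_peer_request() {
    if (!m_accept_requests) {
      return false;
    }
    m_owner = LockOwner::Peer;
    return true;
  }

  bool is_lock_owner() const { return m_owner == LockOwner::Remover; }

private:
  LockOwner m_owner = LockOwner::Remover;
  bool m_accept_requests = true;
};

// Removes object_count objects one by one. A peer requests the lock just
// before object number peer_request_at is processed. Returns how many
// objects were removed while the remover still owned the lock.
int remove_objects(ToyExclusiveLock &lock, int object_count,
                   int peer_request_at) {
  int removed = 0;
  for (int i = 0; i < object_count; ++i) {
    if (i == peer_request_at) {
      lock.handle_peer_request();
    }
    if (!lock.is_lock_owner()) {
      break;  // this is where the ownership assert would fail
    }
    ++removed;
  }
  return removed;
}
```

In this model, a lock that accepts peer requests loses ownership partway through the removal, while calling set_accept_requests(false) before the loop lets the removal run to completion with the lock still held.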
[1] https://github.com/ceph/ceph/commit/25c2ffe145becf6e32dd88682673f9761ee62fa8
[2] https://github.com/ceph/ceph/commit/10a012f1dee91b781d85be5b5121b473e5e257ef