Bug #49226

closed

librbd: refuse to release exclusive lock when removing

Added by Ilya Dryomov about 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Commit [1] changed PreRemoveRequest to request the exclusive lock from the peer instead of giving up and proceeding without it. This caused one of the test cases, which sometimes runs concurrent "rbd rm" commands against the same image, to fail intermittently, most often on the assert marked below

template <typename I>
class C_RemoveObject : public C_AsyncObjectThrottle<I> {
public:
  C_RemoveObject(AsyncObjectThrottle<I> &throttle, I *image_ctx,
                 uint64_t object_no)
    : C_AsyncObjectThrottle<I>(throttle, *image_ctx), m_object_no(object_no)
  {
  }

  int send() override {
    I &image_ctx = this->m_image_ctx;
    ceph_assert(ceph_mutex_is_locked(image_ctx.owner_lock));
    ceph_assert(image_ctx.exclusive_lock == nullptr ||
                image_ctx.exclusive_lock->is_lock_owner());      // <---- this assert fails

    {
      std::shared_lock image_locker{image_ctx.image_lock};
      if (image_ctx.object_map != nullptr &&
          !image_ctx.object_map->object_may_exist(m_object_no)) {
        return 1;
      }
    }

because the exclusive lock is now automatically transitioned to another "rbd rm" at its request.

The root cause is older and probably goes back to when the synchronous librbd::remove(), which held owner_lock across all operations including trim_image(), was converted to a set of state machines, starting in 2017 [2]. Since then, any peer that requests the exclusive lock (instead of trying once and backing off) has been able to interfere with image removal.
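
The race can be illustrated with a small standalone model. This is only a sketch and not librbd code: ToyExclusiveLock, set_removal_in_progress(), request_lock() and remove_image() are hypothetical names invented for the illustration, and the "refuse while removing" switch merely mirrors the behaviour named in the title of this issue. A peer asks for the lock in the middle of a multi-step removal; if the owner hands it over, the ownership check that the real code asserts on no longer holds.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for an exclusive lock shared by two clients.
class ToyExclusiveLock {
public:
  explicit ToyExclusiveLock(int owner) : m_owner(owner) {}

  bool is_lock_owner(int client) const { return m_owner == client; }

  // Owner-side switch: while set, requests from peers are refused.
  void set_removal_in_progress(bool in_progress) { m_removing = in_progress; }

  // Peer-side request. Returns 0 if the lock was handed over,
  // -EBUSY if the current owner refused to release it.
  int request_lock(int requester) {
    if (m_removing) {
      return -EBUSY;
    }
    m_owner = requester;
    return 0;
  }

private:
  int m_owner;
  bool m_removing = false;
};

// Models a multi-step removal performed by `client` while `peer`
// (a concurrent "rbd rm") requests the lock before the second step.
// Returns the number of objects removed, or -1 if ownership was lost
// mid-removal (the point where the real code would hit the assert).
int remove_image(ToyExclusiveLock &lock, int client, int peer,
                 bool refuse_requests, int num_objects) {
  lock.set_removal_in_progress(refuse_requests);
  int removed = 0;
  for (int object_no = 0; object_no < num_objects; ++object_no) {
    if (object_no == 1) {
      lock.request_lock(peer);  // result ignored: only ownership matters here
    }
    if (!lock.is_lock_owner(client)) {
      return -1;
    }
    ++removed;
  }
  lock.set_removal_in_progress(false);
  return removed;
}
```

With refuse_requests set to false, a removal of four objects by client 1 returns -1 because client 2 takes the lock after the first object. With refuse_requests set to true, the same removal completes all four objects, and the peer can acquire the lock once the removal has finished.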

[1] https://github.com/ceph/ceph/commit/25c2ffe145becf6e32dd88682673f9761ee62fa8
[2] https://github.com/ceph/ceph/commit/10a012f1dee91b781d85be5b5121b473e5e257ef


Related issues 2 (0 open, 2 closed)

Copied to rbd - Backport #49257: octopus: librbd: refuse to release exclusive lock when removing (Resolved, Jason Dillaman)
Copied to rbd - Backport #49258: nautilus: librbd: refuse to release exclusive lock when removing (Rejected)

#1

Updated by Ilya Dryomov about 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 39375

#2

Updated by Ilya Dryomov about 3 years ago

  • Backport set to nautilus,octopus

#3

Updated by Jason Dillaman about 3 years ago

  • Status changed from Fix Under Review to Pending Backport

#4

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49257: octopus: librbd: refuse to release exclusive lock when removing added

#5

Updated by Backport Bot about 3 years ago

  • Copied to Backport #49258: nautilus: librbd: refuse to release exclusive lock when removing added

#6

Updated by Ilya Dryomov about 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
