Bug #34534

Blacklisted client might not notice it lost the lock

Added by Jason Dillaman 3 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Target version: -
Start date: 08/30/2018
% Done: 0%
Backport: luminous,mimic
Regression: No
Severity: 3 - minor

Description

If the lock owner is blacklisted and the blacklist entry is removed 30 seconds later, the watch on the RBD image header should be marked as failed, and librbd should detect that it lost the lock when it attempts to re-acquire it. However, during an iSCSI test in which IO was incorrectly sent to the previously blacklisted lock owner, the IO succeeded when it should have failed with -EROFS.
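The invariant the description relies on can be sketched as a minimal Python model. `LockedImage`, `write`, and `on_lock_lost` are hypothetical illustrations, not librbd's API: a write is accepted only while the client believes it owns the exclusive lock, and it is rejected with -EROFS once the client has noticed that the lock was lost.

```python
import errno
from typing import Dict


class LockedImage:
    """Hypothetical model of an image whose writes require the exclusive lock."""

    def __init__(self) -> None:
        self.lock_owner = False            # does this client believe it owns the lock?
        self.data: Dict[int, bytes] = {}   # offset -> payload

    def write(self, offset: int, payload: bytes) -> int:
        # Only the lock owner may write; any other client gets -EROFS.
        if not self.lock_owner:
            return -errno.EROFS
        self.data[offset] = payload
        return len(payload)

    def on_lock_lost(self) -> None:
        # Invoked once the client notices it no longer owns the lock,
        # e.g. after its header watch failed due to blacklisting.
        self.lock_owner = False


image = LockedImage()
image.lock_owner = True
print(image.write(0, b"abc"))                  # 3
image.on_lock_lost()
print(image.write(0, b"abc") == -errno.EROFS)  # True
```

In the reported iSCSI failure, the client never performed the equivalent of `on_lock_lost()`, so the ownership check still passed.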


Related issues

Copied to rbd - Backport #36143: luminous: Blacklisted client might not notice it lost the lock Resolved
Copied to rbd - Backport #36144: mimic: Blacklisted client might not notice it lost the lock Resolved

History

#1 Updated by Jason Dillaman 3 months ago

A couple of bugs:

(1) Upon blacklisting, the watcher does not re-establish its watch (the rewatch stops with "client blacklisted"), and because the subsequent lock reacquire is also aborted due to the invalid watch handle, the lock is left in a "locked" state internally:

2018-08-30 18:01:09.090044 7fbae17fa700 -1 librbd::ImageWatcher: 0x146fb00 image watch failed: 21530400, (107) Transport endpoint is not connected
2018-08-30 18:01:09.090077 7fbae17fa700 -1 librbd::Watcher: 0x146fb00 handle_error: handle=21530400: (107) Transport endpoint is not connected
2018-08-30 18:01:09.090692 7fbae17fa700 -1 librbd::watcher::RewatchRequest: 0x7fbad0000f60 handle_unwatch client blacklisted
2018-08-30 18:01:09.090726 7fbae0ff9700 -1 librbd::ManagedLock: 0x14c5ca0 send_reacquire_lock: aborting reacquire due to invalid watch handle
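The failure path in (1) can be modeled roughly as follows. Everything here is a hypothetical simplification: `ManagedLockModel`, the `reset_on_blacklist` toggle, and the single-step reacquire are assumptions, not librbd's actual state machine, and this is not necessarily how the upstream fix was implemented. When the rewatch fails because the client is blacklisted, the reacquire is aborted, and unless the internal state is also reset to unlocked, later writes still pass the ownership check.

```python
import errno
from enum import Enum


class LockState(Enum):
    UNLOCKED = "unlocked"
    LOCKED = "locked"


class ManagedLockModel:
    """Hypothetical model of the watch-failure path (not librbd code)."""

    def __init__(self, reset_on_blacklist: bool) -> None:
        self.state = LockState.LOCKED                 # client currently owns the lock
        self.watch_valid = True
        self.reset_on_blacklist = reset_on_blacklist  # toggles the sketched fix

    def handle_watch_error(self, blacklisted: bool) -> None:
        # The watch failed (e.g. -ENOTCONN); re-establishing it is only
        # possible if the client is not blacklisted.
        self.watch_valid = not blacklisted
        self.reacquire_lock()

    def reacquire_lock(self) -> None:
        if not self.watch_valid:
            # Mirrors "aborting reacquire due to invalid watch handle".
            if self.reset_on_blacklist:
                # Sketched fix: drop the stale ownership.
                self.state = LockState.UNLOCKED
            # Without the fix, the state is left as LOCKED.
            return
        # Watch re-established: ownership is retained (simplified).
        self.state = LockState.LOCKED

    def write(self) -> int:
        # IO is only allowed while the lock is believed to be held.
        return 0 if self.state is LockState.LOCKED else -errno.EROFS


buggy = ManagedLockModel(reset_on_blacklist=False)
buggy.handle_watch_error(blacklisted=True)
print(buggy.write())                        # 0: the write wrongly succeeds

fixed = ManagedLockModel(reset_on_blacklist=True)
fixed.handle_watch_error(blacklisted=True)
print(fixed.write() == -errno.EROFS)        # True
```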

(2) Attempting to blacklist another peer while in this state causes the lock_break API method to fail with -EBUSY, since the client still believes it owns the lock and does not check whether the blacklist target is itself:

2018-08-30 17:49:15.396290 7f7c9affd700 10 librbd::ManagedLock: 0x7f7c940588d0 break_lock
2018-08-30 17:49:15.396295 7f7c9affd700 20 librbd::ManagedLock: 0x7f7c940588d0 is_lock_owner=1
2018-08-30 17:49:15.396296 7f7c9affd700 -1 librbd: failed to break lock: (16) Device or resource busy
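The missing check in (2) can be sketched as a standalone function. `break_lock` here is a hypothetical helper with an injected `blacklist` callback and string client IDs, not the librbd `lock_break` signature: it refuses with -EBUSY only when the client would be breaking its own lock, so a stale local `is_lock_owner` flag no longer blocks blacklisting a different peer.

```python
import errno
from typing import Callable, List


def break_lock(is_lock_owner: bool, self_id: str, target_id: str,
               blacklist: Callable[[str], None]) -> int:
    """Hypothetical sketch of the ownership check in a lock-break path."""
    # Only refuse when the target really is this client. If the local
    # state claims ownership but the target is another peer, that local
    # state is stale and should not block the break.
    if is_lock_owner and target_id == self_id:
        return -errno.EBUSY
    blacklist(target_id)
    return 0


blacklisted: List[str] = []
print(break_lock(True, "client.4123", "client.4123", blacklisted.append) == -errno.EBUSY)  # True
print(break_lock(True, "client.4123", "client.5678", blacklisted.append))                  # 0
print(blacklisted)  # ['client.5678']
```

Passing the blacklist action in as a callback keeps the ownership decision testable without a cluster.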

#2 Updated by Jason Dillaman 3 months ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman

#3 Updated by Jason Dillaman 3 months ago

  • Backport set to luminous,mimic

#5 Updated by Mykola Golub 3 months ago

  • Status changed from In Progress to Need Review

#6 Updated by Mykola Golub 3 months ago

  • Status changed from Need Review to Pending Backport

#7 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #36143: luminous: Blacklisted client might not notice it lost the lock added

#8 Updated by Nathan Cutler 3 months ago

  • Copied to Backport #36144: mimic: Blacklisted client might not notice it lost the lock added

#9 Updated by Nathan Cutler about 2 months ago

  • Status changed from Pending Backport to Resolved
