Blacklisted client might not notice it lost the lock
After blacklisting the lock owner, if after 30 seconds the blacklist is removed, the watch on the RBD image header should be marked as failed and librbd should be able to detect that the lock was lost when it attempts to re-acquire it. However, during an iSCSI test where IO was incorrectly sent to previously blacklisted lock owner, the IO improperly succeeded when it should have failed w/ -EROFS.
#1 Updated by Jason Dillaman about 1 year ago
(1) upon blacklist, the watcher doesn't attempt to re-acquire the lock but that leaves the lock in a "locked" state internally since it also doesn't attempt to reacquire the lock.
2018-08-30 18:01:09.090044 7fbae17fa700 -1 librbd::ImageWatcher: 0x146fb00 image watch failed: 21530400, (107) Transport endpoint is not connected 2018-08-30 18:01:09.090077 7fbae17fa700 -1 librbd::Watcher: 0x146fb00 handle_error: handle=21530400: (107) Transport endpoint is not connected 2018-08-30 18:01:09.090692 7fbae17fa700 -1 librbd::watcher::RewatchRequest: 0x7fbad0000f60 handle_unwatch client blacklisted 2018-08-30 18:01:09.090726 7fbae0ff9700 -1 librbd::ManagedLock: 0x14c5ca0 send_reacquire_lock: aborting reacquire due to invalid watch handle
(2) attempting to blacklist another peer while in this state will result in the lock_break API method failing w/ -EBUSY since it thinks it owns the lock and doesn't check if the blacklist target is itself:
2018-08-30 17:49:15.396290 7f7c9affd700 10 librbd::ManagedLock: 0x7f7c940588d0 break_lock 2018-08-30 17:49:15.396295 7f7c9affd700 20 librbd::ManagedLock: 0x7f7c940588d0 is_lock_owner=1 2018-08-30 17:49:15.396296 7f7c9affd700 -1 librbd: failed to break lock: (16) Device or resource busy