Bug #34534: Blacklisted client might not notice it lost the lock - rbd - Ceph

Actions

Copy link

Bug #34534

closed

Blacklisted client might not notice it lost the lock

Added by Jason Dillaman over 5 years ago. Updated over 5 years ago.

Status:

Resolved

Priority:

High

Assignee:

Jason Dillaman

Target version:

% Done:

Source:

Tags:

Backport:

luminous,mimic

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

After blacklisting the lock owner, if after 30 seconds the blacklist is removed, the watch on the RBD image header should be marked as failed and librbd should be able to detect that the lock was lost when it attempts to re-acquire it. However, during an iSCSI test where IO was incorrectly sent to previously blacklisted lock owner, the IO improperly succeeded when it should have failed w/ -EROFS.

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Jason Dillaman over 5 years ago

Couple bugs:

(1) upon blacklist, the watcher doesn't attempt to re-acquire the lock but that leaves the lock in a "locked" state internally since it also doesn't attempt to reacquire the lock.

2018-08-30 18:01:09.090044 7fbae17fa700 -1 librbd::ImageWatcher: 0x146fb00 image watch failed: 21530400, (107) Transport endpoint is not connected
2018-08-30 18:01:09.090077 7fbae17fa700 -1 librbd::Watcher: 0x146fb00 handle_error: handle=21530400: (107) Transport endpoint is not connected
2018-08-30 18:01:09.090692 7fbae17fa700 -1 librbd::watcher::RewatchRequest: 0x7fbad0000f60 handle_unwatch client blacklisted
2018-08-30 18:01:09.090726 7fbae0ff9700 -1 librbd::ManagedLock: 0x14c5ca0 send_reacquire_lock: aborting reacquire due to invalid watch handle

(2) attempting to blacklist another peer while in this state will result in the lock_break API method failing w/ -EBUSY since it thinks it owns the lock and doesn't check if the blacklist target is itself:

2018-08-30 17:49:15.396290 7f7c9affd700 10 librbd::ManagedLock: 0x7f7c940588d0 break_lock
2018-08-30 17:49:15.396295 7f7c9affd700 20 librbd::ManagedLock: 0x7f7c940588d0 is_lock_owner=1
2018-08-30 17:49:15.396296 7f7c9affd700 -1 librbd: failed to break lock: (16) Device or resource busy

Actions

Copy link