Backport #36630
luminous: potential deadlock in PG::_scan_snaps when repairing snap mapper
Description
If a snap mapper error is detected in PG::_scan_snaps during a PG scrub, the repair path calls `ObjectStore::apply_transactions` with the PG lock held, which can lead to a deadlock. apply_transactions waits for its completion callback. That callback is successfully queued to the filestore apply finisher but cannot run, because the finisher is busy executing another callback that is waiting for the PG lock to be released. See ceph-devel [1] for more details.
The master and mimic branches are not affected, as this was fixed during the os/osd refactoring. To fix this in luminous, we need to partially backport
907b6281e99ece3677dd7b012cf4955731db6120 (the "apply_transaction -> queue_transaction" change for PG::_scan_snaps).
[1] https://www.spinics.net/lists/ceph-devel/msg43434.html
History
#1 Updated by Nathan Cutler over 5 years ago
- Release set to luminous
#2 Updated by Yuri Weinstein over 5 years ago
#3 Updated by Nathan Cutler over 5 years ago
- Status changed from In Progress to Resolved
- Target version set to v12.2.11