Bug #36631
Potential deadlock in PG::_scan_snaps when repairing snap mapper
Status: open (0% done)
Description
If a snap mapper error is detected in PG::_scan_snaps during a pg scrub, then on repair `ObjectStore::apply_transactions` is called with the pg lock held, which might lead to a deadlock: apply_transactions waits for its completion callback, which is successfully queued to the filestore apply finisher but cannot be executed, because the finisher is busy executing another callback that is waiting for the pg lock to be released. See more details on ceph-devel [1].
Master and mimic are not affected, as this was fixed during the os,osd refactoring. To fix this in jewel we need to partially backport 907b6281e99ece3677dd7b012cf4955731db6120 (the "apply_transaction -> queue_transaction" change for PG::_scan_snaps) and ea531df216c65a3773a55b5f6c42af20e5004263 (wait for repair to apply).
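To illustrate the lock ordering problem described in this ticket, here is a minimal standalone model. It is not Ceph code: `Finisher`, `repair_without_deadlock`, and all variable names are hypothetical stand-ins, and the single-threaded FIFO finisher is an assumption made for the sketch. It shows the non-blocking pattern (queue the completion callback, then wait for it without holding the pg lock); how the actual backport waits for the repair to apply may differ.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the filestore apply finisher:
// a single thread that runs queued callbacks in FIFO order.
class Finisher {
public:
  Finisher() : worker_([this] { run(); }) {}
  ~Finisher() {
    {
      std::lock_guard<std::mutex> l(m_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }
  void queue(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> l(m_);
      q_.push_back(std::move(cb));
    }
    cv_.notify_all();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !q_.empty(); });
      if (q_.empty()) return;  // stop requested and queue drained
      auto cb = std::move(q_.front());
      q_.pop_front();
      l.unlock();
      cb();  // a callback blocking here stalls every callback behind it
      l.lock();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};

// Models the scrub repair path. Returns true if both callbacks ran.
bool repair_without_deadlock() {
  std::mutex pg_lock;  // models the pg lock
  std::mutex done_m;
  std::condition_variable done_cv;
  bool repair_applied = false;
  bool other_ran = false;
  Finisher finisher;  // destroyed (joined) before the objects above

  std::unique_lock<std::mutex> pg(pg_lock);  // scrub holds the pg lock

  // An unrelated callback, already in the finisher, that needs the pg lock.
  finisher.queue([&] {
    std::lock_guard<std::mutex> l(pg_lock);
    other_ran = true;
  });

  // The repair transaction's completion callback is queued behind it.
  finisher.queue([&] {
    {
      std::lock_guard<std::mutex> l(done_m);
      repair_applied = true;
    }
    done_cv.notify_all();
  });

  // Blocking on done_cv here, with pg still locked, would reproduce the
  // deadlock: the first callback could never take pg_lock, so the second
  // would never run. Instead, drop the pg lock before waiting.
  pg.unlock();
  {
    std::unique_lock<std::mutex> l(done_m);
    done_cv.wait(l, [&] { return repair_applied; });
  }
  pg.lock();  // re-take the pg lock to continue the scrub

  return repair_applied && other_ran;
}
```

In this model, moving the `done_cv.wait` call above `pg.unlock()` makes the function hang, which mirrors the blocking apply_transactions call made with the pg lock held.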