Backport #36630

luminous: potential deadlock in PG::_scan_snaps when repairing snap mapper

Added by Mykola Golub 6 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Release: luminous

Description

If a snap mapper error is detected in PG::_scan_snaps during a pg scrub, the repair calls `ObjectStore::apply_transactions` with the pg lock held, which can lead to a deadlock: apply_transactions waits for its callback, which has been successfully queued to the filestore apply finisher but cannot be executed because the finisher is busy executing another callback that is waiting for the pg lock to be released. See more details on ceph-devel [1].

master and mimic are not affected because the problem was fixed there during the os,osd refactoring. To fix it in luminous we need to partially backport 907b6281e99ece3677dd7b012cf4955731db6120 (the "apply_transaction -> queue_transaction" change for PG::_scan_snaps).

[1] https://www.spinics.net/lists/ceph-devel/msg43434.html
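
The locking pattern described above can be reproduced outside Ceph with a rough sketch. This is not the Ceph code: `Finisher`, `scan_snaps_repair` and `pg_lock` below are hypothetical stand-ins, and a timeout replaces the real indefinite wait so the example terminates. With `blocking_apply` set, the caller waits for its callback while holding the lock (the apply_transactions-style behaviour); without it, the caller only queues the callback and releases the lock before waiting (closer in spirit to queue_transaction).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a single-threaded apply finisher:
// callbacks run strictly one at a time, in queue order.
class Finisher {
public:
  Finisher() : worker_([this] { run(); }) {}
  ~Finisher() {
    { std::lock_guard<std::mutex> l(m_); stop_ = true; }
    cv_.notify_one();
    worker_.join();  // drains the remaining callbacks first
  }
  void queue(std::function<void()> cb) {
    { std::lock_guard<std::mutex> l(m_); q_.push_back(std::move(cb)); }
    cv_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !q_.empty(); });
      if (q_.empty()) return;  // stop requested and queue drained
      auto cb = std::move(q_.front());
      q_.pop_front();
      l.unlock();
      cb();  // may block, stalling every callback queued behind it
      l.lock();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};

// Returns true if the repair callback completed, false if the wait timed out
// (in real code without a timeout, that case would be the deadlock).
bool scan_snaps_repair(bool blocking_apply) {
  using namespace std::chrono_literals;
  std::mutex pg_lock;  // hypothetical stand-in for the pg lock
  Finisher finisher;
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> applied = done->get_future();
  bool completed = false;
  {
    std::unique_lock<std::mutex> l(pg_lock);
    // An earlier callback that needs the pg lock occupies the finisher.
    finisher.queue([&pg_lock] { std::lock_guard<std::mutex> g(pg_lock); });
    // The repair callback is queued behind it.
    finisher.queue([done] { done->set_value(); });
    if (blocking_apply) {
      // Waiting with pg_lock held: the finisher can never reach our callback.
      completed = applied.wait_for(200ms) == std::future_status::ready;
    }
  }  // pg_lock released here
  if (!blocking_apply) {
    // Queue-and-return: the lock is free, so the finisher drains normally.
    completed = applied.wait_for(2s) == std::future_status::ready;
  }
  return completed;
}
```

Calling `scan_snaps_repair(true)` is expected to time out because the first callback cannot take `pg_lock` while the caller holds it, whereas `scan_snaps_repair(false)` should complete once the lock is dropped.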

History

#1 Updated by Nathan Cutler 6 months ago

  • Release set to luminous

#3 Updated by Nathan Cutler 5 months ago

  • Status changed from In Progress to Resolved
  • Target version set to v12.2.11
