Bug #36631

open

potential deadlock in PG::_scan_snaps when repairing snap mapper

Added by Mykola Golub over 5 years ago. Updated over 4 years ago.

Status: In Progress
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If a snap mapper error is detected in PG::_scan_snaps during a pg scrub, then on repair `ObjectStore::apply_transactions` is called with the pg lock held, which might lead to a deadlock: apply_transactions waits for its callback, which is successfully queued to the filestore apply finisher but can't be executed, because the finisher is busy executing a callback that is waiting for the pg lock to be released. See more details on ceph-devel [1].

master and mimic are not affected, as this was fixed during the os,osd refactoring. To fix this in jewel we need to partially backport 907b6281e99ece3677dd7b012cf4955731db6120 ("apply_transaction -> queue_transaction" change for PG::_scan_snaps) and ea531df216c65a3773a55b5f6c42af20e5004263 (wait for repair to apply).
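
For illustration only, the lock ordering problem can be reproduced outside Ceph with a toy single-threaded finisher. This is a minimal sketch, not Ceph code: `ToyFinisher`, `repair_applied_in_time` and the 500 ms timeout are hypothetical names and values chosen for the example, and the "safe" variant simply releases the lock before waiting, which only approximates the idea behind the commits above. A timeout is used instead of an unbounded wait so the blocked case can be observed without hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a single-threaded finisher: callbacks run
// strictly in FIFO order on one worker thread.
class ToyFinisher {
 public:
  ToyFinisher() : worker_([this] { run(); }) {}
  ~ToyFinisher() {
    {
      std::lock_guard<std::mutex> g(m_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any remaining callbacks first
  }
  void queue(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> g(m_);
      q_.push_back(std::move(cb));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !q_.empty(); });
      if (q_.empty()) return;  // stop requested and queue drained
      auto cb = std::move(q_.front());
      q_.pop_front();
      l.unlock();
      cb();  // run the callback without holding the queue mutex
      l.lock();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};

// Returns true if the "repair applied" callback ran within the timeout.
bool repair_applied_in_time(bool hold_pg_lock_while_waiting) {
  std::mutex pg_lock;  // stands in for the pg lock
  std::promise<void> applied;
  std::future<void> applied_f = applied.get_future();
  ToyFinisher finisher;  // destroyed (and joined) before the objects above

  std::unique_lock<std::mutex> lk(pg_lock);  // "scrub" thread holds the lock

  // An unrelated callback that is ahead in the queue and needs the pg lock.
  finisher.queue([&pg_lock] { std::lock_guard<std::mutex> g(pg_lock); });
  // The completion callback of the repair transaction.
  finisher.queue([&applied] { applied.set_value(); });

  if (!hold_pg_lock_while_waiting) lk.unlock();  // wait without the lock
  bool ok = applied_f.wait_for(std::chrono::milliseconds(500)) ==
            std::future_status::ready;

  if (lk.owns_lock()) lk.unlock();  // let the finisher drain before teardown
  assert(!lk.owns_lock());
  return ok;
}
```

Calling `repair_applied_in_time(true)` mimics the synchronous apply under the pg lock: the completion callback stays stuck behind the callback that is blocked on the lock. Calling it with `false` mimics queueing the transaction and waiting for it to apply without holding the lock.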

[1] https://www.spinics.net/lists/ceph-devel/msg43434.html
