Bug #44207

mgr/volumes: deadlock when trying to purge large number of trash entries

Added by Venky Shankar about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: nautilus
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There's a subtle deadlock when purge tasks (run via the generic async job machinery) try to fetch the next job to execute. The volume (filesystem) should be opened in lockless mode at that point, since the main thread (the command dispatcher thread) already serializes access to the volume.

Hit this once when trying to remove a large number of trash entries.
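
For illustration only, the kind of deadlock described above can be sketched as a lock-order inversion between two threads. This is an assumption about the shape of the problem, not the actual mgr/volumes code: the names `volume_lock`, `jobs_lock`, and `simulate` are hypothetical, and timeouts are used only so the sketch terminates instead of hanging.

```python
import threading


def simulate(lockless, timeout=0.2):
    """Return True if the worker could fetch its next job, False if it stalled.

    Hypothetical model: the dispatcher holds volume_lock and then needs
    jobs_lock, while the worker holds jobs_lock and (in the locked variant)
    needs volume_lock. Without timeouts both threads would block forever.
    """
    volume_lock = threading.Lock()  # hypothetical: serializes volume access
    jobs_lock = threading.Lock()    # hypothetical: guards the async job queue
    dispatcher_ready = threading.Event()
    worker_ready = threading.Event()
    outcome = {}

    def worker():
        # Start only once the dispatcher is inside a command.
        dispatcher_ready.wait()
        with jobs_lock:
            worker_ready.set()
            if lockless:
                # Lockless open: rely on the dispatcher's serialization.
                outcome["worker"] = True
            else:
                # Locked open: wait for a lock the dispatcher already holds.
                got = volume_lock.acquire(timeout=timeout)
                if got:
                    volume_lock.release()
                outcome["worker"] = got

    def dispatcher():
        with volume_lock:
            dispatcher_ready.set()
            worker_ready.wait()
            # Queue a job while still holding volume_lock. The longer timeout
            # keeps volume_lock held for the whole of the worker's wait.
            got = jobs_lock.acquire(timeout=5 * timeout)
            if got:
                jobs_lock.release()
            outcome["dispatcher"] = got

    threads = [threading.Thread(target=worker),
               threading.Thread(target=dispatcher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["worker"]


if __name__ == "__main__":
    print("locked open fetched job:", simulate(lockless=False))
    print("lockless open fetched job:", simulate(lockless=True))
```

In this model the locked variant only recovers because the worker's acquire times out; with plain blocking acquires the two threads would wait on each other indefinitely, which is why skipping the lock on the job-fetch path removes the cycle.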


Related issues

Related to CephFS - Bug #44276: pybind/mgr/volumes: cleanup stale connection hang Resolved
Copied to CephFS - Backport #44282: nautilus: mgr/volumes: deadlock when trying to purge large number of trash entries Resolved

History

#1 Updated by Venky Shankar about 4 years ago

  • Status changed from New to In Progress

#2 Updated by Venky Shankar about 4 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 33413

#3 Updated by Patrick Donnelly about 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version set to v15.0.0

#4 Updated by Patrick Donnelly about 4 years ago

  • Related to Bug #44281: pybind/mgr/volumes: cleanup stale connection hang added

#5 Updated by Patrick Donnelly about 4 years ago

  • Copied to Backport #44282: nautilus: mgr/volumes: deadlock when trying to purge large number of trash entries added

#6 Updated by Patrick Donnelly about 4 years ago

  • Related to deleted (Bug #44281: pybind/mgr/volumes: cleanup stale connection hang)

#7 Updated by Patrick Donnelly about 4 years ago

  • Related to Bug #44276: pybind/mgr/volumes: cleanup stale connection hang added

#8 Updated by Nathan Cutler about 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
