Bug #44207
mgr/volumes: deadlock when trying to purge large number of trash entries
% Done:
0%
Source:
Community (dev)
Tags:
Backport:
nautilus
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/volumes
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
There's a subtle deadlock when a purge task (via the generic async job machinery) tries to fetch the next job to execute. The volume (filesystem) should be opened in lockless mode, since the main thread (the command dispatcher thread) already serializes access to the volume.
Hit this once when trying to remove a large number of trash entries.
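A minimal sketch of the deadlock pattern described above (the names `volume_lock`, `open_volume`, and `fetch_next_job` are illustrative, not the actual mgr/volumes code): the dispatcher thread holds the volume lock while the purge worker runs, so a worker that opens the volume in locked mode blocks forever, while a lockless open relies on the dispatcher's serialization and succeeds:

```python
import threading

# Hypothetical stand-in for the lock the command dispatcher thread
# uses to serialize access to the volume.
volume_lock = threading.Lock()

def open_volume(lockless):
    """Open a volume handle; locked mode takes the dispatcher's lock."""
    if lockless:
        # Lockless mode: trust the dispatcher thread's serialization.
        return "handle"
    # Locked mode: would block forever while the dispatcher holds the
    # lock; a timeout stands in for the hang so the sketch terminates.
    if not volume_lock.acquire(timeout=0.1):
        return None  # deadlock (timed out waiting for the lock)
    volume_lock.release()
    return "handle"

def fetch_next_job(lockless):
    """Async purge worker fetching its next trash entry to purge."""
    return open_volume(lockless)

# The dispatcher thread already holds the volume lock while the purge
# worker asks for its next job, mimicking the report.
volume_lock.acquire()
deadlocked = fetch_next_job(lockless=False) is None  # locked mode hangs
ok = fetch_next_job(lockless=True) == "handle"       # lockless succeeds
volume_lock.release()
```

The fix in PR 33413 follows this shape: the worker's volume open is switched to the lockless path because the dispatcher thread is the single point of serialization.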
Related issues
History
#1 Updated by Venky Shankar about 4 years ago
- Status changed from New to In Progress
#2 Updated by Venky Shankar about 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 33413
#3 Updated by Patrick Donnelly about 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Target version set to v15.0.0
#4 Updated by Patrick Donnelly about 4 years ago
- Related to Bug #44281: pybind/mgr/volumes: cleanup stale connection hang added
#5 Updated by Patrick Donnelly about 4 years ago
- Copied to Backport #44282: nautilus: mgr/volumes: deadlock when trying to purge large number of trash entries added
#6 Updated by Patrick Donnelly about 4 years ago
- Related to deleted (Bug #44281: pybind/mgr/volumes: cleanup stale connection hang)
#7 Updated by Patrick Donnelly about 4 years ago
- Related to Bug #44276: pybind/mgr/volumes: cleanup stale connection hang added
#8 Updated by Nathan Cutler about 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".