Actions
Bug #36266
closedmgr: deadlock in ClusterState
Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
ceph-mgr
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
This is a cluster with 3 mons/mgrs. A few minutes ago, the active mgr stopped responding. The cluster successfully failed over to a standby mgr. I'm using the prometheus and dashboard monitors. The exporter metrics get scraped periodically by Prometheus.
@- ceph-mgr --version
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)
@
I'm attaching a gdb backtrace. Looks like a deadlock:
Thread 46 is holding the Objecter rwlock and trying to lock ClusterState.
Thread 17 is holding the ClusterState lock and is trying to lock the Objecter rwlock.
I think there's a missing lock in ClusterState::with_osdmap:
https://github.com/ceph/ceph/blob/v12.2.8/src/mgr/ClusterState.h#L128
Files
Actions