Bug #36266: mgr: deadlock in ClusterState - mgr - Ceph

Bug #36266 (closed): mgr: deadlock in ClusterState

Added by Hector Martin over 5 years ago. Updated over 5 years ago.

Status: Duplicate
Priority: Normal
Category: ceph-mgr
Severity: 3 - minor
Affected Versions: Ceph - v12.2.8

Description

This is a cluster with 3 mons/mgrs. A few minutes ago, the active mgr stopped responding, and the cluster successfully failed over to a standby mgr. I'm using the prometheus and dashboard mgr modules, and Prometheus scrapes the exporter's metrics periodically.

ceph-mgr --version
ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)

I'm attaching a gdb backtrace. It looks like a deadlock:

Thread 46 holds the Objecter rwlock and is trying to take the ClusterState lock.
Thread 17 holds the ClusterState lock and is trying to take the Objecter rwlock.
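The two threads take the same two locks in opposite order, which is a classic ABBA lock-order inversion. The sketch below is a minimal Python model of that pattern, not Ceph code: `objecter_lock` and `cluster_state_lock` are hypothetical plain `threading.Lock` stand-ins (treating the Objecter rwlock as a simple mutex is a simplification), and the second acquire uses a timeout so the demo reports the deadlock instead of hanging.

```python
import threading


def demonstrate_lock_inversion(timeout=0.5):
    """Two threads take the same two locks in opposite order (ABBA inversion).

    objecter_lock and cluster_state_lock are hypothetical stand-ins for the
    Objecter rwlock and the ClusterState lock, not Ceph's classes. The second
    acquire uses a timeout so the demo reports the deadlock instead of hanging.
    Returns (t46_got_second_lock, t17_got_second_lock).
    """
    objecter_lock = threading.Lock()
    cluster_state_lock = threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()  # each thread now holds its first lock
            got = second.acquire(timeout=timeout)  # held by the other thread
            if got:
                second.release()
            results[name] = got
            both_tried_second.wait()  # keep holding `first` until both have tried

    # Like thread 46: Objecter lock first, then ClusterState lock.
    t46 = threading.Thread(target=worker, args=("t46", objecter_lock, cluster_state_lock))
    # Like thread 17: ClusterState lock first, then Objecter lock.
    t17 = threading.Thread(target=worker, args=("t17", cluster_state_lock, objecter_lock))
    t46.start()
    t17.start()
    t46.join()
    t17.join()
    return results["t46"], results["t17"]


print(demonstrate_lock_inversion())  # (False, False): neither thread can proceed
```

Without the timeout, both threads would block forever, which matches an mgr that stops responding while its process stays alive.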

I think there's a missing lock in ClusterState::with_osdmap:
https://github.com/ceph/ceph/blob/v12.2.8/src/mgr/ClusterState.h#L128
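One way to read the suggestion above is that every path needing both locks should take them in the same order, for example ClusterState lock first and Objecter lock second. The sketch below models that rule in Python under stated assumptions: `ClusterStateSketch`, `update_pg_stats`, `osdmap_epoch`, `pg_count` and `run_concurrently` are hypothetical names for illustration, and this is not how Ceph's `ClusterState::with_osdmap` is actually implemented or fixed.

```python
import threading


class ClusterStateSketch:
    """Hypothetical model of a consistent lock order, not Ceph's ClusterState.

    Rule: whenever both locks are needed, take cluster_state_lock first and
    objecter_lock second, so no thread holds the Objecter lock while
    waiting for the ClusterState lock.
    """

    def __init__(self):
        self.cluster_state_lock = threading.Lock()
        self.objecter_lock = threading.Lock()
        self.osdmap_epoch = 1
        self.pg_count = 0

    def with_osdmap(self, cb):
        # Take the ClusterState lock before touching the Objecter...
        with self.cluster_state_lock:
            # ...then the Objecter lock, the same order used everywhere else.
            with self.objecter_lock:
                return cb(self.osdmap_epoch)

    def update_pg_stats(self):
        # Same order as with_osdmap(): ClusterState first, then Objecter.
        with self.cluster_state_lock:
            with self.objecter_lock:
                self.pg_count += 1


def run_concurrently(state, iterations=1000):
    """Hammer both code paths from two threads; returns (finished, reads, writes)."""
    seen = []

    def reader():
        for _ in range(iterations):
            seen.append(state.with_osdmap(lambda epoch: epoch))

    def writer():
        for _ in range(iterations):
            state.update_pg_stats()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    finished = not any(t.is_alive() for t in threads)
    return finished, len(seen), state.pg_count


print(run_concurrently(ClusterStateSketch()))  # (True, 1000, 1000)
```

Because both methods acquire the locks in one global order, the cycle from the backtrace cannot form. Note that in a real codebase a callback passed to `with_osdmap` that itself takes the ClusterState lock would need a recursive lock or a restructured API; the sketch deliberately avoids that case.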

Files

ceph-mgr-backtrace.txt (86.6 KB), Hector Martin, 09/30/2018 11:13 AM

Related issues: 1 (0 open, 1 closed)