Bug #36266

closed

mgr: deadlock in ClusterState

Added by Hector Martin over 5 years ago. Updated over 5 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: ceph-mgr
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is a cluster with 3 mons/mgrs. A few minutes ago, the active mgr stopped responding. The cluster successfully failed over to a standby mgr. I'm using the prometheus and dashboard mgr modules. Prometheus scrapes the exporter metrics periodically.

    # ceph-mgr --version
    ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)

I'm attaching a gdb backtrace. Looks like a deadlock:

Thread 46 is holding the Objecter rwlock and trying to lock ClusterState.
Thread 17 is holding the ClusterState lock and is trying to lock the Objecter rwlock.

I think there's a missing lock in ClusterState::with_osdmap:
https://github.com/ceph/ceph/blob/v12.2.8/src/mgr/ClusterState.h#L128
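
To make the lock-order inversion described above concrete, here is a minimal sketch in standalone C++. It is not Ceph code: `LockOrderGraph` and `reported_order_has_cycle` are hypothetical helpers, and the strings "Objecter rwlock" and "ClusterState lock" are only labels for the two locks named in this report. Each observation "thread holds X while waiting for Y" becomes an edge X -> Y; a cycle in that graph means the threads can block each other forever.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical helper, not part of Ceph: records "holds X while acquiring Y"
// edges and reports whether the lock-order graph contains a cycle.
class LockOrderGraph {
 public:
  void add_edge(const std::string& held, const std::string& wanted) {
    edges_[held].insert(wanted);
  }

  // Depth-first search over every recorded lock; true if any cycle exists.
  bool has_cycle() const {
    std::set<std::string> done, on_stack;
    for (const auto& entry : edges_) {
      if (visit(entry.first, done, on_stack)) return true;
    }
    return false;
  }

 private:
  bool visit(const std::string& node, std::set<std::string>& done,
             std::set<std::string>& on_stack) const {
    if (on_stack.count(node)) return true;  // back edge: cycle found
    if (done.count(node)) return false;     // already fully explored
    on_stack.insert(node);
    auto it = edges_.find(node);
    if (it != edges_.end()) {
      for (const auto& next : it->second) {
        if (visit(next, done, on_stack)) return true;
      }
    }
    on_stack.erase(node);
    done.insert(node);
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;
};

// The two observations from the backtrace, expressed as edges.
bool reported_order_has_cycle() {
  LockOrderGraph g;
  g.add_edge("Objecter rwlock", "ClusterState lock");  // Thread 46
  g.add_edge("ClusterState lock", "Objecter rwlock");  // Thread 17
  return g.has_cycle();
}
```

With the two edges from the backtrace, `reported_order_has_cycle()` reports a cycle. If every code path took the two locks in the same order, only one of the edges would exist and the graph would stay acyclic.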


Files

ceph-mgr-backtrace.txt (86.6 KB), Hector Martin, 09/30/2018 11:13 AM

Related issues: 1 (0 open, 1 closed)

Is duplicate of mgr - Bug #23460: mgr deadlock: _check_auth_rotating possible clock skew, rotating keys expired way too early (Resolved)
