Bug #23403

Mon cannot join quorum

Added by Gauvain Pocentek about 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Low
% Done: 0%
Source: Community (user)
Regression: No
Severity: 3 - minor

Description

Hi all,

On a 3-mon cluster running Infernalis, one of the mons left the quorum and we are unable to bring it back, even though it is still in the monmap. The quorum keeps looping back into the 'electing' state. The cluster is running fine with 2 mons for now, but that is obviously not a good situation.
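One way to confirm this symptom (a mon listed in the monmap but absent from the quorum) is to compare the two lists in the JSON printed by ceph quorum_status. The sketch below is a minimal illustration, not part of the original report. It assumes the output contains "quorum_names" and "monmap" -> "mons" -> "name" keys, which should be checked against the release in use, and the sample data is hypothetical.

```python
import json
from typing import List


def mons_outside_quorum(quorum_status_json: str) -> List[str]:
    """Return monmap members that are absent from the current quorum.

    Assumes the JSON shape printed by `ceph quorum_status`
    ("quorum_names" plus "monmap" -> "mons" -> "name").
    """
    status = json.loads(quorum_status_json)
    in_quorum = set(status.get("quorum_names", []))
    members = [m["name"] for m in status.get("monmap", {}).get("mons", [])]
    return [name for name in members if name not in in_quorum]


# Hypothetical, trimmed-down output matching the situation described above:
# three mons in the monmap, only two of them in quorum.
sample = json.dumps({
    "election_epoch": 42,
    "quorum": [0, 2],
    "quorum_names": ["controller01", "controller03"],
    "monmap": {"mons": [
        {"rank": 0, "name": "controller01"},
        {"rank": 1, "name": "controller02"},
        {"rank": 2, "name": "controller03"},
    ]},
})

print(mons_outside_quorum(sample))  # ['controller02']
```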

We have checked and tried several things:

  • there is no clock skew
  • network connections are OK (tested connections between all the nodes, both ways)
  • we completely removed the mon from the monmap and re-created it (ceph-mon --mkfs) on the same server
  • we also tried adding an extra mon; it failed to join the quorum as well
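The connectivity check in the list above can be scripted so it is easy to repeat from every node, which covers both directions. This is a minimal sketch, assuming the pre-Nautilus default monitor port 6789 (used by Infernalis and Jewel) and that the host names resolve; the names in the usage comment are illustrative. A successful TCP connect only shows that the port is open, not that the mon protocol works end to end.

```python
import socket
from typing import Dict, Iterable

# Default monitor port for pre-Nautilus releases such as Infernalis/Jewel.
MON_PORT = 6789


def tcp_reachable(host: str, port: int = MON_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def check_mons(hosts: Iterable[str], port: int = MON_PORT) -> Dict[str, bool]:
    """Probe each monitor host from the machine running this script."""
    return {host: tcp_reachable(host, port) for host in hosts}


# Usage (run on every node, illustrative host names):
#   for host, ok in check_mons(["controller01", "controller02", "controller03"]).items():
#       print(host, "reachable" if ok else "UNREACHABLE")
```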

We upgraded to jewel (the upgrade was scheduled) to see if it would help: it didn't.

We've enabled debug logs for mon and paxos (files attached; they are probably larger than needed, but we're not sure what is relevant and what isn't). controller02 is the mon that can't join the quorum: its log starts after a complete reset (ceph-mon --mkfs) and process startup, and ends after this mon was removed from the monmap.
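Besides the debug logs, the mon_status dumps listed under Files can be compared side by side. Because a mon outside the quorum may not answer cluster-wide commands, its status is usually read locally through the admin socket (ceph daemon mon.<id> mon_status). The sketch below is illustrative only. It assumes each dump contains "name", "state" and "election_epoch", and it relies on our understanding that Ceph's elector keeps the election epoch odd while an election is under way and even once it has settled. The sample values are hypothetical and not taken from the attached logs.

```python
import json
from typing import Dict, Iterable


def summarize_mon_status(outputs: Iterable[str]) -> Dict[str, str]:
    """Map each monitor's name to a one-line summary of its mon_status JSON."""
    summary = {}
    for raw in outputs:
        status = json.loads(raw)
        epoch = status["election_epoch"]
        # Assumed convention: odd epoch = election under way, even = settled.
        phase = "election in progress" if epoch % 2 else "stable"
        summary[status["name"]] = "{} (election epoch {}, {})".format(
            status["state"], epoch, phase)
    return summary


# Hypothetical mon_status snippets, trimmed to the fields used above.
samples = [
    json.dumps({"name": "controller01", "state": "leader", "election_epoch": 44}),
    json.dumps({"name": "controller02", "state": "electing", "election_epoch": 45}),
]
for name, line in sorted(summarize_mon_status(samples).items()):
    print(name, line)
# controller01 leader (election epoch 44, stable)
# controller02 electing (election epoch 45, election in progress)
```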

At this point we don't really know what could be the problem, or if this is a bug. Let us know if you need more information.

Thanks for any help/advice.


Files

ceph-mon.controller02.log.bz2 (13.6 KB), Gauvain Pocentek, 03/19/2018 10:24 AM
ceph-mon.controller03.log.bz2 (36.5 KB), Gauvain Pocentek, 03/19/2018 10:25 AM
ceph-mon.controller01.log.bz2 (113 KB), Gauvain Pocentek, 03/19/2018 10:25 AM
controller03-quorum_status.log (924 Bytes), Julien Lavesque, 03/29/2018 09:14 AM
controller03-mon_status.log (959 Bytes), Julien Lavesque, 03/29/2018 09:14 AM
controller01-mon_status.log (959 Bytes), Julien Lavesque, 03/29/2018 09:14 AM
controller01-quorum_status.log (834 Bytes), Julien Lavesque, 03/29/2018 09:14 AM
controller02-quorum_status.log (834 Bytes), Julien Lavesque, 03/29/2018 09:14 AM
controller02-mon_status.log (925 Bytes), Julien Lavesque, 03/29/2018 09:14 AM