Bug #23403
Mon cannot join quorum
Status: closed
Description
Hi all,
On a 3-mon cluster running infernalis, one of the mons left the quorum and we are unable to get it to rejoin (it is still in the monmap, though). The quorum keeps looping back into the 'electing' state. The cluster is running OK with 2 mons for now, but that is obviously not a good situation to stay in.
We have checked and tried several things:
- there is no clock skew
- network connections are OK (tested connections between all the nodes, both ways)
- we completely removed the mon from the monmap and re-created it (ceph-mon --mkfs) on the same server
- we also tried adding an additional mon; it failed to join the quorum as well
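For reference, the state described above (a mon that is present in the monmap but absent from the quorum) can be confirmed from the JSON printed by ceph quorum_status. The following is a minimal sketch, assuming the output carries the usual quorum_names list and monmap.mons[].name entries. The helper name mons_outside_quorum and the sample data are made up for illustration; only controller02 is a real name from our cluster, and the other mon names and addresses are placeholders.

```python
import json


def mons_outside_quorum(status):
    """Return the names of mons listed in the monmap but missing from the quorum.

    `status` is the parsed JSON from `ceph quorum_status --format json`.
    Missing keys are treated as empty so a partial document does not raise.
    """
    in_quorum = set(status.get("quorum_names", []))
    monmap_names = [m["name"] for m in status.get("monmap", {}).get("mons", [])]
    return [name for name in monmap_names if name not in in_quorum]


# Illustrative sample only: hypothetical names and addresses, trimmed to the
# fields the helper reads. This is not output captured from the real cluster.
SAMPLE = json.loads("""
{
  "quorum": [0, 2],
  "quorum_names": ["controller01", "controller03"],
  "monmap": {
    "mons": [
      {"rank": 0, "name": "controller01", "addr": "192.0.2.1:6789/0"},
      {"rank": 1, "name": "controller02", "addr": "192.0.2.2:6789/0"},
      {"rank": 2, "name": "controller03", "addr": "192.0.2.3:6789/0"}
    ]
  }
}
""")

if __name__ == "__main__":
    print(mons_outside_quorum(SAMPLE))
```

On a live cluster the input would come from ceph quorum_status --format json rather than from the embedded sample.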
We upgraded to jewel (the upgrade was scheduled) to see if it would help: it didn't.
We've enabled debug logs for mon and paxos (files attached; they are probably too large, but we're not sure what is relevant and what isn't). controller02 is the mon that can't join the quorum: its logs start after a complete reset (ceph-mon --mkfs) and process startup, and end after the removal of this mon from the monmap.
At this point we don't really know what could be the problem, or if this is a bug. Let us know if you need more information.
Thanks for any help/advice.
Files