Bug #943

closed

3-mon cluster won't start

Added by Alexandre Oliva about 13 years ago. Updated almost 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version:
% Done: 0%


Description

I ran into this before, when I had only two monitors up. At the time I thought the monitors had gone out of sync due to btrfs failures, and that this couldn't occur with 3 active mons, but it did.

The problem is that, from ceph -w's perspective, the cluster wouldn't start. The monitors would exchange messages and choose a leader, but then, before any activity occurred, a new election would be proposed, over and over.

With 3 monitors, the scenario is even more interesting. Once I got into this situation, two monitors would choose a leader, then the third would come in and propose an election. The second wouldn't participate, so the winner would announce the result without the second in the quorum. The second would react by proposing another election, in which the third node now wouldn't participate, so the third would call for yet another election, and so on.

My guess is that this may be related to very large entries in logm taking longer to propagate than nodes are willing to wait before calling for another election and starting the process over. The logm entry right after the last committed one was around 60 MB, after a lot of ping-ponging between the 3 monitors. This was after I accidentally got it to succeed by bringing all nodes down (mon, mds, and osd) and then bringing up only two of the mons. When I brought everything up, it came to a halt and never started again. Or, rather, I had to manually commit a single-mon monmap to get the cluster going again. Just rsyncing the mon directory over to the other mons and restarting them was not enough, but keeping their broken configuration and adding them back one by one enabled them to sync up. Phew!
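
To illustrate the guess above, here is a toy model in Python. It is not Ceph code: the function name, the 10 MB/s link speed, and the 5 s election timeout are all made-up assumptions, and only the roughly 60 MB entry size comes from this report. It assumes that a new election throws away whatever part of the pending entry had already been sent, which is exactly the part I have not verified.

```python
def elections_until_synced(entry_mb, link_mb_per_s, timeout_s,
                           resumable=False, max_elections=50):
    """Toy model: count elections until a pending entry is fully propagated.

    Hypothetical assumptions (not taken from Ceph):
      - a peer calls a new election every `timeout_s` seconds until synced
      - unless `resumable` is True, each election discards partial progress
    Returns the election number in which the sync completes, or None if it
    never completes within `max_elections`.
    """
    sent_mb = 0.0
    for election in range(1, max_elections + 1):
        if not resumable:
            # The transfer restarts from scratch after every election.
            sent_mb = 0.0
        # Amount of data that fits in one timeout window.
        sent_mb += link_mb_per_s * timeout_s
        if sent_mb >= entry_mb:
            return election
    return None


if __name__ == "__main__":
    # ~60 MB entry, assumed 10 MB/s link, assumed 5 s timeout.
    print(elections_until_synced(60, 10, 5))
    print(elections_until_synced(60, 10, 5, resumable=True))
    print(elections_until_synced(1, 10, 5))
```

Under these assumed numbers only 50 MB fits into one timeout window, so the non-resumable case never finishes, which would look like the endless elections described above. A small entry, or a transfer that can resume across elections, completes quickly.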


Files

mon1-logm.tar.xz (457 KB): logm.dropped with huge file, logm with the rest. Alexandre Oliva, 05/17/2011 03:44 PM