Bug #3587

mon: election doesn't finish during heavy mon thrashing

Added by Joao Eduardo Luis over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: High
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

This happened while trying to trigger #3495 using the following restart loops:

$ while [ 1 ]; do ./init-ceph restart mon.a ; sleep 30 ; done
$ while [ 1 ]; do ./init-ceph restart mds.a ; sleep 2 ; done
$ while [ 1 ]; do ./init-ceph restart osd.1 ; sleep 2 ; done

At some point, mon.a got stuck electing (I only noticed this after canceling its restart loop). My suspicion is that it happened after #3495 was triggered on mon.b during, or right before, an election cycle.

I've attached both mon.a's and mon.b's logs. mon.b's log contains the stack trace from #3495, and it may also help in investigating whether that failure had anything to do with the infinite election cycle.

mon.a.log (6.25 MB) Joao Eduardo Luis, 12/07/2012 04:19 AM

mon.b.log (4.26 MB) Joao Eduardo Luis, 12/07/2012 04:19 AM

mon.c.log (4.42 MB) Joao Eduardo Luis, 12/07/2012 05:18 AM

Associated revisions

Revision 1acb6910 (diff)
Added by Joao Eduardo Luis over 7 years ago

mon: Elector: init elector before each election

Fixes: #3587

Signed-off-by: Joao Eduardo Luis <>

History

#1 Updated by Joao Eduardo Luis over 7 years ago

  • Subject changed from mon: election doesn't finish during heavy osd/mds thrashing to mon: election doesn't finish during heavy mon thrashing

#2 Updated by Joao Eduardo Luis over 7 years ago

This is caused by the fact that, from the other monitors' point of view, mon.a never left the quorum, so they simply ignore its election proposals as being 'old'.
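The situation described above can be pictured with a small toy model. This is only a sketch and not Ceph's actual Elector code: the `Peer` class, the `handle_propose` method, the returned strings, and the rule for senders outside the quorum are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy stand-in for a monitor that is already part of a quorum."""
    epoch: int                                 # highest election epoch seen
    quorum: set = field(default_factory=set)   # monitors it believes are in quorum

    def handle_propose(self, sender: str, proposed_epoch: int) -> str:
        if proposed_epoch < self.epoch:
            if sender in self.quorum:
                # The sender still counts as a quorum member, so a lower
                # epoch looks like a delayed message and is dropped.
                return "ignored"
            # Assumption: a sender known to be outside the quorum is
            # allowed to trigger a fresh election so it can rejoin.
            return "new election"
        return "accepted"


# mon.a restarted quickly, so its peers never removed it from the quorum.
peer = Peer(epoch=12, quorum={"mon.a", "mon.b", "mon.c"})
print(peer.handle_propose("mon.a", 1))   # ignored
```

In this model, mon.a keeps proposing with a low epoch and keeps being ignored, which matches the endless election seen in the logs.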

Also, the elector class writes its election epoch to the store each time it is bumped, but never reads it back. This means that the monitor always starts with an election epoch of 1, regardless of the last election it has seen. In this particular case, reading the election epoch would help: it would be the same as on the remaining monitors, so the election proposal would go through. This is a corner case, and it should be guaranteed that, whenever it happens, the other monitors have the same election epoch as mon.a; otherwise, a new quorum would have been formed without mon.a in it, and we wouldn't stumble upon this situation.
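The write-but-never-read problem can be sketched as follows. This is a hypothetical model, not the monitor's real code: `ToyElector`, the `read_on_init` flag, the dict-based store, and the `election_epoch` key are all made up for illustration, and the flag only illustrates the idea in this comment, not necessarily how the committed fix works.

```python
class ToyElector:
    """Toy elector that persists its epoch in a dict-based 'store'."""

    def __init__(self, store: dict, read_on_init: bool):
        self.store = store
        if read_on_init:
            # Variant that picks up the last persisted epoch, if any.
            self.epoch = store.get("election_epoch", 1)
        else:
            # Behaviour described in this comment: always start from 1.
            self.epoch = 1

    def bump_epoch(self, new_epoch: int) -> None:
        self.epoch = new_epoch
        self.store["election_epoch"] = new_epoch  # written on every bump


store = {}
ToyElector(store, read_on_init=False).bump_epoch(12)

# Simulate a restart of the monitor against the same store.
restarted_without_read = ToyElector(store, read_on_init=False)
restarted_with_read = ToyElector(store, read_on_init=True)
print(restarted_without_read.epoch, restarted_with_read.epoch)  # 1 12
```

With the epoch read back, the restarted monitor proposes with the same epoch as its peers instead of one that looks stale.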

Also attaching mon.c's log, as it was the one that provided the most insight into the matter.

#3 Updated by Joao Eduardo Luis over 7 years ago

  • Status changed from New to Fix Under Review

Haven't been able to reproduce the bug since commit e6c15e73543593fc55ba3846197fb7f83f949bb7 from wip-3587.

#4 Updated by Sage Weil over 7 years ago

  • Status changed from Fix Under Review to Resolved
