Bug #5471

closed

mon: do not join a quorum if quorum's version is lower than ours

Added by Joao Eduardo Luis almost 11 years ago. Updated over 10 years ago.

Status:
Resolved
Priority:
Low
Category:
Monitor
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With p being the monitor's Paxos version, consider:

  • A - p:100 (at time quorum was formed)
  • B - p:100 (at time quorum was formed)
  • C - p:200 (!quorum)
  • C starts; probes

If C's paxos [fc,lc] range (first committed, last committed) overlaps with A/B's paxos [fc,lc], then there will be no sync and C joins the quorum.

During recovery, say that we have A (p:130) and C (p:200). C will then share its state from [131,200] with A, but A never shares its state from [100,130] with C. The monitors are now inconsistent, and updates from [100,200] have been lost!
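
The scenario above can be sketched as a toy model. This is not Ceph code: `Mon`, `ranges_overlap`, and `recover` are hypothetical names, and the model assumes, as the description does, that a probing monitor skips the sync whenever its [fc,lc] range overlaps the quorum's, and that recovery only copies the versions the lagging peer is missing.

```python
from dataclasses import dataclass, field


@dataclass
class Mon:
    """Toy monitor: a map of paxos version -> committed value (hypothetical model)."""
    name: str
    log: dict = field(default_factory=dict)

    @property
    def fc(self):  # first committed version
        return min(self.log)

    @property
    def lc(self):  # last committed version
        return max(self.log)


def ranges_overlap(a, b):
    """True if the two monitors' [fc, lc] ranges share at least one version."""
    return a.fc <= b.lc and b.fc <= a.lc


def recover(behind, ahead):
    """Copy only the versions `behind` is missing; existing versions are never compared."""
    for v in range(behind.lc + 1, ahead.lc + 1):
        behind.log[v] = ahead.log[v]


# C kept the original history up to version 200.
c = Mon("C", {v: f"orig-{v}" for v in range(1, 201)})
# A was at 100 when the new quorum formed, then committed 101..130 without C.
a = Mon("A", {v: f"orig-{v}" for v in range(1, 101)})
a.log.update({v: f"new-{v}" for v in range(101, 131)})

no_sync = ranges_overlap(a, c)  # overlapping ranges, so C joins without a sync
recover(a, c)                   # C shares [131, 200] with A
diverged = [v for v in a.log if a.log[v] != c.log[v]]
print(no_sync, a.lc == c.lc, diverged[0], diverged[-1])
```

In this model both monitors end up reporting version 200, yet versions 101 through 130 hold different values on A and C, which is the inconsistency the description warns about.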

Reproducible by:

  • quorum: A, B, C
  • ceph tell mon.b sync force (mimics a mkfs to some extent)
  • stop A
  • for i in `seq 1 100`; do ceph log $i ; done
  • stop all mons
  • restart B
  • restart A
  • B syncs from A
  • quorum: A, B
  • restart C

It is B's failure that ends up being responsible for contaminating the cluster state. By losing B's state, bringing it back up with a clean slate after user intervention, and allowing it to form a quorum with an out-of-date monitor (A), the user is allowing their cluster to pick up from a considerably out-of-date state. This could easily be avoided by bringing C up first and letting B sync from C instead.

It is thus fair to assume that the monitors themselves are not responsible for the issues resulting from the lost versions. This case is pretty specific: it involves a monitor with a clean slate forming a quorum with an out-of-date monitor, which shouldn't be something that just happens, leading us to conclude that the user should be aware of what they are doing.

Therefore, all we can/should do is guarantee that C doesn't join the quorum if it notices that the cluster has a formed quorum whose version is lower than the one C currently holds. This still doesn't avoid the issues that may arise from letting C join this same quorum at a later point in time, when the quorum's version is higher than whatever version C holds -- we would need to associate additional metadata with the paxos versions to assess at which point in time a given version was proposed (the election epoch, for instance; this raises an issue with a cluster having a lower election epoch, eventually raising it to the same as C's, but that is less likely to happen).
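
A minimal sketch of the proposed guard, written in Python rather than the monitor's C++ and using hypothetical names (`may_join_quorum`, `own_version`, `quorum_version`): the probing monitor refuses to join when the formed quorum reports a paxos version lower than its own.

```python
def may_join_quorum(own_version: int, quorum_version: int) -> bool:
    """Return False when the formed quorum is behind this monitor's paxos version."""
    return quorum_version >= own_version


# C (p:200) probes a quorum formed by A and B.
print(may_join_quorum(own_version=200, quorum_version=130))  # quorum is behind: stay out
print(may_join_quorum(own_version=200, quorum_version=250))  # quorum is ahead: allowed
```

The second call shows the limitation mentioned above: once the quorum's version passes C's, a plain version comparison can no longer tell that the two histories diverged.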

Actions #1

Updated by Joao Eduardo Luis almost 11 years ago

I have a simple patch for this that compares the quorum's version to our own paxos version and forces us to suicide if the quorum's version is lower.

Actions #2

Updated by Sage Weil over 10 years ago

  • Status changed from New to Resolved