Fix #7215

mon: prevent old monitors which do not support new encodings from joining the cluster

Added by Greg Farnum about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Monitor
Target version:
% Done: 0%
Source: Development
Tags:
Backport:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When we added our new osdmap encoding, we changed the on-disk encoding of the map. If old monitors join up, they won't be able to understand the data they receive in transactions!
Prevent them from joining the quorum, and be nice about it.

Associated revisions

Revision 687b570b (diff)
Added by Greg Farnum about 10 years ago

Elector: ignore messages from mons without required feature capabilities

We maintain a list of required_features which the other monitor's features
must supply. This starts out at 0 and is initialized from the monitor's
list of features whenever we start electing.
Despite the scary sound of "just ignore it", this is safe: the monitor
will only record features as required once a quorum has formed in which
every monitor supports them. After that happens, monitors which do not
support those features will be unable to read the whole mon store/understand
the pg reports/whatever else, so letting them into the quorum would be buggy
behavior.
So if we ignore a monitor, it will not be able to start nor join
an election round with anybody who was in our quorum -- that is, the
ignored monitor cannot form a separate quorum. By ignoring it here, we
also prevent it from endlessly calling elections against the real
quorum.

Unfortunately there is no way to communicate to old monitors that they
cannot join the quorum -- there are no existing messages for that purpose,
and e.g. adding a new op to the MMonElection message will just cause the
old monitor to crash, which we don't want to do either.

Fixes: #7215

Signed-off-by: Greg Farnum <>
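
The feature check described in the commit message above can be modeled roughly as follows. This is a minimal sketch in Python, not the actual Ceph Elector code (which is C++): the class name, method names, and feature bit values are all hypothetical and chosen only for illustration. It assumes features are represented as a bitmask and that a peer is acceptable only when it supplies every required bit.

```python
from dataclasses import dataclass, field

# Hypothetical feature bits, for illustration only (not Ceph's real values).
FEATURE_BASE = 1 << 0
FEATURE_OSDMAP_ENC = 1 << 1


@dataclass
class Elector:
    """Toy model of a monitor's elector; all names are illustrative."""
    name: str
    supported_features: int
    required_features: int = 0  # starts out at 0, as the commit message says
    acked: set = field(default_factory=set)

    def start_election(self, monitor_required_features: int) -> None:
        # Re-initialize from the monitor's required features whenever
        # we start electing.
        self.required_features = monitor_required_features
        self.acked = {self.name}

    def peer_is_acceptable(self, peer_features: int) -> bool:
        # The peer's features must supply every required bit.
        return (peer_features & self.required_features) == self.required_features

    def handle_message(self, peer_name: str, peer_features: int) -> bool:
        """Return True if the message was processed, False if ignored."""
        if not self.peer_is_acceptable(peer_features):
            # Silently drop: no reply is sent and no election is triggered.
            return False
        self.acked.add(peer_name)
        return True


new_features = FEATURE_BASE | FEATURE_OSDMAP_ENC
mon_a = Elector("a", new_features)
mon_a.start_election(new_features)
mon_a.handle_message("b", new_features)    # processed
mon_a.handle_message("old", FEATURE_BASE)  # ignored: lacks the new encoding bit
```

In this model an ignored monitor never appears in `acked`, which mirrors the point made above: a monitor lacking the required features can neither join nor disrupt an election among monitors that were in the quorum.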

History

#1 Updated by Greg Farnum about 10 years ago

wip-7215-quorum-features
Going to have to do some manual testing; not sure if there's any feasible automated testing we can set up or should perform on this. :/

#2 Updated by Tamilarasi muthamizhan about 10 years ago

Greg, we have some tests in the upgrade:parallel suite [ceph-qa-suite/suites/upgrade/parallel/stress-split] that do this.

The config file would look like:

- chef: null
- clock.check: 
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: next
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test.sh

#3 Updated by Greg Farnum about 10 years ago

Nifty, thanks Tamil! That should let me cover a bunch of it.

#4 Updated by Greg Farnum about 10 years ago

  • Status changed from In Progress to Resolved

This is merged (and so are some fixes around it). I just didn't get the automated tests debugged, but they can go elsewhere.
