mon: prevent old monitors which do not support new encodings from joining the cluster
When we added our new osdmap encoding, we changed the on-disk encoding of the map. If old monitors join up, they won't be able to understand the data they receive in transactions!
Prevent them from joining the quorum, and be nice about it.
Elector: ignore messages from mons without required feature capabilities
We maintain a list of required_features which the other monitor's features
must supply. This starts out at 0 and is initialized from the monitor's
list of features whenever we start electing.
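
The check described above can be sketched as a bitmask comparison. This is a minimal illustration, not the actual Elector code: the class name, method names, and feature bit values below are all hypothetical, and only the name required_features comes from the commit message.

```python
from dataclasses import dataclass

# Hypothetical feature bits, for illustration only (not Ceph's real values).
FEATURE_BASE = 1 << 0
FEATURE_NEW_OSDMAP_ENCODING = 1 << 1


@dataclass
class ElectionMessage:
    sender: str
    features: int  # bitmask of features the sending monitor supports


class ElectorSketch:
    """Toy model of an elector that ignores under-featured peers."""

    def __init__(self):
        # Nothing is required until an election starts.
        self.required_features = 0
        self.acked = []

    def start(self, mon_required_features):
        # Re-read the monitor's required features each time we start electing.
        self.required_features = mon_required_features
        self.acked = []

    def peer_is_acceptable(self, peer_features):
        # The peer must supply every required bit; extra bits are fine.
        return (peer_features & self.required_features) == self.required_features

    def handle(self, msg):
        # Returns True if the message was processed, False if it was ignored.
        if not self.peer_is_acceptable(msg.features):
            return False  # silently drop: no reply is sent to the peer
        self.acked.append(msg.sender)
        return True
```

With required_features still at 0, every peer passes the check, which matches the "starts out at 0" behaviour; once start() has been called with a non-zero mask, a peer lacking any required bit is dropped without a reply.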
Despite the scary sound of "just ignore it", this is safe: the monitor
will only record features as required once a quorum has formed in which
every monitor supports them. After that happens, monitors which do not
support those features will be unable to read the whole mon store/understand
the pg reports/whatever else, so letting them into the quorum would be buggy.
So if we ignore a monitor, it will not be able to start or join
an election round with anybody who was in our quorum -- that is, the
ignored monitor cannot form a separate quorum. By ignoring it here, we
also prevent it from endlessly calling elections against the real quorum.
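
The "cannot form a separate quorum" argument can be checked with a little arithmetic. The sketch below assumes a quorum needs a strict majority of all monitors; the function names and monitor names are hypothetical and only model the reasoning, not any real Ceph code.

```python
def has_majority(members, all_monitors):
    # Assumption: a quorum needs strictly more than half of all monitors.
    return len(members) > len(all_monitors) // 2


def electorate_for_ignored(all_monitors, quorum):
    # Every member of the existing quorum ignores the old monitor, so the
    # only monitors it can still elect with are those outside that quorum.
    return set(all_monitors) - set(quorum)


# Three monitors: a and b formed a quorum with the new features, c is old.
mons3 = {"a", "b", "c"}
left3 = electorate_for_ignored(mons3, {"a", "b"})

# Five monitors: a, b, c formed the quorum, d and e are old.
mons5 = {"a", "b", "c", "d", "e"}
left5 = electorate_for_ignored(mons5, {"a", "b", "c"})

print(sorted(left3), has_majority(left3, mons3))
print(sorted(left5), has_majority(left5, mons5))
```

Because the existing quorum is itself a majority, the monitors left over are always a minority, so under this assumption they can never elect a leader among themselves.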
Unfortunately there is no way to communicate to old monitors that they
cannot join the quorum -- there are no existing messages for that purpose,
and e.g. adding a new op to the MMonElection message would just cause the
old monitor to crash, which we don't want either.
Signed-off-by: Greg Farnum <firstname.lastname@example.org>
#2 Updated by Tamilarasi muthamizhan over 5 years ago
Greg, we have some tests in the upgrade:parallel suite [ceph-qa-suite/suites/upgrade/parallel/stress-split] to do this.
The config file would look like:
- chef: null
- clock.check:
- install:
    branch: dumpling
- ceph:
    fs: xfs
- install.upgrade:
    osd.0: next
- ceph.restart:
    daemons:
    - osd.0
    - osd.1
    - osd.2
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- ceph.restart:
    daemons:
    - mon.a
    wait-for-healthy: false
    wait-for-osds-up: true
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
- ceph.restart:
    daemons:
    - mon.b
    wait-for-healthy: false
    wait-for-osds-up: true
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test.sh
- install.upgrade:
    mon.c: null
- ceph.restart:
    daemons:
    - mon.c
    wait-for-healthy: false
    wait-for-osds-up: true
- ceph.wait_for_mon_quorum:
  - a
  - b
  - c
- workunit:
    branch: dumpling
    clients:
      client.0:
      - rados/test.sh