Documentation #9430

closed

dev documentation about incompat features

Added by Loïc Dachary over 9 years ago. Updated over 4 years ago.

Status: Closed
Priority: Low
Assignee: -
Category: documentation
Target version: -
% Done: 0%
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

Turn the following dialog into documentation for developers.

<loicd> I'll find you a link
<loicd> it's preload
<loicd> https://github.com/ceph/ceph/blob/master/src/erasure-code/ErasureCodePlugin.cc#L168
<loicd> I guess it pollutes the output
<jannau> it does. for bench.sh a simple 2> /dev/null helps though
<loicd>   int code = instance.factory(plugin, parameters, &erasure_code, cerr);
<loicd> that's where it gets output on cerr
<loicd> it used to be silent
<loicd> http://tracker.ceph.com/issues/9429
<loicd> my bad
<loicd> joao: would you have time to take a look at https://github.com/dachary/ceph/commit/ba3f260e74d29193898e9ba1116d5e50827b08ec ? 
<loicd> sorry it's https://github.com/dachary/ceph/commit/19dd150e0375dce7a5d9d23f8e330eb34b5e09b5
<loicd> I removed the (quorum_features & CEPH_FEATURE_ERASURE_CODE_PLUGINS_V2) https://github.com/dachary/ceph/commit/ba3f260e74d29193898e9ba1116d5e50827b08ec#diff-d4e1ba36dd08617ea271ab2be3dbcb5bR347 
<loicd> because it does not seem to belong to the same category as the other quorum features
<loicd> I assume the idea is that such a feature only gets activated if all mons in the quorum have it
<loicd> the scenario i'm worried about is 
<loicd> a) one mon is upgraded and creates a pool with plugin=lrc
<loicd> b) this mon goes away and another becomes the leader
<loicd> c) the osds try to communicate with a mon without the feature
<loicd> hum
<loicd> that should be ok since the osd does not require the mon to have this feature and the mon won't ask it from the mon because it does not know about it
<joao> loicd, the first patch you mentioned had a safe-guard against that
<loicd> the incompat thing ?
<joao> by setting the incompat feature upon quorum_features & CEPH_FEATURE_INCOMPAT_ERASURE_CODE_V2
<joao> yes
<joao> ah
<loicd> I'd be happy to keep this
<joao> no, not really
<joao> it guarantees that a majority will set that
<loicd> but I don't fully understand the implications
<joao> not that all monitors do
<loicd> ah
<loicd> in any case, it does not matter if the mons do not implement the feature
<loicd> it means they won't be able to create pools using the new plugins but that does not mean they will somehow break the existing pools
<loicd> at the moment an erasure coded pool can only be created by a monitor if it is able to load the plugin
<loicd> if the mons don't have the feature they don't have the plugins either and we're not in danger of creating a pool that would require a plugin that's not available on all osds
<loicd> joao: does that make sense ? 
<joao> sort of
<joao> if the pool creation, and the plugin, is dependent on something being set on the osdmap, and this new thing is encoded/decoded on a field that may not be known to the monitors (depending on the monitor version), then you will need to enforce the quorum_features requirement
<joao> otherwise you *may* end up with divergent versions on the osdmap
<joao> e.g.
<joao> leader A supports feature, peon B supports feature, peon C does not
<joao> yet we allow the feature to be set
<joao> A encodes an osdmap with said feature, proposes the update to B and C
<joao> B is able to understand said update and generates the new full osdmap; C is able to understand the update but does not decode the new info (as it's not ready to handle it)
<joao> C will generate a divergent full map from that of A and B
<joao> if however you're reusing an existing member field of the osdmap, and if C is able to decode it and generate the same osdmap version, then all will be well
<joao> this is assuming that the contents of said member field is decoded regardless of the monitor being able to understand semantic implications
<joao> so this is one thing
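
A minimal sketch of the divergence joao describes, assuming a toy map format. `Incremental`, `FullMap`, `new_field` and `apply` are hypothetical stand-ins for illustration, not Ceph's OSDMap types or encoding. Two monitors apply the same update; the one that does not decode the new field ends up with a different full map, while an update that only reuses an existing field yields identical maps.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, heavily simplified stand-ins; not Ceph's real types.
struct Incremental {
  std::map<std::string, std::string> pool_profiles;  // field every version decodes
  bool has_new_field = false;                        // set when the update carries new info
  std::string new_field;                             // only newer monitors decode this
};

struct FullMap {
  std::uint32_t epoch = 0;
  std::map<std::string, std::string> pool_profiles;
  std::string new_field;  // stays empty on monitors that skip it

  bool operator==(const FullMap& o) const {
    return epoch == o.epoch && pool_profiles == o.pool_profiles &&
           new_field == o.new_field;
  }
};

// Build the next full map from the previous one plus an update.
// understands_new_field models whether this monitor's version decodes new_field.
FullMap apply(FullMap base, const Incremental& inc, bool understands_new_field) {
  ++base.epoch;
  for (const auto& kv : inc.pool_profiles) {
    base.pool_profiles[kv.first] = kv.second;
  }
  if (understands_new_field && inc.has_new_field) {
    base.new_field = inc.new_field;
  }
  return base;
}
```

In this toy model the safe case is the one loicd relies on later in the dialog: the information travels in a field that every version already decodes.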
<loicd> the incompat features are meant to protect against what you just described, right ? 
<joao> the other thing is that I believe the osdmonitor will make sure that monitors that do not support some osdmap features are not able to connect, as it will update the messenger's policy features upon 'update_from_paxos()'
<joao> loicd, yes
<loicd> ok, it makes sense. 
<joao> in this case you will want to make sure you write the incompat to disk once all the monitors in the quorum (i.e., a majority of monitors) support said feature
<joao> and you will want to keep the 'check_cluster_features()' checks when creating the pools
<joao> the incompat features will only guarantee that once you have a monitor that supports said feature and sets it on-disk, then you won't be able to revert to an earlier version that does not support said features
<joao> but it does nothing to protect you against a quorum that does not fully support said feature
<loicd> for that you need to set this incompat feature in https://github.com/dachary/ceph/commit/ba3f260e74d29193898e9ba1116d5e50827b08ec#diff-d4e1ba36dd08617ea271ab2be3dbcb5bR1941 ? 
<loicd> for that => to protect against a quorum that does not implement the feature
<loicd> joao: ^
<joao> was checking something
<joao> just a sec
<loicd> sure :-)
<loicd> https://github.com/dachary/ceph/commit/76f34d51cc00fdb9b2283a83e0ef6d3c1805bfb1#diff-d71e7a28d9b138151c55d97f5000fd80R993
<loicd> since the feature is deduced from existing fields in the OSDMap (the erasure code profile), older mons will be able to encode/decode OSDMaps that contain the feature.
<joao> okay, so there's a fairly simple way to prevent monitors without the feature from joining the quorum once at least one monitor has set the on-disk feature
<joao> just add the feature to Monitor::apply_compatset_features_to_quorum_requirements()
-*- loicd checking
<ft1> hi, I have a question about indexing metadata in ceph
<ft1> is it btree?
<joao> it will prevent monitors from joining the cluster during the probing phase (monitors that do not support the feature will get an MMonProbe::MISSING_FEATURE reply)
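
A rough sketch of the probing check, under stated assumptions: the bit values, `required_from_compatset` and `handle_probe` are hypothetical names for illustration and do not reproduce the Monitor code. The idea is that on-disk incompat features are turned into required feature bits, and a probing monitor whose features do not cover them gets a "missing feature" answer instead of joining.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values, for illustration only.
constexpr std::uint64_t FEATURE_BASE = 1ull << 0;
constexpr std::uint64_t FEATURE_EC_PLUGINS_V2 = 1ull << 1;

enum class ProbeReply { OK, MISSING_FEATURE };

// Translate the on-disk incompat state into quorum requirements.
std::uint64_t required_from_compatset(bool ondisk_has_plugins_v2) {
  std::uint64_t required = FEATURE_BASE;
  if (ondisk_has_plugins_v2) {
    required |= FEATURE_EC_PLUGINS_V2;
  }
  return required;
}

// A peer may proceed only if it offers every required feature bit.
ProbeReply handle_probe(std::uint64_t required, std::uint64_t peer_features) {
  return (peer_features & required) == required ? ProbeReply::OK
                                                : ProbeReply::MISSING_FEATURE;
}
```

Before the incompat feature is written to disk, an older monitor still passes the check; afterwards it is turned away during probing.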
<loicd> joao: that makes sense. Do you think I should create such a feature given the above ? (what I wrote after sure :-)
<joao> loicd, what happens if you try to create an lrc pool on a cluster in which a majority does not support said pool?
<loicd> a majority of mon ? 
<joao> monitors and osd alike?
<loicd> that can only happen if all OSDs have the feature because:
<loicd> a) https://github.com/dachary/ceph/commit/76f34d51cc00fdb9b2283a83e0ef6d3c1805bfb1#diff-d71e7a28d9b138151c55d97f5000fd80R993 guarantees it for all mons that have been upgraded and
<loicd> b) old mons will fail to create it because they don't have the lrc plugin which the mon will try to load when creating the pool
<loicd> ah 
<loicd> here is a scenario that would create problems
<loicd> after the pool is created, the mon that was upgraded goes away
<loicd> and there are only mons of the older versions
<loicd> an old OSD is resurrected
<loicd> and the old mons will not refuse to have it 
<loicd> because they don't have such a logic
<loicd> the OSD will boot and fail miserably when required to load the lrc plugin
<joao> you must thus ensure that the feature is only enabled when you have a quorum that supports the feature
<loicd> joao: that calls for the incompat feature then, right ? 
<loicd> ok
<joao> okay, so, there's a difference between features and the compatset
<joao> the latter is for daemon supported features, and establishes what features the daemon can and cannot work without
<loicd> does https://github.com/dachary/ceph/commit/ba3f260e74d29193898e9ba1116d5e50827b08ec#diff-d4e1ba36dd08617ea271ab2be3dbcb5bR347 look like the right way to do it ? 
<joao> yes
<loicd> joao: I don't clearly understand the distinction between compatset and features indeed
<loicd> it looks like compatset is for information that is encoded/decoded locally 
<joao> right
<joao> so here goes an attempt at explaining this
<joao> say you change some encoding scheme on the monitor; for instance, moving from a fs-based store to leveldb
<joao> you add an incompat feature
<joao> once the monitor upgrades the store from one format to the other, you set an incompat feature and write it to disk
<loicd> so a monitor that does not have the feature knows it cannot read it because it misses said feature ? 
<joao> only monitors supporting that compatset feature will be allowed to work in the future; older monitors, without that compatset, will just suicide after finding about a feature they don't understand
<loicd> pl
<loicd> ok
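
To make the on-disk compatset idea concrete, here is a small sketch that assumes features are plain strings kept in a set. `FeatureSet` and `can_start` are hypothetical names, not Ceph's CompatSet API. A daemon may start only if it supports every incompat feature recorded on disk, which is why an older monitor gives up after a store upgrade.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

using FeatureSet = std::set<std::string>;

// True when every incompat feature found on disk is supported by this binary.
// std::includes needs sorted ranges, which std::set guarantees.
bool can_start(const FeatureSet& supported, const FeatureSet& ondisk_incompat) {
  return std::includes(supported.begin(), supported.end(),
                       ondisk_incompat.begin(), ondisk_incompat.end());
}
```

The check only looks at what was written to disk, so it protects against downgrades and says nothing about what the rest of the quorum supports.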
-*- loicd adjusts to his new keyboard
<joao> then you have the so called "features", which may be related with the compatset but not necessarily
<joao> for instance, you don't really care about the monitor store format change anywhere else besides the monitor
<joao> that's not even a quorum requirement
<loicd> ok
<joao> however, supporting erasure codes is of influence to a lot more components than simply a single monitor's internal behavior
<joao> so you have a feature called CEPH_FEATURE_OSD_ERASURE_CODES
<joao> which on the monitor is tightly coupled to a compatset feature called CEPH_MON_FEATURE_INCOMPAT_OSD_ERASURE_CODES
<joao> and the rationale in that particular case is this: the monitor supports erasure codes but does not initially enforce it
<joao> so until someone creates an erasure-coded pool, it's just business as always
<joao> say you have two versions: A, without EC support, and B with EC support
<joao> say you upgrade a monitor from A to B
<joao> unless you create an EC pool, you will be able to downgrade from B to A, no problem
<joao> however you want to ensure that once you create an EC pool you are no longer able to run A, as A will not understand the osdmaps
<joao> so, once you create an EC pool, you will set the on-disk compatset stating that "this monitor now requires EC feature to properly run, if you do not support this please suicide" 
<joao> that takes care of the downgrade from B to A
<loicd> ok
<joao> also, we will only allow an EC pool to be created once we have EC support on all OSDs
<ft1> i have a question about indexing metadata in ceph
<joao> and that is accomplished by checking if the OSDs support the EC feature, and if all of them do, just "flip the switch" by setting the feature in the OSDMap
<loicd> ft1: what metadata are you trying to index ? 
<joao> once this is done, the monitors will set this new feature as a quorum requirement; this will lead to the monitors noticing that feature change, setting the on-disk compatset, and updating the required features for a future quorum
<ft1> loicd: like path 
<joao> a check in OSDMonitor::process_boot() will also ensure that OSDs without EC support will not be allowed to boot
<joao> loicd, does it make sense?
<loicd> joao: it is crystal clear :-)
<joao> cool
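
The erasure-code sequence joao walks through can be sketched roughly as follows. All names (`Cluster`, `create_ec_pool`, `allow_boot`, the bit value) are assumptions made for illustration, and the sketch collapses into one step what the dialog describes as several: setting the feature in the OSDMap, making it a quorum requirement, and writing the on-disk compatset.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t FEATURE_EC = 1ull << 0;  // hypothetical bit value

struct Cluster {
  std::vector<std::uint64_t> up_osd_features;  // features of each running OSD
  std::uint64_t map_required = 0;              // features the map now requires
  bool ondisk_incompat_ec = false;             // stands in for the on-disk compatset
};

// True when every running OSD advertises the given feature.
bool all_osds_support(const Cluster& c, std::uint64_t feature) {
  for (std::uint64_t f : c.up_osd_features) {
    if ((f & feature) != feature) return false;
  }
  return true;
}

// "Flip the switch" only when all OSDs are ready; otherwise refuse the pool.
bool create_ec_pool(Cluster& c) {
  if (!all_osds_support(c, FEATURE_EC)) return false;
  c.map_required |= FEATURE_EC;
  c.ondisk_incompat_ec = true;
  return true;
}

// Boot-time check: an OSD lacking a required feature is not allowed in.
bool allow_boot(const Cluster& c, std::uint64_t osd_features) {
  return (osd_features & c.map_required) == c.map_required;
}
```

Until a pool is created nothing is required, so an OSD without the feature can still boot and a downgrade remains possible.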
<loicd> and now I think I understand (I probably don't but I feel like I do ;-)
<loicd> joao: I probably need https://github.com/dachary/ceph/commit/ba3f260e74d29193898e9ba1116d5e50827b08ec#diff-d4e1ba36dd08617ea271ab2be3dbcb5bR1970
<loicd> to show something like
<ft1> loicd: do u know which index structure it use?
<loicd> required_features |= CEPH_FEATURE_ERASURE_CODE_PLUGINS_V2
<joao> loicd, I think that too
<joao> if you mean adding that to apply_compatset_features_to_quorum_requirements()
<loicd> ft1 I'm not sure I understand the question, could you explain which path you are referring to ? 
<loicd> joao: yes
-*- loicd patching
<joao> loicd, I think he means the way the mds stores the data/metadata in the osd
<joao> or maybe how it stores the metadata in memory
<joao> regardless, I have no idea
<joao> in memory it will probably be a btree
<joao> on disk is a mystery to me, but btree's are also a safe bet
<loicd> and I guess the "required_feature" is what prevents other MONs from joining the quorum after the feature has been set.
<joao> yep
<loicd> ft1: if you mean path in CephFS I can't help because I know nothing about it, sorry :-)
<ft1> loicd: for example i want to find a file which name is x.txt we should not search all the namespace to find out where the x.txt is! 
<joao> once a monitor has a given required_features, it replies with a "missing feature" to all probes from monitors without a given required feature
<ft1> loicd: thanks for your help
<jcsp> ft1: there are papers about how CephFS metadata works: http://www.ssrc.ucsc.edu/Papers/weil-osdi06.pdf
<loicd> jcsp: good day sir :-0
<jcsp> loicd: ohai
<ft1> loicd: thanks :)
<loicd> joao: shouldn't OSDMonitor check the incompat feature before creating a pool ?
<loicd> I should look for other parts of the code that have similar requirements
<joao> the incompat feature?
<joao> loicd, don't understand what you mean
<joao> the osdmonitor checks the cluster features, which is what's important here
<loicd> nvm it's done at https://github.com/ceph/ceph/blob/master/src/mon/OSDMonitor.cc#L3214 which I now understand thanks to your explanations :-)
<loicd> what a ride :)
<joao> I believe it's safe to assume that the monitor will support that feature if there's code with logic for them and thus it's unnecessary to check the incompat set
<loicd> indeed

Actions #1

Updated by Zac Dover over 4 years ago

  • Status changed from New to Closed

This bug has been judged too old to fix. This is because it is either 1) raised against a version of Ceph prior to Luminous, or 2) just really old, and untouched for so long that it is unlikely nowadays to represent a live documentation concern.

If you think that the closing of this bug is an error, raise another bug of a similar kind. If you think that the matter requires urgent attention, please let Zac Dover know at .
