Ceph CephFS - Bug #16592: Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)"
https://tracker.ceph.com/issues/16592?journal_id=73988 (2016-07-06T20:46:09Z, Patrick Donnelly, pdonnell@redhat.com)
<ul><li><strong>Assignee</strong> set to <i>Patrick Donnelly</i></li></ul>
https://tracker.ceph.com/issues/16592?journal_id=74056 (2016-07-07T20:58:00Z, Patrick Donnelly, pdonnell@redhat.com)
<ul></ul><p>Should note that this is maybe related to: <a class="external" href="http://tracker.ceph.com/issues/15591">http://tracker.ceph.com/issues/15591</a></p>
https://tracker.ceph.com/issues/16592?journal_id=74861 (2016-07-15T16:24:55Z, Patrick Donnelly, pdonnell@redhat.com)
<ul></ul><p>So, rambling brain dump of my current thoughts on this:</p>
<p>I haven't been able to reproduce this problem. There are two known instances of<br />this bug in 10.2.2 upgrades after [1]: [2] and [3].</p>
<p>Dzianis upgraded from Infernalis to 10.2.2. Bill upgraded from 10.2.0 to<br />10.2.2. So far I've assumed that they are afflicted by the same bug but<br />obviously that can't be known until we can reproduce this.</p>
<p>So far it is clear that the problem in both cases is that standby_daemons [4] has an MDS in it that is not in standby. The code is sprinkled with assertions that check this invariant (ideally, we would enforce it on mutations of standby_daemons). Both users are hitting this assertion (in different places).</p>
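<p>To state the invariant concretely, here is a minimal Python model (names like FSMapModel and insert_standby are invented for illustration; this is not Ceph's actual C++ API) of a standby_daemons map that enforces the invariant at mutation time rather than only asserting it afterwards:</p>

```python
# Toy model of the standby_daemons invariant: every entry must be in
# state up:standby. Names are illustrative, not Ceph code.

STATE_STANDBY = "up:standby"
STATE_ACTIVE = "up:active"

class FSMapModel:
    def __init__(self):
        self.standby_daemons = {}  # gid -> {"state": ...}

    def insert_standby(self, gid, info):
        # Enforce the invariant on mutation, instead of only checking
        # it later with scattered assertions.
        if info["state"] != STATE_STANDBY:
            raise ValueError("gid %d is %s, not standby" % (gid, info["state"]))
        self.standby_daemons[gid] = info

    def sanity(self):
        # The scattered assertions in MDSMonitor.cc amount to this loop.
        for gid, info in self.standby_daemons.items():
            assert info["state"] == STATE_STANDBY, gid

fsmap = FSMapModel()
fsmap.insert_standby(5854102, {"state": STATE_STANDBY})
fsmap.sanity()  # passes

# A non-standby entry is rejected up front rather than tripping an
# assertion deep in a later consumer of the map.
try:
    fsmap.insert_standby(6000903, {"state": STATE_ACTIVE})
except ValueError:
    pass
```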
<p>I have tried naive exercises doing live/dead upgrades from v9.2.1 to v10.2.2<br />with mon/mds in various upgrade states. For example:</p>
<p>o Kill all mds and upgrade. Then upgrade mons.<br />o Kill lead mds and lead mon, upgrade. Wait. Upgrade the rest.<br />o Use "mds standby replay" / "mds standby for rank" in various configurations during upgrade.</p>
<p>I wasn't able to hit the assertion. (In most cases, I was trying to get<br />standby_daemons of a v10.2.2 mon to violate its "info->state ==<br />MDS_STATE_STANDBY" invariant through updates from other monitors.)</p>
<p>In v10.2.0+, a standby daemon's state may be changed in these places:</p>
<p>o [5] (removed in v10.2.2)<br />o [6] (removed in v10.2.2)<br />o [7] <-- strong candidate for the bug?<br />o [8] and [9] (fs command, probably unrelated)<br />o [10] <-- strong candidate for the bug?<br />o [11], which is only called from [12], so the daemon must already be standby.<br />o [11] <-- received a bad standby_daemons map from another v10.2.0+ mon. Possible to hit on v10.2.2 if [5] or [6] is exercised on a v10.2.0 daemon!</p>
<p>I have tried to reproduce using [7] but I have not found a way to have a daemon<br />with rank MDS_RANK_NONE that is not also MDS_STATE_STANDBY.</p>
<p>That leaves [10]. I have done some code reading to see how a beacon could be sent that causes that code path to run and modify a standby daemon. So far my best theory is a v10.2.0 mds daemon asking to be STANDBY_REPLAY_ONESHOT [13], which is (perhaps?) not handled correctly by a v10.2.2 mon, as STANDBY_REPLAY_ONESHOT has been removed since [15]. I do not think this is likely, as STANDBY_REPLAY_ONESHOT is not really used according to [15] and wouldn't really affect a live upgrade.</p>
<p>The other possibility is a v10.2.0 MDS wanting to be in state MDS_STATE_STANDBY_REPLAY in [13]. This possibility is handled in v10.2.2 in [14]. However, <b>I suspect it may be possible during a live upgrade from v10.2.0 to v10.2.2 for an older monitor to change standby_daemons in the [5] and [6] code paths</b>. In this way, a bad standby_daemons map could be sent from an older v10.2.0 monitor to a v10.2.2 monitor. [For this possibility, upgrading is not actually necessary, as v10.2.0 should eventually fail with an assertion too?]</p>
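<p>To make the suspected sequence concrete, a toy Python model of the mixed-version scenario (function names and map shapes are invented for illustration; this is not Ceph code): an older monitor mutates a standby entry's state in place via a [5]/[6]-style path, and the shared map then trips the newer monitor's invariant check.</p>

```python
# Toy model of the hypothesized mixed-version failure: a v10.2.0-style
# monitor changes a standby entry's state without removing it from
# standby_daemons, then a v10.2.2-style monitor asserts on the result.

STANDBY = "up:standby"
STANDBY_REPLAY = "up:standby-replay"

def old_mon_handle_beacon(standby_daemons, gid, wanted_state):
    # Older code path: mutates the entry in place, leaving a
    # non-standby daemon inside standby_daemons.
    standby_daemons[gid]["state"] = wanted_state

def new_mon_sanity(standby_daemons):
    # Newer invariant check on a map received from a peer monitor.
    for gid, info in standby_daemons.items():
        assert info["state"] == STANDBY, (gid, info["state"])

shared_map = {42: {"state": STANDBY}}
old_mon_handle_beacon(shared_map, 42, STANDBY_REPLAY)

try:
    new_mon_sanity(shared_map)
    crashed = False
except AssertionError:
    crashed = True  # analogous to the FAILED assert in the bug title
```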
<p>Anyway, I think at this point I'm spinning my wheels here (although, going<br />through the MDSMonitor/FSMap/MDSMap code was very useful). Maybe someone else<br />has an idea here.</p>
<p>BTW, I will soon submit a PR which adds more assertions in the code paths which<br />may violate the standby_daemons invariant so we can catch this problem earlier.</p>
<p>[1] <a class="external" href="http://tracker.ceph.com/issues/15591">http://tracker.ceph.com/issues/15591</a><br />
[2] <a class="external" href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/011033.html">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-June/011033.html</a><br />
[3] <a class="external" href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011098.html">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011098.html</a><br />
[4] standby_daemons is a map of standbys maintained in the FSMap. The FSMap is a new structure that contains one or more MDSMaps (one per file system; we can now have more than one). A standby is no longer associated with a specific file system and may be used on demand.<br />
[5] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L617-L623">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L617-L623</a><br />
[6] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L639-L645">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L639-L645</a><br />
[7] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L702-L705">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L702-L705</a><br />
[8] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L2232-L2234">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L2232-L2234</a><br />
[9] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L2250-L2252">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L2250-L2252</a><br />
[10] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/FSMap.cc#L410-L414">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/FSMap.cc#L410-L414</a><br />
[11] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/FSMap.cc#L769-L774">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/FSMap.cc#L769-L774</a><br />
[12] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L523-L533">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mon/MDSMonitor.cc#L523-L533</a><br />
[13] <a class="external" href="https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/MDSDaemon.cc#L980-L986">https://github.com/ceph/ceph/blob/3a9fba20ec743699b69bd0181dd6c54dc01c64b9/src/mds/MDSDaemon.cc#L980-L986</a><br />
[14] <a class="external" href="https://github.com/ceph/ceph/blob/45107e21c568dd033c2f0a3107dec8f0b0e58374/src/messages/MMDSBeacon.h#L240-L245">https://github.com/ceph/ceph/blob/45107e21c568dd033c2f0a3107dec8f0b0e58374/src/messages/MMDSBeacon.h#L240-L245</a> (v10.2.2!)<br />
[15] <a class="external" href="https://github.com/ceph/ceph/commit/02e3edd93c0f4ef6e0d11df1f35187f74c7ea2ff">https://github.com/ceph/ceph/commit/02e3edd93c0f4ef6e0d11df1f35187f74c7ea2ff</a></p>
https://tracker.ceph.com/issues/16592?journal_id=74868 (2016-07-16T17:08:32Z, Patrick Donnelly, pdonnell@redhat.com)
<ul></ul><p>PR for added assertions: <a class="external" href="https://github.com/ceph/ceph/pull/10316">https://github.com/ceph/ceph/pull/10316</a></p>
https://tracker.ceph.com/issues/16592?journal_id=74869 (2016-07-16T18:01:12Z, Nathan Cutler, ncutler@suse.cz)
<ul><li><strong>Backport</strong> set to <i>jewel</i></li></ul>
https://tracker.ceph.com/issues/16592?journal_id=74912 (2016-07-18T16:42:49Z, Denis Kaganovich, mahatma@eu.by)
<ul></ul><p>I performed (maybe unwisely, but there was no way back) a one-shot upgrade: stopped all clients, stopped all ceph daemons (mds, osd, mon), and started everything again on the upgraded version. Now (after patching around the assertion, but without pull 10316) everything is working, but with a strange effect (which may not even be related to this bug):</p>
<pre>
$ ceph mds stat
e5682: 1/1/1 up {0=c=up:standby-replay}, 1 up:standby
</pre>
<p>or the same but up:active. All 3 MDS daemons are started: active, standby-replay, and standby, yet "mds stat" shows only one of the non-standby MDSs, either active or standby-replay (max_mds=1). IMHO there is an MDS state misconfiguration here too.</p>
https://tracker.ceph.com/issues/16592?journal_id=74913 (2016-07-18T16:43:45Z, Patrick Donnelly, pdonnell@redhat.com)
<ul></ul><p>Dzianis reported that he upgraded to 10.2.2 without ever having upgraded to 10.2.0 (and then downgraded, if that's even possible):</p>
<p><a class="external" href="http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011603.html">http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-July/011603.html</a></p>
<p>So this would indicate that <a class="external" href="http://tracker.ceph.com/issues/15591">http://tracker.ceph.com/issues/15591</a> is a different problem.</p>
https://tracker.ceph.com/issues/16592?journal_id=75012 (2016-07-19T22:56:38Z, Greg Farnum, gfarnum@redhat.com)
<ul><li><strong>Category</strong> set to <i>Correctness/Safety</i></li><li><strong>Component(FS)</strong> <i>MDSMonitor</i> added</li></ul>
https://tracker.ceph.com/issues/16592?journal_id=75297 (2016-07-25T10:13:04Z, John Spray, jcspray@gmail.com)
<ul></ul><p>Denis: you are seeing <a class="external" href="http://tracker.ceph.com/issues/15705">http://tracker.ceph.com/issues/15705</a>, unrelated to this ticket.</p>
https://tracker.ceph.com/issues/16592?journal_id=75307 (2016-07-25T12:44:27Z, John Spray, jcspray@gmail.com)
<ul></ul><p>It is interesting that we're hitting this in maybe_promote_standby and <strong>not</strong> in sanity(). Sanity gets called after paxos decode and before encode, so the fact that we're not hitting it there means that the bad state is probably originating while handling a beacon.</p>
<p>Case 7 does indeed look like a strong candidate.</p>
<p>Older daemons would set STANDBY_REPLAY here, but that's filtered out in MMDSBeacon::decode_payload and set back to STANDBY. The ONESHOT case is possible, but it's really unlikely that the users did this and didn't mention it. However, we should whitelist the acceptable states during these updates (in addition to the assertion that Patrick already added to modify_daemon).</p>
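<p>The whitelisting idea can be sketched in a few lines of Python (a toy model under assumed names; the real change would live in the C++ beacon-handling path): reject a beacon requesting a state that is not legal for a daemon currently held as a standby, and leave the map untouched.</p>

```python
# Sketch of whitelisting acceptable states for a standby daemon's
# beacon update: drop illegal requests instead of mutating the map
# and asserting out later. Illustrative only, not Ceph code.

STANDBY = "up:standby"

# Assumed policy: a standby may only re-announce standby; anything
# else must go through the promotion paths, not a raw state update.
ALLOWED_FROM_STANDBY = {STANDBY}

def handle_standby_beacon(info, requested_state):
    """Return True if the update was applied, False if dropped."""
    if requested_state not in ALLOWED_FROM_STANDBY:
        return False  # drop the beacon; map invariant preserved
    info["state"] = requested_state
    return True

info = {"state": STANDBY}
accepted = handle_standby_beacon(info, "up:standby-replay")
# accepted is False and info is unchanged: the bad state never
# enters standby_daemons, so no assertion can fire downstream.
```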
<p>Another potential way for a bad state to make it through to this point would be if the epoch checking in preprocess_beacon was wrong: this got more complicated with multi-filesystem support (see effective_epoch in preprocess_beacon). I don't see any bug there though.</p>
<p>Opened <a class="external" href="https://github.com/ceph/ceph/pull/10428">https://github.com/ceph/ceph/pull/10428</a> to validate state transitions (and drop beacons rather than asserting out when something goes bad).</p> CephFS - Bug #16592: Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)"https://tracker.ceph.com/issues/16592?journal_id=773652016-08-25T16:25:36ZGreg Farnumgfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Need More Info</i></li><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Normal</i></li></ul><p>Moving this down and setting Need More Info based on Patrick's investigation and the new asserts; let me know if that was the wrong move, Patrick.</p>
https://tracker.ceph.com/issues/16592?journal_id=81032 (2016-11-10T03:50:51Z, Patrick Donnelly, pdonnell@redhat.com)
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-3 priority-5 priority-high3 closed" href="/issues/17837">Bug #17837</a>: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3</i> added</li></ul>
https://tracker.ceph.com/issues/16592?journal_id=81117 (2016-11-11T06:46:42Z, Alexander Walker, wlkalexander@gmail.com)
<ul><li><strong>File</strong> <a href="/attachments/download/2573/mdsmap.bin">mdsmap.bin</a> added</li></ul><p>I've created the ticket <a class="issue tracker-1 status-3 priority-5 priority-high3 closed" title="Bug: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3 (Resolved)" href="https://tracker.ceph.com/issues/17837">#17837</a>.<br />Here is the output from "ceph mds dump --format=json-pretty":<br /><pre>
{
    "epoch": 440,
    "flags": 0,
    "created": "2016-03-11 15:24:45.516358",
    "modified": "2016-11-09 14:31:22.696300",
    "tableserver": 0,
    "root": 0,
    "session_timeout": 60,
    "session_autoclose": 300,
    "max_file_size": 1099511627776,
    "last_failure": 395,
    "last_failure_osd_epoch": 2465,
    "compat": {
        "compat": {},
        "ro_compat": {},
        "incompat": {
            "feature_1": "base v0.20",
            "feature_2": "client writeable ranges",
            "feature_3": "default file layouts on dirs",
            "feature_4": "dir inode in separate object",
            "feature_5": "mds uses versioned encoding",
            "feature_6": "dirfrag is stored in omap",
            "feature_8": "no anchor table"
        }
    },
    "max_mds": 1,
    "in": [
        0
    ],
    "up": {
        "mds_0": 5854219
    },
    "failed": [],
    "stopped": [],
    "info": {
        "gid_5854102": {
            "gid": 5854102,
            "name": "ceph2.aditosoftware.local",
            "rank": -1,
            "incarnation": 0,
            "state": "up:standby",
            "state_seq": 1,
            "addr": "192.168.49.102:6800\/1261",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        },
        "gid_5854219": {
            "gid": 5854219,
            "name": "ceph1.aditosoftware.local",
            "rank": 0,
            "incarnation": 41,
            "state": "up:active",
            "state_seq": 111157,
            "addr": "192.168.49.101:6800\/1287",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        },
        "gid_6000903": {
            "gid": 6000903,
            "name": "ceph3.aditosoftware.local",
            "rank": -1,
            "incarnation": 0,
            "state": "up:standby",
            "state_seq": 1,
            "addr": "192.168.49.103:6800\/1231",
            "standby_for_rank": -1,
            "standby_for_name": "",
            "export_targets": []
        }
    },
    "data_pools": [
        1
    ],
    "metadata_pool": 2,
    "enabled": true,
    "fs_name": "cephfs_fs"
}
</pre></p>
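<p>A dump like the one above can be checked mechanically for the invariant this bug is about: every daemon with rank -1 should be up:standby, and every ranked daemon should appear in "up". A small Python sketch (field names taken from the JSON shown; this is not an official Ceph tool):</p>

```python
# Check an mds dump (json-pretty output) for rank/state consistency:
# rank -1 entries must be up:standby, ranked entries must be in "up".

import json

def check_mdsmap(dump):
    m = json.loads(dump)
    problems = []
    for key, info in m["info"].items():
        if info["rank"] == -1 and info["state"] != "up:standby":
            problems.append((key, info["state"]))
        if info["rank"] >= 0 and "mds_%d" % info["rank"] not in m["up"]:
            problems.append((key, "ranked but not in up"))
    return problems

# Trimmed-down version of the dump above (only the fields we check):
sample = json.dumps({
    "up": {"mds_0": 5854219},
    "info": {
        "gid_5854219": {"rank": 0, "state": "up:active"},
        "gid_5854102": {"rank": -1, "state": "up:standby"},
        "gid_6000903": {"rank": -1, "state": "up:standby"},
    },
})
```
<p>On the dump shown, this reports no problems; a map in the state this ticket describes (an unranked daemon that is not up:standby) would be flagged.</p>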
<p>And I've attached the output from "ceph mds getmap > mdsmap.bin"</p>
https://tracker.ceph.com/issues/16592?journal_id=97718 (2017-08-25T21:22:56Z, Patrick Donnelly, pdonnell@redhat.com)
<ul><li><strong>Assignee</strong> deleted (<del><i>Patrick Donnelly</i></del>)</li></ul>