<p>Ceph - Bug #44185: Monitors cascading crash as they become the leader (possibly a repeat of bug 41025)</p>
<p>Robert Burrowes, 2020-02-18T19:59:38Z, https://tracker.ceph.com/issues/44185?journal_id=158993</p>
<ul><li><strong>File</strong> <a href="/attachments/download/4702/mon02-crash-dump.txt">mon02-crash-dump.txt</a> added</li></ul><p>Attached is a cat of the crash log from mon02.</p>
<p>Versions just before the last crash (after updating 4 monitors). Another oddity is that we had accidentally restarted two managers by this point (apt update being helpful).</p>
<pre><code class="json syntaxhl"><span class="CodeRay">root@ntr-mon01:~# ceph versions
{
<span class="key"><span class="delimiter">"</span><span class="content">mon</span><span class="delimiter">"</span></span>: {
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">1</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)</span><span class="delimiter">"</span></span>: <span class="integer">4</span>
},
<span class="key"><span class="delimiter">"</span><span class="content">mgr</span><span class="delimiter">"</span></span>: {
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">1</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">1</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)</span><span class="delimiter">"</span></span>: <span class="integer">2</span>
},
<span class="key"><span class="delimiter">"</span><span class="content">osd</span><span class="delimiter">"</span></span>: {
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">175</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">32</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">16</span>
},
<span class="key"><span class="delimiter">"</span><span class="content">mds</span><span class="delimiter">"</span></span>: {
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)</span><span class="delimiter">"</span></span>: <span class="integer">1</span>
},
<span class="key"><span class="delimiter">"</span><span class="content">rgw</span><span class="delimiter">"</span></span>: {
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">2</span>
},
<span class="key"><span class="delimiter">"</span><span class="content">overall</span><span class="delimiter">"</span></span>: {
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">175</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">35</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)</span><span class="delimiter">"</span></span>: <span class="integer">18</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)</span><span class="delimiter">"</span></span>: <span class="integer">1</span>,
<span class="key"><span class="delimiter">"</span><span class="content">ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)</span><span class="delimiter">"</span></span>: <span class="integer">6</span>
}
}
</span></code></pre>
<p>Robert Burrowes, 2020-02-20T20:01:27Z, https://tracker.ceph.com/issues/44185?journal_id=159197</p>
<p>Croit helped us rebuild luminous mons from the osds, and they think our crash was due to the test mds being at mimic, while the rest of the system was on luminous. They excised the test mds from our system. They think another upgrade will now work.</p>
<p>Still, crashing and leaving the cluster with no mons was not a graceful way of handling the error. It would have been nicer to have isolated the mds and left the rest of the service running.</p>
<p>Patrick Donnelly (pdonnell@redhat.com), 2020-02-21T03:52:34Z, https://tracker.ceph.com/issues/44185?journal_id=159209</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li></ul><p>Robert Burrowes wrote:</p>
<blockquote>
<p>Croit helped us rebuild luminous mons from the osds, and they think our crash was due to the test mds being at mimic, while the rest of the system was on luminous. They excised the test mds from our system. They think another upgrade will now work.</p>
<p>Still, crashing and leaving the cluster with no mons was not a graceful way of handling the error. It would have been nicer to have isolated the mds and left the rest of the service running.</p>
</blockquote>
<p>Ceph generally does not tolerate mixed versions of MDSs. This is slated to improve with rolling upgrade support.</p>
<p>With that said, the debug log does not show where the assertion occurred. It's difficult to say what happened without more information.</p>
<p>I'm closing this as there's nothing actionable for this ticket that isn't already planned in other work.</p>
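<p>Since the suspected trigger in this thread was a single test mds on a different release from the rest of the cluster, a pre-upgrade check of the <code>ceph versions</code> output may help spot such stragglers. The following is a minimal sketch, not part of Ceph: the helper names <code>release_counts</code> and <code>outliers</code> are hypothetical, the input is assumed to have the JSON shape shown earlier in this thread, and "majority release" is only a heuristic chosen for illustration. It says nothing about which version mixes Ceph actually tolerates.</p>

```python
import json
import re
from collections import Counter

# Pulls the release codename out of strings such as
# "ceph version 14.2.7 (3d58626e...) nautilus (stable)".
VERSION_RE = re.compile(r"^ceph version [\d.]+ \([0-9a-f]+\) (\w+) \(")


def release_counts(versions):
    """Return {daemon_type: Counter({release: daemon_count})}."""
    result = {}
    for daemon_type, counts in versions.items():
        if daemon_type == "overall":  # aggregate of the other sections
            continue
        per_release = Counter()
        for version_string, count in counts.items():
            match = VERSION_RE.match(version_string)
            if match:  # skip strings in an unexpected format
                per_release[match.group(1)] += count
        result[daemon_type] = per_release
    return result


def outliers(versions):
    """Return (majority_release, [(daemon_type, release, count), ...]).

    The list holds every group of daemons that is NOT on the release
    run by the largest number of daemons (a heuristic, see lead-in).
    """
    by_type = release_counts(versions)
    totals = Counter()
    for per_release in by_type.values():
        totals.update(per_release)
    if not totals:
        return None, []
    majority = totals.most_common(1)[0][0]
    found = sorted(
        (daemon_type, release, count)
        for daemon_type, per_release in by_type.items()
        for release, count in per_release.items()
        if release != majority
    )
    return majority, found


# Abbreviated copy of the output posted in this thread.
SAMPLE = json.loads("""
{
  "mon": {
    "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 1,
    "ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)": 4
  },
  "osd": {
    "ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)": 175
  },
  "mds": {
    "ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)": 1
  }
}
""")

if __name__ == "__main__":
    majority, found = outliers(SAMPLE)
    print("majority release:", majority)
    for daemon_type, release, count in found:
        print(f"{count} {daemon_type} daemon(s) on {release}")
```

<p>On a live cluster the same dictionary could be built by passing the text printed by <code>ceph versions</code> to <code>json.loads</code>. With the abbreviated sample above, the lone mimic mds is reported alongside the nautilus mons that were mid-upgrade at the time.</p>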