https://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2011-12-14T19:03:31ZCeph Ceph - Bug #1831: mon: should not accept (and should disconnect) session when not in quorumhttps://tracker.ceph.com/issues/1831?journal_id=76392011-12-14T19:03:31ZSage Weilsage@newdream.net
<ul><li><strong>Subject</strong> changed from <i>OSDs don't connect to a new monitor when theirs is booted out of the cluster</i> to <i>mon: should not accept (and should disconnect) session when not in quorum</i></li></ul><p>I think there are two parts here:</p>
<ul><li>The mon shouldn't let sessions start if it is not in the quorum. That may actually work already, because sessions can't authenticate.</li><li>If the mon drops out of the quorum, it should disconnect sessions after some short-ish time period. (It can't be immediate, or every election would trigger expensive reconnects.)</li></ul>
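<p>As a rough illustration of the two parts above, here is a minimal sketch in Python. It is a toy model, not Ceph's Monitor code: the class name <code>MonSessionGate</code>, the <code>grace</code> value, and the explicit <code>now</code> timestamps are all assumptions made for the example.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MonSessionGate:
    """Toy model of a monitor's session policy (hypothetical, not Ceph's Monitor class)."""
    grace: float = 5.0                      # assumed grace period, in seconds
    in_quorum: bool = False
    left_quorum_at: Optional[float] = None  # time we last dropped out of quorum
    sessions: set = field(default_factory=set)

    def set_quorum(self, in_quorum: bool, now: float) -> None:
        # Record when we leave the quorum; rejoining cancels the pending drop.
        if self.in_quorum and not in_quorum:
            self.left_quorum_at = now
        elif in_quorum:
            self.left_quorum_at = None
        self.in_quorum = in_quorum

    def accept(self, client: str) -> bool:
        # Part 1: refuse new sessions while out of quorum.
        if not self.in_quorum:
            return False
        self.sessions.add(client)
        return True

    def tick(self, now: float) -> list:
        # Part 2: drop existing sessions only once the grace period has passed,
        # so a quick election does not force every client to reconnect.
        if self.in_quorum or self.left_quorum_at is None:
            return []
        if now - self.left_quorum_at < self.grace:
            return []
        dropped = sorted(self.sessions)
        self.sessions.clear()
        return dropped
```

<p>With this shape, a mon that loses and regains the quorum within the grace period keeps its sessions, while one that stays out longer drops them on the next <code>tick</code>.</p>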
<p>For the disconnect part, we can either drop the connection and let clients hunt for a new monitor, or we can politely tell them to go elsewhere. Since we're out of the quorum, though, we may not know where to send them, so a disconnect is probably OK for now...</p>
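<p>For the "drop and let clients hunt" option, the client side might look roughly like the sketch below. This is only an illustration, not Ceph's MonClient: <code>hunt_for_monitor</code>, the <code>try_connect</code> callback, and the monitor names are hypothetical.</p>

```python
import random
from typing import Callable, Optional, Sequence


def hunt_for_monitor(
    monitors: Sequence[str],
    try_connect: Callable[[str], bool],
    current: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the first monitor that accepts a session, or None if none does."""
    rng = rng or random.Random()
    # Skip the monitor that just dropped us and try the rest in random order,
    # so that disconnected clients do not all pile onto the same mon.
    candidates = [m for m in monitors if m != current]
    rng.shuffle(candidates)
    for mon in candidates:
        if try_connect(mon):
            return mon
    return None
```

<p>Here <code>try_connect</code> stands in for whatever connection and authentication step the client performs; an out-of-quorum mon would simply return <code>False</code> and the client would move on to the next candidate.</p>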
<p>Any other thoughts?</p> Ceph - Bug #1831: mon: should not accept (and should disconnect) session when not in quorumhttps://tracker.ceph.com/issues/1831?journal_id=76412011-12-14T22:10:32ZGreg Farnumgfarnum@redhat.com
<ul></ul><p>There are two things here, the second being the monitor changes you're focusing on. I still need to investigate why the OSDs stayed connected, though: I don't know whether their session was working so all was well, or whether it disconnected silently due to the networking issues and the OSDs just didn't notice (and so failed to reconnect elsewhere). Unfortunately, networking issues blocked me from further analysis today; hopefully tomorrow!</p> Ceph - Bug #1831: mon: should not accept (and should disconnect) session when not in quorumhttps://tracker.ceph.com/issues/1831?journal_id=78352012-01-04T10:47:38ZJosh Durgin
<ul></ul><p>Last night a test hit this. The MDS got stuck connected to an out-of-quorum mon, and never stopped being laggy.</p> Ceph - Bug #1831: mon: should not accept (and should disconnect) session when not in quorumhttps://tracker.ceph.com/issues/1831?journal_id=78492012-01-05T10:13:44ZSage Weilsage@newdream.net
<ul><li><strong>translation missing: en.field_position</strong> set to <i>33</i></li></ul> Ceph - Bug #1831: mon: should not accept (and should disconnect) session when not in quorumhttps://tracker.ceph.com/issues/1831?journal_id=78612012-01-05T14:21:08ZGreg Farnumgfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>A basic stab at this is staggeringly boring — pushed to wip-mon-timeouts. I want to discuss instrumenting Monitor to stay out of quorums (for testing) and implementing internal heartbeats before merging.</p> Ceph - Bug #1831: mon: should not accept (and should disconnect) session when not in quorumhttps://tracker.ceph.com/issues/1831?journal_id=78642012-01-05T16:46:38ZGreg Farnumgfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>4</i></li></ul><p>Updated the branch; it now includes monitor commands and instrumentation so you can drop a monitor out of the quorum. Unfortunately, it's hard to get the monitor back in right now, since the monitor quickly goes unreadable in paxos and then can't authenticate new clients.</p> Ceph - Bug #1831: mon: should not accept (and should disconnect) session when not in quorumhttps://tracker.ceph.com/issues/1831?journal_id=78882012-01-06T10:28:34ZSage Weilsage@newdream.net
<ul><li><strong>Status</strong> changed from <i>4</i> to <i>Resolved</i></li></ul><p><a class="changeset" title="Merge remote branch 'gh/wip-mon-timeouts'" href="https://tracker.ceph.com/projects/ceph/repository/revisions/13f1debbf054612fbb2c9f4dafbe12c8f937cf14">13f1debbf054612fbb2c9f4dafbe12c8f937cf14</a></p>