Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=842042017-01-14T00:54:33ZGreg Farnumgfarnum@redhat.com
<ul></ul><p>We discussed this on the mailing list.</p>
<blockquote>
<p>Sam wrote:<br />Oh, this is basically working as intended. What happened is that the mon died before the pending map was actually committed. The OSD has a timeout (5s) after which it stops trying to mark itself down and just dies (so that OSDs don't hang when killed). It took a bit longer than 5s for the remaining 2 mons to form a new quorum, so they never got the MOSDMarkMeDown message and we had to do it the slow way. I would prefer this behavior to changing the mon shutdown process or making the OSDs wait longer, so I think that's it. If you want to avoid disruption with colocated mons and osds, stop the osds first and then reboot.</p>
</blockquote>
<blockquote>
<p>I wrote:<br />We can probably make our systemd scripts do this automatically? Or at least, there's a Ceph super-task thingy and I bet we can order the shutdown so it waits to kill the monitor until all the OSD processes have ended.</p>
</blockquote> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=842062017-01-14T00:56:27ZGreg Farnumgfarnum@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Boris Ranto</i></li></ul><p>Boris, Dan suggests you know the most about our systemd scripts. Is this a feasible change to make?</p> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=846842017-01-24T18:43:11ZGreg Farnumgfarnum@redhat.com
<ul></ul><p>Boris said, in the tracker-period-which-was-lost:</p>
<blockquote>
<p>Hey Greg, this is feasible to do, in a way.<br />We can add either an After= (preferred afaik) or a Before= line to the unit scripts to define this behaviour. The line defines the ordering at both boot and shutdown (reversed on shutdown, i.e. if there is an 'After=B' line in 'A', then 'A' is started after 'B' and 'B' is stopped after 'A'). By default, the daemons are started/stopped simultaneously, which is why we see the described behaviour (mons are usually faster to go down).</p>
<p>However, we can only define this behaviour for a single machine that is running both mon and osd, which we do not recommend anyway. We cannot (well, not with systemd) control this behaviour when we are doing a full cluster reboot.</p>
</blockquote> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=846852017-01-24T18:44:27ZGreg Farnumgfarnum@redhat.com
<ul></ul><p>That sounds to me like exactly what we're after. This doesn't need to be coordinated across nodes; it's specifically about small clusters where all the nodes are doing double-duty. We don't need to solve cluster-wide shutdown happening in the wrong order (that would be the fault of the orchestration system or the admin), and it's less likely to cause trouble anyway.</p> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=847172017-01-25T11:50:22ZBoris Rantobranto@redhat.com
<ul></ul><p>Upstream PR: <a class="external" href="https://github.com/ceph/ceph/pull/13097">https://github.com/ceph/ceph/pull/13097</a></p> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=859212017-02-10T18:36:51ZNathan Cutlerncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=859222017-02-10T19:59:36ZGreg Farnumgfarnum@redhat.com
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Pending Backport</i></li><li><strong>Backport</strong> set to <i>Kraken, jewel, hammer</i></li></ul><p>Adding backport fields. This should go wherever we have systemd in supported releases; I think I got the right ones.</p> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=859332017-02-11T15:52:16ZNathan Cutlerncutler@suse.cz
<ul><li><strong>Backport</strong> changed from <i>Kraken, jewel, hammer</i> to <i>kraken, jewel, hammer</i></li></ul><p>@Greg IIRC systemd support wasn't production-ready when hammer was released, but it is backportable because there is a systemd/ directory and a <code>ceph-osd@.service</code> file in it.</p> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=859842017-02-13T07:23:34ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/18906">Backport #18906</a>: jewel: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboot</i> added</li></ul> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=859862017-02-13T07:23:36ZLoïc Dacharyloic@dachary.org
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/18907">Backport #18907</a>: kraken: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboot</i> added</li></ul> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=863972017-02-20T17:12:40ZNathan Cutlerncutler@suse.cz
<ul><li><strong>Backport</strong> changed from <i>kraken, jewel, hammer</i> to <i>kraken, jewel</i></li></ul> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=863982017-02-20T17:13:02ZNathan Cutlerncutler@suse.cz
<ul></ul><p>systemd fixes are not needed in hammer, since systemd support was still in its infancy when hammer was released.</p> Ceph - Bug #18516: "osd marked itself down" will not recognised if host runs mon + osd on shutdown/reboothttps://tracker.ceph.com/issues/18516?journal_id=901832017-04-22T08:55:38ZNathan Cutlerncutler@suse.cz
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>
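<p>For reference, the After= ordering described in the thread above could be sketched as a systemd drop-in for the OSD template unit. This is only an illustration, not the content of the upstream PR: the drop-in path and file name are hypothetical, and it assumes the installed packages ship a <code>ceph-mon.target</code> unit alongside <code>ceph-osd@.service</code>.</p>
<pre><code># Hypothetical drop-in: /etc/systemd/system/ceph-osd@.service.d/order-after-mon.conf
# Applies to every ceph-osd@N instance on this host.
[Unit]
# Start OSDs after the local monitor target (assumed to exist as ceph-mon.target).
# systemd reverses ordering dependencies on shutdown, so the OSDs are stopped
# before the monitor and can still deliver their "mark me down" message.
After=ceph-mon.target
</code></pre>
<p>After adding such a file, <code>systemctl daemon-reload</code> is needed for systemd to pick it up. As noted in the thread, this only orders daemons on a single host; it does nothing for the order in which different hosts are rebooted.</p>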