Ceph Orchestrator - Bug #57462: cephadm removes config & keyring files in mid flight
Tim Wilkinson (twilkins@redhat.com), 2022-09-07T15:14:12Z: https://tracker.ceph.com/issues/57462?journal_id=225170
<p>Red Hat Bugzilla bug 2024301 is the issue referenced above: <a class="external" href="https://bugzilla.redhat.com/show_bug.cgi?id=2024301">https://bugzilla.redhat.com/show_bug.cgi?id=2024301</a></p>
Adam King, 2022-09-07T17:30:51Z: https://tracker.ceph.com/issues/57462?journal_id=225178
<ul><li><strong>Assignee</strong> set to <i>Adam King</i></li></ul>
Adam King, 2022-09-07T17:31:22Z: https://tracker.ceph.com/issues/57462?journal_id=225180
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-5 priority-high3 closed" href="/issues/56696">Bug #56696</a>: admin keyring disappears during qa run</i> added</li></ul>
Adam King, 2022-12-07T15:35:09Z: https://tracker.ceph.com/issues/57462?journal_id=229012
<ul><li><strong>Backport</strong> set to <i>quincy, pacific</i></li><li><strong>Pull request ID</strong> set to <i>48074</i></li></ul>
Adam King, 2022-12-07T15:38:39Z: https://tracker.ceph.com/issues/57462?journal_id=229015
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Pending Backport</i></li></ul>
Backport Bot, 2022-12-07T15:41:45Z: https://tracker.ceph.com/issues/57462?journal_id=229023
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/58207">Backport #58207</a>: quincy: cephadm removes config & keyring files in mid flight </i> added</li></ul>
Backport Bot, 2022-12-07T15:41:52Z: https://tracker.ceph.com/issues/57462?journal_id=229025
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/58208">Backport #58208</a>: pacific: cephadm removes config & keyring files in mid flight </i> added</li></ul>
Backport Bot, 2022-12-07T15:41:54Z: https://tracker.ceph.com/issues/57462?journal_id=229027
<ul><li><strong>Tags</strong> set to <i>backport_processed</i></li></ul>
Adam King, 2023-02-17T15:44:54Z: https://tracker.ceph.com/issues/57462?journal_id=231676
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>
Voja Molani, 2023-07-12T10:19:47Z: https://tracker.ceph.com/issues/57462?journal_id=242120
<p>I think it is definitely better in 17.2.6 than in earlier 17.2.x releases. Before this PR I had six or more occurrences of the keyring (and sometimes <code>ceph.conf</code>) going missing across two clusters.</p>
<p>But just now an MGR node in a 17.2.6 cluster lost its keyring somewhere between <code>ceph orch host maintenance enter</code>, the reboot, and <code>ceph orch host maintenance exit</code>.</p>
<p>As used to happen so often on 17.2.5, <code>ceph orch host maintenance exit</code> took a very long time and appeared to hang. Because it was taking so long, the first thing I did was check <code>/etc/ceph/</code>, and sure enough the keyring file was missing. Eventually the <code>ceph orch host maintenance exit</code> command failed with an error about the missing keyring. I put the keyring file back and re-ran <code>ceph orch host maintenance exit</code>, and all services on the node came back.</p>
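<p>The manual check described above (looking in <code>/etc/ceph/</code> before waiting on <code>ceph orch host maintenance exit</code>) can be scripted. The sketch below is hypothetical and not part of cephadm: the helper name <code>missing_ceph_files</code> is illustrative, and the file names assume the default cluster name <code>ceph</code> and a host carrying the <code>_admin</code> label, to which cephadm distributes <code>ceph.conf</code> and the <code>client.admin</code> keyring.</p>

```python
import sys
from pathlib import Path
from typing import List

# Assumption: default cluster name "ceph" and the client.admin keyring that
# cephadm places on hosts labelled _admin. Adjust for other setups.
EXPECTED_FILES = ("ceph.conf", "ceph.client.admin.keyring")


def missing_ceph_files(conf_dir: str = "/etc/ceph") -> List[str]:
    """Return the expected files that are not present as regular files in conf_dir."""
    base = Path(conf_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_ceph_files()
    if missing:
        # Report before running `ceph orch host maintenance exit`, which
        # otherwise appears to hang until it errors on the missing keyring.
        print("missing from /etc/ceph: " + ", ".join(missing))
        sys.exit(1)
    print("ceph.conf and client.admin keyring present")
```

<p>Running this right after the reboot, and before <code>ceph orch host maintenance exit</code>, would surface the missing keyring immediately instead of after the long apparent hang.</p>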
<p>So the linked PR 48074 definitely does not fully fix this issue, or there are further issues with similar symptoms.</p>
Voja Molani, 2023-07-12T12:42:49Z: https://tracker.ceph.com/issues/57462?journal_id=242129
<p>The second <strong>and third</strong> MGR nodes in the cluster were rebooted with exactly the same process, and after each reboot <code>ceph orch host maintenance exit</code> failed because of a missing keyring, just as with the first MGR node. So far this looks very repeatable: 3 out of 3 attempts failed with the keyring lost.</p>