Ceph CephFS - Bug #43407: mds crash after update to v14.2.5
https://tracker.ceph.com/issues/43407

Patrick Donnelly wrote on 2020-01-02 19:00 UTC:
Were you using multiple MDS before?

Can you increase MDS debugging:

ceph config set mds debug_mds 10

and restart the MDS daemon?
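On a systemd-based deployment those two steps might look roughly like this (a sketch; the MDS daemon name "ceph2" used below is an assumption, so adjust it to your own setup):

```sh
# Raise MDS debug logging to level 10 via the cluster config database.
ceph config set mds debug_mds 10

# Restart the MDS daemon on its host; "ceph2" is an assumed daemon name.
sudo systemctl restart ceph-mds@ceph2
```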
Marco Savoca wrote on 2020-01-02 21:19 UTC:

File ceph-mds.ceph2.log.gz added.

Yes, I had 3 filesystems (namespaces), one for each MDS daemon, and the setup was working up to the update to v14.2.5.

After the crash I deleted 2 of the 3 filesystems.

Attached is the new full ceph-mds.log (some private information about folders and files has been deleted).
By the way: cephfs-journal-tool journal inspect reports a clean journal (apart from this strange error message):

```
2020-01-02 20:56:14.129 7ff4a4110700 -1 NetHandler create_socket couldn't create socket (97) Address family not supported by protocol
Overall journal integrity: OK
```

with this summary:

```
2020-01-02 20:56:46.956 7f69e9fb6700 -1 NetHandler create_socket couldn't create socket (97) Address family not supported by protocol
Events by type:
NOOP: 0
SESSION: 6
SUBTREEMAP: 3
UPDATE: 1030
Errors: 0
```
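For reference, that inspect run was presumably invoked along these lines (a sketch; the filesystem name "cephfs" and rank 0 are assumptions, and on recent releases the rank generally has to be given explicitly):

```sh
# Inspect the MDS journal of rank 0 of a filesystem assumed to be named "cephfs".
cephfs-journal-tool --rank cephfs:0 journal inspect
```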
Zheng Yan wrote on 2020-01-03 14:23 UTC:

The MDS shows there are some ENoOp log events. This means some region of the MDS log was erased by cephfs-journal-tool. Why did you do that?

1. First back up the mds journal:
cephfs-journal-tool journal export backup.bin

2. Recover journal events:
cephfs-journal-tool journal export backup.bin

3. Truncate the journal:
cephfs-journal-tool journal reset
Marco Savoca wrote on 2020-01-03 15:29 UTC:

> 2. Recover journal events:
> cephfs-journal-tool journal export backup.bin

Do you mean
cephfs-journal-tool event apply

or

cephfs-journal-tool event recover_dentries summary ?
The only manipulative action I performed on the journal was the following:

sudo cephfs-journal-tool event splice --type OPEN summary

because I supposed the OPEN events to be problematic. I made a backup of the journal beforehand.

But all of these actions took place after the degradation of the filesystem and the crash of the MDS.

So the question remains: why did the MDS daemons crash after the update?

By the way: I don't really care about data loss, thanks to a solid backup strategy.

Do you think that deleting the filesystem and the associated pools will solve the problem and let the daemons restart normally?
Marco Savoca wrote on 2020-01-03 19:34 UTC:

Status update:
I have tried
cephfs-journal-tool event recover_dentries summary
followed by
cephfs-journal-tool journal reset

This did the job, and I was able to restart the MDS daemons.

Does anyone have an idea why the journal got corrupted?
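For reference, the full sequence from this thread, including the backup step suggested earlier, would look roughly like this (a sketch; the --rank argument and the filesystem name "cephfs" are assumptions, adjust to your setup):

```sh
# 1. Back up the MDS journal before touching it.
cephfs-journal-tool --rank cephfs:0 journal export backup.bin

# 2. Recover dentries from the journal events into the backing store.
cephfs-journal-tool --rank cephfs:0 event recover_dentries summary

# 3. Truncate (reset) the journal.
cephfs-journal-tool --rank cephfs:0 journal reset
```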
Patrick Donnelly wrote on 2020-01-06 14:41 UTC:

Assignee set to Patrick Donnelly.

Patrick Donnelly wrote on 2020-01-06 14:52 UTC:

Status changed from New to Triaged.
Assignee changed from Patrick Donnelly to Zheng Yan.
Zheng Yan wrote on 2020-01-06 15:17 UTC:

Assignee deleted (Zheng Yan).

The first ESubtreeMap in the journal was wrong. It should also contain dir 0x1:

```
-4719> 2020-01-02 20:14:44.029 7f66a05f5700 10 mds.0.log _replay 808964827~520 / 812539516 2019-12-14 15:02:06.893542: ESubtreeMap 1 subtrees , 0 ambiguous [metablob 0x100, 1 dirs]
```
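To find this kind of replay line in an MDS log, a grep along the following lines should work (a sketch; the log path and daemon name are assumptions, and the _replay lines only show up with debug_mds raised as suggested above):

```sh
# Search an MDS log (assumed path and name) for ESubtreeMap entries replayed from the journal.
grep 'ESubtreeMap' /var/log/ceph/ceph-mds.ceph2.log
```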
No idea how this can happen.

What steps did you take when upgrading the MDS?