Ceph https://tracker.ceph.com/ 2019-07-22T21:02:59Z
Linux kernel client - Bug #40862: kclient: crashed after evicted twice https://tracker.ceph.com/issues/40862?journal_id=141226 2019-07-22T21:02:59Z Patrick Donnelly pdonnell@redhat.com
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>Linux kernel client</i></li><li><strong>Subject</strong> changed from <i>ceph client is crashed after evicted twice</i> to <i>kclient: crashed after evicted twice</i></li><li><strong>Assignee</strong> set to <i>Jeff Layton</i></li><li><strong>Start date</strong> deleted (<del><i>07/22/2019</i></del>)</li><li><strong>Source</strong> set to <i>Community (user)</i></li></ul>
Linux kernel client - Bug #40862: kclient: crashed after evicted twice https://tracker.ceph.com/issues/40862?journal_id=141233 2019-07-22T21:13:46Z Jeff Layton jlayton@redhat.com
<p>It oopsed because of this in ceph_set_page_dirty:</p>
<pre>
	if (PageDirty(page)) {
		dout("%p set_page_dirty %p idx %lu -- already dirty\n",
		     mapping->host, page, page->index);
		BUG_ON(!PagePrivate(page));	/* <<<< CRASH HERE */
		return 0;
	}
</pre>
<p>We clear out the PagePrivate bit (and associated private info) when the page is invalidated. I suspect that this page got invalidated, but the dirty bit was left intact. We then tried to redirty the page. We probably ought to clear the dirty bit (and clean up the accounting) when the page is invalidated.</p>
<p>It looks like this can happen in ceph_writepages_start in certain situations. We call invalidatepage directly, but don't clear the dirty bit. There may be other ways that this can happen as well.</p>
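<p>The sequence described above can be sketched as a small userspace model. This is only an illustration, not kernel code: <code>struct toy_page</code>, <code>toy_set_page_dirty</code>, <code>toy_invalidate_buggy</code> and <code>toy_invalidate_fixed</code> are hypothetical names, the two booleans stand in for the PageDirty and PagePrivate bits, and the model assumes that dirtying a page also sets its private state. Instead of crashing, the model returns <code>WOULD_BUG</code> where the BUG_ON would fire.</p>

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the two flags discussed in this report. */
struct toy_page {
    bool dirty;        /* models the PageDirty bit */
    bool has_private;  /* models the PagePrivate bit */
};

enum dirty_result {
    WOULD_BUG = -1,     /* the BUG_ON(!PagePrivate(page)) would have fired */
    ALREADY_DIRTY = 0,  /* mirrors the "already dirty" early return */
    NEWLY_DIRTIED = 1
};

/* Models the check quoted from ceph_set_page_dirty (assumption: simplified). */
enum dirty_result toy_set_page_dirty(struct toy_page *p)
{
    if (p->dirty) {
        if (!p->has_private)
            return WOULD_BUG;
        return ALREADY_DIRTY;
    }
    /* Assumption: dirtying attaches private state to the page. */
    p->dirty = true;
    p->has_private = true;
    return NEWLY_DIRTIED;
}

/* Invalidation that drops the private state but leaves the dirty bit set. */
void toy_invalidate_buggy(struct toy_page *p)
{
    p->has_private = false;
}

/* Direction suggested in the comment: clear the dirty bit as well. */
void toy_invalidate_fixed(struct toy_page *p)
{
    p->dirty = false;
    p->has_private = false;
}
```

<p>In this model, dirtying a page, calling <code>toy_invalidate_buggy</code>, and dirtying it again yields <code>WOULD_BUG</code>, while the same sequence with <code>toy_invalidate_fixed</code> simply dirties the page again. The real kernel would also have to adjust the dirty-page accounting, which the model leaves out.</p>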
<p>Do we have a reliable reproducer for this?</p>
Linux kernel client - Bug #40862: kclient: crashed after evicted twice https://tracker.ceph.com/issues/40862?journal_id=141260 2019-07-23T05:56:58Z Erqi Chen
<p>Jeff Layton wrote:</p>
<blockquote>
<p>It oopsed because of this in ceph_set_page_dirty:</p>
<p>[...]</p>
<p>We clear out the PagePrivate bit (and associated private info) when the page is invalidated. I suspect that this page got invalidated, but the dirty bit was left intact. We then tried to redirty the page. We probably ought to clear the dirty bit (and clean up the accounting) when the page is invalidated.</p>
<p>It looks like this can happen in ceph_writepages_start in certain situations. We call invalidatepage directly, but don't clear the dirty bit. There may be other ways that this can happen as well.</p>
<p>Do we have a reliable reproducer for this?</p>
</blockquote>
<p>Yes, the invalidatepage call in ceph_writepages_start can cause this issue, and I have proposed a patch on the ceph-devel list.<br />It has happened only once in our environment, when the client failed to respond to the MDS's cap revoke request.</p>
Linux kernel client - Bug #40862: kclient: crashed after evicted twice https://tracker.ceph.com/issues/40862?journal_id=141786 2019-07-25T18:42:36Z Jeff Layton jlayton@redhat.com
<p>Patch merged into ceph-client/testing branch:</p>
<p><a class="external" href="https://marc.info/?l=ceph-devel&m=156393518802502&w=2">https://marc.info/?l=ceph-devel&m=156393518802502&w=2</a></p>
Linux kernel client - Bug #40862: kclient: crashed after evicted twice https://tracker.ceph.com/issues/40862?journal_id=144322 2019-08-26T15:25:08Z Ilya Dryomov
<ul><li><strong>Category</strong> set to <i>fs/ceph</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Pending Backport</i></li></ul><p><a class="external" href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c95f1c5f436badb9bb87e9b30fd573f6b3d59423">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c95f1c5f436badb9bb87e9b30fd573f6b3d59423</a> in 5.3-rc6</p>
Linux kernel client - Bug #40862: kclient: crashed after evicted twice https://tracker.ceph.com/issues/40862?journal_id=144828 2019-08-29T10:50:02Z Ilya Dryomov
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul><p>In 4.19.69 and 5.2.11.</p>