CephFS - Bug #24137: client: segfault in trim_caps
https://tracker.ceph.com/issues/24137
Patrick Donnelly (pdonnell@redhat.com), 2018-05-15T19:38:13Z, https://tracker.ceph.com/issues/24137?journal_id=113238
<p>A reasonable assumption about this crash is that either the inode was deleted (in which case the Cap should have been deleted too) or the inode is a nullptr. The latter should never happen AFAICT, as the inode is set only when the cap is created.</p>
<p>I couldn't find a way for the inode/cap to be deleted while the cap remains in the session cap list. Ideas?</p>

Zheng Yan (ukernel@gmail.com), 2018-05-16T11:10:04Z, https://tracker.ceph.com/issues/24137?journal_id=113289
<pre>
commit 4dda1b6ead6b1f04f996403881f61e9b7d94dba0
Author: Patrick Donnelly <pdonnell@redhat.com>
Date: Sun Nov 19 15:08:18 2017 -0800
client: anchor Inode while trimming caps
This prevents the Inode from being deleted until after cap trimming is
finished. In particular, this prevents remove_all_caps from being called which
screws up the traversal of caps in trim_caps.
Fixes: http://tracker.ceph.com/issues/22157
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 1439337e949c9fcb7d15eb38c22d19eb57d3d0f2)
</pre>
<p>I think the above commit isn't quite right. How about the patch below?</p>
<pre>
diff --git a/src/client/Client.cc b/src/client/Client.cc
index 5bbe449974..2885a87e48 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -4113,14 +4119,30 @@ void Client::trim_caps(MetaSession *s, uint64_t max)
uint64_t trimmed = 0;
auto p = s->caps.begin();
- std::set<InodeRef> anchor; /* prevent put_inode from deleting all caps during traversal */
- while ((caps_size - trimmed) > max && !p.end()) {
- Cap *cap = *p;
- InodeRef in(cap->inode);
+
+ InodeRef next_in;
+ Cap *next_cap;
+ if (p.end()) {
+ next_cap = nullptr;
+ } else {
+ next_cap = *p;
+ next_in = next_cap->inode;
+ }
+
+ while (next_cap && (caps_size - trimmed) > max) {
+ Cap *cap = next_cap;
+ InodeRef in;
+ in.swap(next_in);
// Increment p early because it will be invalidated if cap
// is deleted inside remove_cap
++p;
+ if (p.end()) {
+ next_cap = nullptr;
+ } else {
+ next_cap = *p;
+ next_in = next_cap->inode;
+ }
if (in->caps.size() > 1 && cap != in->auth_cap) {
int mine = cap->issued | cap->implemented;
@@ -4129,7 +4151,6 @@ void Client::trim_caps(MetaSession *s, uint64_t max)
if (!(get_caps_used(in.get()) & ~oissued & mine)) {
ldout(cct, 20) << " removing unused, unneeded non-auth cap on " << *in << dendl;
cap = (remove_cap(cap, true), nullptr);
- /* N.B. no need to push onto anchor, as we are only removing one cap */
trimmed++;
}
} else {
@@ -4148,8 +4169,6 @@ void Client::trim_caps(MetaSession *s, uint64_t max)
// the end of this function.
_schedule_invalidate_dentry_callback(dn, true);
}
- ldout(cct, 20) << " anchoring inode: " << in->ino << dendl;
- anchor.insert(in);
trim_dentry(dn);
} else {
ldout(cct, 20) << " not expirable: " << dn->name << dendl;
@@ -4162,8 +4181,6 @@ void Client::trim_caps(MetaSession *s, uint64_t max)
}
}
}
- ldout(cct, 20) << " clearing anchored inodes" << dendl;
- anchor.clear();
caps_size = s->caps.size();
if (caps_size > (size_t)max)
</pre>

Patrick Donnelly (pdonnell@redhat.com), 2018-05-16T18:16:22Z, https://tracker.ceph.com/issues/24137?journal_id=113350
<p>Zheng Yan wrote:</p>
<blockquote>
<p>[...]</p>
<p>I think the above commit isn't quite right. How about the patch below?</p>
<p>[...]</p>
</blockquote>
<p>I'm not seeing how that helps the situation. The issue my patch was trying to solve is preventing the next cap (what p points to after ++p) from being invalidated due to trim_dentry -> Client::unlink -> Dentry::put -> Inode::put -> intrusive_ptr_release -> Client::put_inode -> Client::remove_all_caps. This should be prevented by anchoring the Inode references as in the original PR?</p>
<p>We need a test case to definitively show the issue. I'm not willing to push any new fixes for this without a reproducer.</p>

Zheng Yan (ukernel@gmail.com), 2018-05-17T00:44:01Z, https://tracker.ceph.com/issues/24137?journal_id=113370
<p>The problem is that the anchor only pins the current inode. Client::unlink() may still drop the reference on its parent inode.</p>

Patrick Donnelly (pdonnell@redhat.com), 2018-05-17T04:44:02Z, https://tracker.ceph.com/issues/24137?journal_id=113376
<p>Zheng Yan wrote:</p>
<blockquote>
<p>The problem is that the anchor only pins the current inode. Client::unlink() may still drop the reference on its parent inode.</p>
</blockquote>
<p>Okay, that makes sense, I think. We still need a test that demonstrates the problem.</p>

Zheng Yan (ukernel@gmail.com), 2018-05-17T08:44:26Z, https://tracker.ceph.com/issues/24137?journal_id=113407
<ul><li><strong>File</strong> <a href="/attachments/download/3465/ceph-client.27063.log">ceph-client.27063.log</a> added</li><li><strong>File</strong> <a href="/attachments/download/3466/test_trim_caps.cc">test_trim_caps.cc</a> added</li></ul><p>Compile test_trim_caps.cc with the newest libcephfs. Set mds_min_caps_per_client to 1 and mds_max_ratio_caps_per_client to 0, then run test_trim_caps.</p>
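One way to apply those two settings is via ceph.conf before restarting the MDS (a sketch; the options could equally be set with injectargs, and section placement may vary by deployment):

```ini
# ceph.conf fragment (sketch): make the MDS ask clients to trim down to a
# single cap, so trim_caps is easy to trigger.
[mds]
    mds_min_caps_per_client = 1
    mds_max_ratio_caps_per_client = 0
```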
<pre>
[Current thread is 1 (Thread 0x7fffd97fa700 (LWP 27088))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.25-13.fc26.x86_64 libblkid-2.30.2-1.fc26.x86_64 libgcc-7.3.1-2.fc26.x86_64 libibverbs-1.2.1-4.fc26.x86_64 libnl3-3.3.0-1.fc26.x86_64 libstdc++-7.3.1-2.fc26.x86_64 libuuid-2.30.2-1.fc26.x86_64 lttng-ust-2.9.0-2.fc26.x86_64 nspr-4.19.0-1.fc26.x86_64 nss-3.36.1-1.0.fc26.x86_64 nss-softokn-3.36.1-1.0.fc26.x86_64 nss-softokn-freebl-3.36.1-1.0.fc26.x86_64 nss-util-3.36.1-1.0.fc26.x86_64 sqlite-libs-3.20.1-2.fc26.x86_64 userspace-rcu-0.9.3-2.fc26.x86_64 zlib-1.2.11-2.fc26.x86_64
(gdb) bt
#0 0x00007fffd40004d8 in ?? ()
#1 0x00007fffee48bd9b in ceph::logging::log_clock::now (this=0x7fffd40004e8) at /home/zhyan/Ceph/ceph-2/src/log/LogClock.h:95
#2 ceph::logging::Log::create_entry (this=0x7fffd40004b8, level=level@entry=15, subsys=subsys@entry=21, expected_size=expected_size@entry=0x7ffff7dd5ed0 <Inode::get()::_log_exp_length>)
at /home/zhyan/Ceph/ceph-2/src/log/Log.cc:263
#3 0x00007ffff7b82efd in Inode::get (this=0x7fffd4014820) at /home/zhyan/Ceph/ceph-2/src/client/Inode.cc:383
#4 0x00007ffff7aec2ca in intrusive_ptr_add_ref (in=<optimized out>) at /home/zhyan/Ceph/ceph-2/src/client/Client.cc:13974
#5 0x00007ffff7b2188a in boost::intrusive_ptr<Inode>::intrusive_ptr (add_ref=true, p=<optimized out>, this=0x7fffd97f75d8) at /mnt/misc/Ceph/ceph-2/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:69
#6 Client::trim_caps (this=this@entry=0x77edd0, s=s@entry=0x7958a8, max=1) at /home/zhyan/Ceph/ceph-2/src/client/Client.cc:4124
#7 0x00007ffff7b22652 in Client::handle_client_session (this=this@entry=0x77edd0, m=m@entry=0x7fffe4000f00) at /home/zhyan/Ceph/ceph-2/src/client/Client.cc:2102
#8 0x00007ffff7b5461f in Client::ms_dispatch (this=0x77edd0, m=0x7fffe4000f00) at /home/zhyan/Ceph/ceph-2/src/client/Client.cc:2544
#9 0x00007fffee4e3b4b in Messenger::ms_deliver_dispatch (m=0x7fffe4000f00, this=0x78c750) at /home/zhyan/Ceph/ceph-2/src/msg/Messenger.h:667
#10 DispatchQueue::entry (this=0x78c8d0) at /home/zhyan/Ceph/ceph-2/src/msg/DispatchQueue.cc:201
#11 0x00007fffee5862dd in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /home/zhyan/Ceph/ceph-2/src/msg/DispatchQueue.h:101
#12 0x00007fffec76436d in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff6f47b4f in clone () from /lib64/libc.so.6
</pre>

Patrick Donnelly (pdonnell@redhat.com), 2018-05-18T00:14:05Z, https://tracker.ceph.com/issues/24137?journal_id=113486
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Patrick Donnelly</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/22073">https://github.com/ceph/ceph/pull/22073</a></p>

Patrick Donnelly (pdonnell@redhat.com), 2018-05-19T04:37:51Z, https://tracker.ceph.com/issues/24137?journal_id=113573
<ul><li><strong>Target version</strong> changed from <i>v13.2.0</i> to <i>v14.0.0</i></li><li><strong>Backport</strong> changed from <i>luminous</i> to <i>mimic,luminous</i></li></ul>

Patrick Donnelly (pdonnell@redhat.com), 2018-05-19T04:38:43Z, https://tracker.ceph.com/issues/24137?journal_id=113574
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Pending Backport</i></li></ul>

Patrick Donnelly (pdonnell@redhat.com), 2018-05-19T04:45:32Z, https://tracker.ceph.com/issues/24137?journal_id=113575
<ul><li><strong>Copied to</strong> <i><a href="/issues/24185">Backport #24185</a>: luminous: client: segfault in trim_caps</i> added</li></ul>

Nathan Cutler (ncutler@suse.cz), 2018-05-19T10:04:12Z, https://tracker.ceph.com/issues/24137?journal_id=113582
<ul><li><strong>Copied to</strong> <i><a href="/issues/24186">Backport #24186</a>: mimic: client: segfault in trim_caps</i> added</li></ul>

Nathan Cutler (ncutler@suse.cz), 2018-06-20T22:01:02Z, https://tracker.ceph.com/issues/24137?journal_id=115475
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>

Patrick Donnelly (pdonnell@redhat.com), 2023-05-30T12:59:58Z, https://tracker.ceph.com/issues/24137?journal_id=239837
<ul><li><strong>Related to</strong> <i><a href="/issues/61394">Bug #61394</a>: qa/quincy: cluster [WRN] evicting unresponsive client smithi152 (4298), after 303.726 seconds" in cluster log</i> added</li></ul>