Ceph rgw - Bug #19831: rgw: segfault during the shutdown procedure
Radoslaw Zarzynski (rzarzyns@redhat.com), 2017-05-04T13:59:33Z, https://tracker.ceph.com/issues/19831?journal_id=90746
<pre>
(gdb) handle SIGINT nostop pass
SIGINT is used by the debugger.
Are you sure you want to change it? (y or n) y
Signal Stop Print Pass to program Description
SIGINT No Yes Yes Interrupt
(gdb) cont
Continuing.
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2240ba0700 (LWP 16210)]
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
#1 0x00007f226840b519 in PR_Lock () from /usr/lib/x86_64-linux-gnu/libnspr4.so
#2 0x00007f2268410f04 in ?? () from /usr/lib/x86_64-linux-gnu/libnspr4.so
#3 0x00007f2268410fc3 in ?? () from /usr/lib/x86_64-linux-gnu/libnspr4.so
#4 0x00007f2268a83f82 in __nptl_deallocate_tsd () at pthread_create.c:158
#5 0x00007f2268a84195 in start_thread (arg=0x7f2240ba0700) at pthread_create.c:325
#6 0x00007f226739b47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
</pre>
Radoslaw Zarzynski (rzarzyns@redhat.com), 2017-05-05T14:24:03Z, https://tracker.ceph.com/issues/19831?journal_id=90828
<p>After installing the debug symbols for libnspr4:<br /><pre>
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f407461b700 (LWP 12544)]
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt full
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
__PRETTY_FUNCTION__ = "__pthread_mutex_lock"
type = 0
#1 0x00007f409be864d9 in PR_Lock (lock=0x0) at ptsynch.c:177
No locals.
#2 0x00007f409be8bf04 in _pt_thread_death_internal (arg=0x7f40a7a94880, callDestructors=1) at ptthread.c:880
thred = 0x7f40a7a94880
#3 0x00007f409be8bfc3 in _pt_thread_death (arg=0x7f40a7a94880) at ptthread.c:865
thred = <optimized out>
#4 0x00007f409c4fef82 in __nptl_deallocate_tsd () at pthread_create.c:158
data = 0x0
idx = 0
cnt = 139914807196248
#5 0x00007f409c4ff195 in start_thread (arg=0x7f407461b700) at pthread_create.c:325
pd = 0x7f407461b700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139914807195392, -1974657238859091938, 0, 0, 139914807196096, 139914807195392,
1880379800789872670, 1880713728613969950}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#6 0x00007f409ae1647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
No locals.
</pre></p>
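<p>Frames #3 and #4 show that <strong>_pt_thread_death</strong> runs as a thread-specific-data (TSD) destructor. When a thread exits, glibc's <strong>__nptl_deallocate_tsd</strong> calls the destructor registered with <strong>pthread_key_create</strong> for every key whose value in that thread is non-NULL. The minimal C sketch below shows only that mechanism. <code>demo_key</code>, <code>on_thread_exit</code>, <code>worker</code> and <code>run_tsd_demo</code> are hypothetical names, not NSPR or Ceph code.</p>

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical demo key; nothing here is NSPR or Ceph code. */
static pthread_key_t demo_key;
static int destructor_calls = 0;

/* Called by the C library during thread teardown, once per key whose
 * value in the exiting thread is non-NULL. */
static void on_thread_exit(void *value)
{
    (void)value;
    destructor_calls++;
}

static void *worker(void *arg)
{
    /* Storing a non-NULL value is what arms the destructor for this thread. */
    int rc = pthread_setspecific(demo_key, arg);
    assert(rc == 0);
    (void)rc;
    return NULL; /* on_thread_exit runs after this return, during teardown */
}

/* Starts one thread, waits for it and reports how often the destructor ran. */
int run_tsd_demo(void)
{
    static int token = 1;
    pthread_t t;
    int rc;

    destructor_calls = 0;
    rc = pthread_key_create(&demo_key, on_thread_exit);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, worker, &token);
    assert(rc == 0);
    rc = pthread_join(t, NULL); /* teardown, destructors included, is finished */
    assert(rc == 0);
    (void)rc;
    pthread_key_delete(demo_key);
    return destructor_calls;
}
```

<p>Calling <code>run_tsd_demo()</code> returns 1. The destructor runs in the exiting thread itself, after its start routine returns, which is why the crash above is reported on the worker's stack under <strong>start_thread</strong> rather than on the main thread.</p>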
<p>NSPR's <strong>_pt_thread_death_internal</strong> <a href="https://dxr.mozilla.org/mozilla-beta/source/nsprpub/pr/src/pthreads/ptthread.c#880" class="external">calls</a> <strong>PR_Lock</strong> on <strong>pt_book.ml</strong>, which is already <strong>NULL</strong> at that point. The member is <a href="https://dxr.mozilla.org/mozilla-beta/source/nsprpub/pr/src/pthreads/ptthread.c#1133-1143" class="external">destroyed and reset to NULL</a> in <strong>PR_Cleanup</strong>:</p>
<pre><code class="cpp syntaxhl"><span class="CodeRay"> <span class="comment">/*
* I am not sure if it's safe to delete the cv and lock here,
* since there may still be "system" threads around. If this
* call isn't immediately prior to exiting, then there's a
* problem.
*/</span>
<span class="keyword">if</span> (<span class="integer">0</span> == pt_book.system)
{
PR_DestroyCondVar(pt_book.cv); pt_book.cv = <span class="predefined-constant">NULL</span>;
PR_DestroyLock(pt_book.ml); pt_book.ml = <span class="predefined-constant">NULL</span>;
}
</span></code></pre>
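<p>To make the ordering problem concrete, here is a minimal C sketch under simplifying assumptions. <code>book_lock</code>, <code>lib_init</code>, <code>lib_cleanup</code>, <code>thread_death</code>, <code>revocation_worker</code> and <code>run_shutdown</code> are hypothetical names, not NSPR or Ceph APIs. <code>lib_cleanup</code> destroys the lock and resets the pointer to NULL, as the <strong>PR_Cleanup</strong> snippet does, and <code>thread_death</code> is a TSD destructor playing the role of <strong>_pt_thread_death</strong>. Unlike NSPR, it checks for NULL and records the condition instead of crashing. <code>run_shutdown(false)</code> tears the library down while the worker is still alive; <code>run_shutdown(true)</code> joins the worker first.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for NSPR's pt_book.ml: a library-global lock
 * that the cleanup routine destroys and resets to NULL. */
static pthread_mutex_t *book_lock = NULL;
static pthread_key_t book_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;
static bool saw_null_lock = false;

/* TSD destructor run at thread exit, in the role of _pt_thread_death. */
static void thread_death(void *arg)
{
    (void)arg;
    if (book_lock == NULL) {
        /* NSPR has no such check: PR_Lock(NULL) ends in SIGSEGV.
         * The sketch records the condition instead of crashing. */
        saw_null_lock = true;
        return;
    }
    pthread_mutex_lock(book_lock);
    /* per-thread bookkeeping would happen here */
    pthread_mutex_unlock(book_lock);
}

static void make_key(void)
{
    int rc = pthread_key_create(&book_key, thread_death);
    assert(rc == 0);
    (void)rc;
}

static void lib_init(void)
{
    pthread_once(&key_once, make_key); /* one key, reused across runs */
    book_lock = malloc(sizeof *book_lock);
    assert(book_lock != NULL);
    pthread_mutex_init(book_lock, NULL);
}

/* Mirrors the PR_Cleanup snippet: destroy the lock, then NULL the pointer. */
static void lib_cleanup(void)
{
    pthread_mutex_destroy(book_lock);
    free(book_lock);
    book_lock = NULL;
}

/* Hypothetical background thread, standing in for the revocation thread. */
static void *revocation_worker(void *arg)
{
    pthread_setspecific(book_key, arg); /* non-NULL value arms thread_death */
    pthread_mutex_lock(&gate);          /* block until main opens the gate */
    pthread_mutex_unlock(&gate);
    return NULL;                        /* thread_death runs after this */
}

/* Returns true if the exiting thread found the lock already torn down. */
bool run_shutdown(bool join_before_cleanup)
{
    static int token = 1;
    pthread_t t;
    int rc;

    saw_null_lock = false;
    lib_init();
    pthread_mutex_lock(&gate); /* keep the worker alive for now */
    rc = pthread_create(&t, NULL, revocation_worker, &token);
    assert(rc == 0);
    (void)rc;

    if (join_before_cleanup) {
        pthread_mutex_unlock(&gate); /* let the worker exit,             */
        pthread_join(t, NULL);       /* wait for its TSD destructor,     */
        lib_cleanup();               /* and only then tear down the lib  */
    } else {
        lib_cleanup();               /* the order observed in this report */
        pthread_mutex_unlock(&gate);
        pthread_join(t, NULL);
    }
    return saw_null_lock;
}
```

<p>With these definitions <code>run_shutdown(false)</code> returns true and <code>run_shutdown(true)</code> returns false. Joining every thread that may still run a library's TSD destructors before calling that library's global cleanup is one way to avoid the hazard; the sketch is not meant to describe the actual Ceph fix.</p>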
<p>If that is the case, <strong>PR_Cleanup</strong> must be running before the Keystone revocation thread has been stopped. Let's verify that:</p>
<pre>
(gdb) break PR_Cleanup
Breakpoint 1 at 0x7fd7e22c9960: file ptthread.c, line 1101.
(gdb) cont
Continuing.
...
Breakpoint 1, PR_Cleanup () at ptthread.c:1101
1101 ptthread.c: No such file or directory.
(gdb) bt
#0 PR_Cleanup () at ptthread.c:1101
#1 0x00007fd7e31dad75 in ceph::crypto::shutdown (shared=<optimized out>) at /work/ceph-4/src/common/ceph_crypto.cc:88
#2 0x00007fd7e3187de5 in CephContext::~CephContext (this=0x7fd7edc28000, __in_chrg=<optimized out>) at /work/ceph-4/src/common/ceph_context.cc:663
#3 0x00007fd7e3187e1a in CephContext::put (this=0x7fd7edc28000) at /work/ceph-4/src/common/ceph_context.cc:670
#4 0x00007fd7ec6295e5 in ~intrusive_ptr (this=0x7ffc64219b50, __in_chrg=<optimized out>)
at /work/ceph-4/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:97
#5 main (argc=<optimized out>, argv=<optimized out>) at /work/ceph-4/src/rgw/rgw_main.cc:274
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd7baa58700 (LWP 14013)]
__GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
66 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0 __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
#1 0x00007fd7e22c34d9 in PR_Lock (lock=0x0) at ptsynch.c:177
#2 0x00007fd7e22c8f04 in _pt_thread_death_internal (arg=0x7fd7edc86640, callDestructors=1) at ptthread.c:880
#3 0x00007fd7e22c8fc3 in _pt_thread_death (arg=0x7fd7edc86640) at ptthread.c:865
#4 0x00007fd7e293bf82 in __nptl_deallocate_tsd () at pthread_create.c:158
#5 0x00007fd7e293c195 in start_thread (arg=0x7fd7baa58700) at pthread_create.c:325
#6 0x00007fd7e125347d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
</pre>
Radoslaw Zarzynski (rzarzyns@redhat.com), 2017-05-05T14:38:10Z, https://tracker.ceph.com/issues/19831?journal_id=90829
<ul><li><strong>Regression</strong> changed from <i>No</i> to <i>Yes</i></li></ul><p>The bug is a regression. It was introduced in commit b22977dfd27967def1b6d4caf83694a2264fc825 (<a href="https://github.com/ceph/ceph/pull/14801" class="external">PR 14801</a>); after reverting that commit, the segfault no longer occurs.</p>
Radoslaw Zarzynski (rzarzyns@redhat.com), 2017-05-10T13:04:06Z, https://tracker.ceph.com/issues/19831?journal_id=90930
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Fix Under Review</i></li></ul><p>PR: <a class="external" href="https://github.com/ceph/ceph/pull/15033">https://github.com/ceph/ceph/pull/15033</a>.</p>
Daniel Gryniewicz (dang@redhat.com), 2020-03-05T15:03:25Z, https://tracker.ceph.com/issues/19831?journal_id=160341
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>