Actions
Bug #16120
closedunittest_throttle BackoffThrottle deadlock
Added by Anonymous almost 8 years ago. Updated over 7 years ago.
Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
As per several build issue, we see that the build process remains stuck for hours before being killed.
I was able to spot one live node and collected lots of traces that I add to this ticket.
Files
Updated by Anonymous almost 8 years ago
[ 1698.168125] unittest_log29267: segfault at 0 ip 00002ae4499613d1 sp 00007fff72b9dd90 error 6 in unittest_log[2ae4498dc000+131000]
Updated by Loïc Dachary almost 8 years ago
Thread 5 (Thread 0x2ab2debde700 (LWP 29391)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00002ab2d2c3a28b in ceph::log::Log::entry (this=0x2ab2d2daa160) at log/Log.cc:459 #2 0x00002ab2dc5e9182 in start_thread (arg=0x2ab2debde700) at pthread_create.c:312 #3 0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 4 (Thread 0x2ab2df559700 (LWP 29392)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 #1 0x00002ab2d2c544fa in WaitUntil (when=..., mutex=..., this=0x2ab2d2df9060) at ./common/Cond.h:72 #2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x2ab2d2df9060) at ./common/Cond.h:81 #3 CephContextServiceThread::entry (this=0x2ab2d2df8fe0) at common/ceph_context.cc:99 #4 0x00002ab2dc5e9182 in start_thread (arg=0x2ab2df559700) at pthread_create.c:312 #5 0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 3 (Thread 0x2ab2df75a700 (LWP 29550)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00002ab2d2c3d0b4 in Wait (mutex=..., this=0x2ab2e80008c0) at ./common/Cond.h:56 #2 Throttle::_wait (this=this@entry=0x2ab2d2dfa9c0, c=c@entry=7) at common/Throttle.cc:107 #3 0x00002ab2d2c3dd17 in Throttle::get (this=0x2ab2d2dfa9c0, c=7, m=0) at common/Throttle.cc:177 #4 0x00002ab2d2be985d in ThrottleTest::Thread_get::entry (this=0x2ab2d2df9a10) at test/common/Throttle.cc:56 #5 0x00002ab2dc5e9182 in start_thread (arg=0x2ab2df75a700) at pthread_create.c:312 #6 0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 2 (Thread 0x2ab2eca04700 (LWP 29559)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 #1 0x00002ab2dd4494dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #2 0x00002ab2d2be313c in operator() (__closure=0x2ab2d2dfb0d0) at test/common/Throttle.cc:323 #3 _M_invoke<> (this=0x2ab2d2dfb0d0) at /usr/include/c++/4.8/functional:1732 #4 operator() (this=0x2ab2d2dfb0d0) at /usr/include/c++/4.8/functional:1720 #5 std::thread::_Impl<std::_Bind_simple<test_backoff(double, double, double, double, double, uint64_t, double, unsigned int, unsigned int)::__lambda1()> >::_M_run(void) (this=0x2ab2d2dfb0b8) at /usr/include/c++/4.8/thread:115 #6 0x00002ab2dd44ca60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #7 0x00002ab2dc5e9182 in start_thread (arg=0x2ab2eca04700) at pthread_create.c:312 #8 0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 1 (Thread 0x2ab2d2da0f80 (LWP 29345)): #0 0x00002ab2dc5ea66b in pthread_join (threadid=46947962472192, thread_return=0x0) at pthread_join.c:92 #1 0x00002ab2dd44c857 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #2 0x00002ab2d2be3708 in test_backoff (low_threshhold=<optimized out>, high_threshhold=<optimized out>, expected_throughput=<optimized out>, high_multiple=<optimized out>, max_multiple=<optimized out>, max=100, put_delay_per_count=0.0001, getters=<optimized out>, putters=6) at test/common/Throttle.cc:355 #3 0x00002ab2d2be46dd in BackoffThrottle_undersaturated_Test::TestBody (this=<optimized out>) at test/common/Throttle.cc:374 #4 0x00002ab2d2c121a4 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x2ab2d2df9360, method=&virtual testing::Test::TestBody(), location=0x2ab2d2cfa51b "the test body") at ./src/gtest.cc:2078 #5 0x00002ab2d2c0d2ae in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x2ab2d2df9360, method=&virtual testing::Test::TestBody(), location=0x2ab2d2cfa51b "the test body") at ./src/gtest.cc:2114 #6 0x00002ab2d2bf3f73 in testing::Test::Run (this=0x2ab2d2df9360) at ./src/gtest.cc:2151 #7 0x00002ab2d2bf476c in testing::TestInfo::Run (this=0x2ab2d2da4e00) at ./src/gtest.cc:2326 #8 0x00002ab2d2bf4e30 in testing::TestCase::Run (this=0x2ab2d2da4ef0) at ./src/gtest.cc:2444 #9 0x00002ab2d2bfb764 in testing::internal::UnitTestImpl::RunAllTests (this=0x2ab2d2da4210) at ./src/gtest.cc:4315 #10 0x00002ab2d2c1358a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2ab2d2da4210, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x2ab2d2bfb4cc <testing::internal::UnitTestImpl::RunAllTests()>, location=0x2ab2d2cfad70 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2078 #11 0x00002ab2d2c0e108 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2ab2d2da4210, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x2ab2d2bfb4cc <testing::internal::UnitTestImpl::RunAllTests()>, location=0x2ab2d2cfad70 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2114 #12 0x00002ab2d2bfa38a in testing::UnitTest::Run (this=0x2ab2d2fa0ec0 <testing::UnitTest::GetInstance()::instance>) at ./src/gtest.cc:3929 #13 0x00002ab2d2be131c in RUN_ALL_TESTS () at ../src/gmock/gtest/include/gtest/gtest.h:2288 #14 main (argc=1, argv=0x7ffc7da8cbf8) at test/common/Throttle.cc:425
Updated by Loïc Dachary almost 8 years ago
- Subject changed from unittest_throttle seems to lock the build process to unittest_throttle BackoffThrottle deadlock
- Status changed from New to 12
- Priority changed from Normal to High
Updated by Anonymous almost 8 years ago
After a long while, we still have the same backtrace.
Updated by Loïc Dachary almost 8 years ago
while : ; do echo -n . ; timeout 60 ./unittest_throttle > /dev/null 2>&1 || break ; done
eventually reproduces the problem, if the load of the machine is high. It does not seem to happen if the machine is fast.
Updated by Loïc Dachary almost 8 years ago
- Status changed from 12 to Fix Under Review
- Assignee set to Loïc Dachary
Updated by Anonymous almost 8 years ago
The involved code is realted to samuel just commit 9f4cba31d7fc520aeec6aef1abe398080363be08
Updated by Loïc Dachary almost 8 years ago
The test also hangs somewhere else. In this context the hang happens when the test joins the getters
for (auto &&i: gts) i.join();
Thread 5 (Thread 0x7f5715ffb700 (LWP 28199)): #0 0x00007f57291196d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f57288909ec in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6 #2 0x00007f572a04024f in BackoffThrottle::get (this=0x7fff89df19f0, c=c@entry=3) at common/Throttle.cc:381 #3 0x00007f5729fe3e81 in __lambda0::operator() (__closure=0x7f5733796050) at test/common/Throttle.cc:308 #4 0x00007f5728894220 in ?? () from /lib64/libstdc++.so.6 #5 0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0 #6 0x00007f5727ffbced in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7f5725533700 (LWP 28187)): #0 0x00007f57291196d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f572a03d0f4 in Wait (mutex=..., this=0x7f571c0008c0) at ./common/Cond.h:56 #2 Throttle::_wait (this=this@entry=0x7f5733796020, c=c@entry=7) at common/Throttle.cc:107 #3 0x00007f572a03dd57 in Throttle::get (this=0x7f5733796020, c=7, m=m@entry=0) at common/Throttle.cc:177 #4 0x00007f5729fe96cd in ThrottleTest::Thread_get::entry (this=0x7f5733795db0) at test/common/Throttle.cc:56 #5 0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0 #6 0x00007f5727ffbced in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7f5726535700 (LWP 28166)): #0 0x00007f5729119a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f572a05421a in WaitUntil (when=..., mutex=..., this=0x7f5733794e10) at ./common/Cond.h:72 #2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7f5733794e10) at ./common/Cond.h:81 #3 CephContextServiceThread::entry (this=0x7f5733794d90) at common/ceph_context.cc:99 #4 0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007f5727ffbced in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f57274a9700 (LWP 28165)): #0 0x00007f57291196d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f572a03a2cb in ceph::log::Log::entry (this=0x7f5733745080) at log/Log.cc:459 #2 0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f5727ffbced in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f5729eb4d00 (LWP 28164)): #0 0x00007f572911cfcb in raise () from /lib64/libpthread.so.0 #1 0x00007f572a021585 in reraise_fatal (signum=11) at global/signal_handler.cc:69 #2 handle_fatal_signal (signum=11) at global/signal_handler.cc:131 #3 <signal handler called> #4 0x00007f5729116ef7 in pthread_join () from /lib64/libpthread.so.0 #5 0x00007f5728893fe7 in std::thread::join() () from /lib64/libstdc++.so.6 #6 0x00007f5729fe3518 in test_backoff (low_threshhold=low_threshhold@entry=0.40000000000000002, high_threshhold=high_threshhold@entry=0.59999999999999998, expected_throughput=expected_throughput@entry=1000, high_multiple=high_multiple@entry=2, max_multiple=max_multiple@entry=10, max=max@entry=100, put_delay_per_count=put_delay_per_count@entry=0.001, getters=getters@entry=7, putters=putters@entry=2) at test/common/Throttle.cc:354 #7 0x00007f5729fe4a7d in BackoffThrottle_balanced_Test::TestBody (this=<optimized out>) at test/common/Throttle.cc:394 #8 0x00007f572a012252 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x7f5733748a80, method=&virtual testing::Test::TestBody(), location=0x7f572a0fa2bb "the test body") at ./src/gtest.cc:2078 #9 0x00007f572a00d34c in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x7f5733748a80, method=&virtual testing::Test::TestBody(), location=0x7f572a0fa2bb "the test body") at ./src/gtest.cc:2114 #10 0x00007f5729ff3e3f in testing::Test::Run (this=0x7f5733748a80) at ./src/gtest.cc:2151 #11 0x00007f5729ff4638 in testing::TestInfo::Run (this=0x7f5733740030) at ./src/gtest.cc:2326 #12 0x00007f5729ff4cfc in testing::TestCase::Run (this=0x7f573373fe80) at ./src/gtest.cc:2444 #13 0x00007f5729ffb7c4 in testing::internal::UnitTestImpl::RunAllTests (this=0x7f573373f210) at ./src/gtest.cc:4315 #14 0x00007f572a013638 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x7f573373f210, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f5729ffb52c <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f572a0fab10 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2078 #15 0x00007f572a00e1ae in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x7f573373f210, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f5729ffb52c <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f572a0fab10 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2114 #16 0x00007f5729ffa394 in testing::UnitTest::Run (this=0x7f572a3a0da0 <testing::UnitTest::GetInstance()::instance>) at ./src/gtest.cc:3929 #17 0x00007f5729fe111c in RUN_ALL_TESTS () at ../src/gmock/gtest/include/gtest/gtest.h:2288 #18 main (argc=1, argv=0x7fff89df1fb8) at test/common/Throttle.cc:427
Updated by Loïc Dachary almost 8 years ago
The bug can be repeated by running
ulimit -c unlimited ; while : ; do echo -n . ; timeout -s SEGV 300 ./unittest_throttle > /dev/null 2>&1 || break ; done
on a slow machine and/or a machine with a load that varies a lot (not sure what's best). It may take a few hours before it happens.
Updated by Loïc Dachary over 7 years ago
Updated by Samuel Just over 7 years ago
- Status changed from Fix Under Review to Resolved
Actions