Project

General

Profile

Actions

Bug #16120

closed

unittest_throttle BackoffThrottle deadlock

Added by Anonymous almost 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As per several build issue, we see that the build process remains stuck for hours before being killed.

I was able to spot one live node and collected lots of traces that I add to this ticket.


Files

ps (8.73 KB) ps ps -axwf Anonymous, 06/02/2016 10:02 AM
gdb-p (5.31 KB) gdb-p gdb -p; thread apply all bt Anonymous, 06/02/2016 10:16 AM
Actions #1

Updated by Anonymous almost 8 years ago

Actions #2

Updated by Anonymous almost 8 years ago

[ 1698.168125] unittest_log29267: segfault at 0 ip 00002ae4499613d1 sp 00007fff72b9dd90 error 6 in unittest_log[2ae4498dc000+131000]

Actions #3

Updated by Anonymous almost 8 years ago

Actions #4

Updated by Loïc Dachary almost 8 years ago

Thread 5 (Thread 0x2ab2debde700 (LWP 29391)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00002ab2d2c3a28b in ceph::log::Log::entry (this=0x2ab2d2daa160) at log/Log.cc:459
#2  0x00002ab2dc5e9182 in start_thread (arg=0x2ab2debde700) at pthread_create.c:312
#3  0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x2ab2df559700 (LWP 29392)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00002ab2d2c544fa in WaitUntil (when=..., mutex=..., this=0x2ab2d2df9060) at ./common/Cond.h:72
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x2ab2d2df9060) at ./common/Cond.h:81
#3  CephContextServiceThread::entry (this=0x2ab2d2df8fe0) at common/ceph_context.cc:99
#4  0x00002ab2dc5e9182 in start_thread (arg=0x2ab2df559700) at pthread_create.c:312
#5  0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x2ab2df75a700 (LWP 29550)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00002ab2d2c3d0b4 in Wait (mutex=..., this=0x2ab2e80008c0) at ./common/Cond.h:56
#2  Throttle::_wait (this=this@entry=0x2ab2d2dfa9c0, c=c@entry=7) at common/Throttle.cc:107
#3  0x00002ab2d2c3dd17 in Throttle::get (this=0x2ab2d2dfa9c0, c=7, m=0) at common/Throttle.cc:177
#4  0x00002ab2d2be985d in ThrottleTest::Thread_get::entry (this=0x2ab2d2df9a10) at test/common/Throttle.cc:56
#5  0x00002ab2dc5e9182 in start_thread (arg=0x2ab2df75a700) at pthread_create.c:312
#6  0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x2ab2eca04700 (LWP 29559)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00002ab2dd4494dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00002ab2d2be313c in operator() (__closure=0x2ab2d2dfb0d0) at test/common/Throttle.cc:323
#3  _M_invoke<> (this=0x2ab2d2dfb0d0) at /usr/include/c++/4.8/functional:1732
#4  operator() (this=0x2ab2d2dfb0d0) at /usr/include/c++/4.8/functional:1720
#5  std::thread::_Impl<std::_Bind_simple<test_backoff(double, double, double, double, double, uint64_t, double, unsigned int, unsigned int)::__lambda1()> >::_M_run(void) (this=0x2ab2d2dfb0b8)
    at /usr/include/c++/4.8/thread:115
#6  0x00002ab2dd44ca60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00002ab2dc5e9182 in start_thread (arg=0x2ab2eca04700) at pthread_create.c:312
#8  0x00002ab2ddcb54ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x2ab2d2da0f80 (LWP 29345)):
#0  0x00002ab2dc5ea66b in pthread_join (threadid=46947962472192, thread_return=0x0) at pthread_join.c:92
#1  0x00002ab2dd44c857 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00002ab2d2be3708 in test_backoff (low_threshhold=<optimized out>, high_threshhold=<optimized out>, expected_throughput=<optimized out>, high_multiple=<optimized out>, max_multiple=<optimized out>, 
    max=100, put_delay_per_count=0.0001, getters=<optimized out>, putters=6) at test/common/Throttle.cc:355
#3  0x00002ab2d2be46dd in BackoffThrottle_undersaturated_Test::TestBody (this=<optimized out>) at test/common/Throttle.cc:374
#4  0x00002ab2d2c121a4 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x2ab2d2df9360, method=&virtual testing::Test::TestBody(), 
    location=0x2ab2d2cfa51b "the test body") at ./src/gtest.cc:2078
#5  0x00002ab2d2c0d2ae in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x2ab2d2df9360, method=&virtual testing::Test::TestBody(), location=0x2ab2d2cfa51b "the test body")
    at ./src/gtest.cc:2114
#6  0x00002ab2d2bf3f73 in testing::Test::Run (this=0x2ab2d2df9360) at ./src/gtest.cc:2151
#7  0x00002ab2d2bf476c in testing::TestInfo::Run (this=0x2ab2d2da4e00) at ./src/gtest.cc:2326
#8  0x00002ab2d2bf4e30 in testing::TestCase::Run (this=0x2ab2d2da4ef0) at ./src/gtest.cc:2444
#9  0x00002ab2d2bfb764 in testing::internal::UnitTestImpl::RunAllTests (this=0x2ab2d2da4210) at ./src/gtest.cc:4315
#10 0x00002ab2d2c1358a in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2ab2d2da4210, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x2ab2d2bfb4cc <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x2ab2d2cfad70 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2078
#11 0x00002ab2d2c0e108 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2ab2d2da4210, 
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x2ab2d2bfb4cc <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x2ab2d2cfad70 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2114
#12 0x00002ab2d2bfa38a in testing::UnitTest::Run (this=0x2ab2d2fa0ec0 <testing::UnitTest::GetInstance()::instance>) at ./src/gtest.cc:3929
#13 0x00002ab2d2be131c in RUN_ALL_TESTS () at ../src/gmock/gtest/include/gtest/gtest.h:2288
#14 main (argc=1, argv=0x7ffc7da8cbf8) at test/common/Throttle.cc:425

Actions #5

Updated by Loïc Dachary almost 8 years ago

  • Subject changed from unittest_throttle seems to lock the build process to unittest_throttle BackoffThrottle deadlock
  • Status changed from New to 12
  • Priority changed from Normal to High
Actions #6

Updated by Anonymous almost 8 years ago

After a long while, we still have the same backtrace.

Actions #7

Updated by Loïc Dachary almost 8 years ago

while : ; do echo -n . ; timeout 60 ./unittest_throttle > /dev/null 2>&1 || break ; done

eventually reproduces the problem, if the load of the machine is high. It does not seem to happen if the machine is fast.
Actions #8

Updated by Loïc Dachary almost 8 years ago

  • Status changed from 12 to Fix Under Review
  • Assignee set to Loïc Dachary
Actions #9

Updated by Anonymous almost 8 years ago

The involved code is realted to samuel just commit 9f4cba31d7fc520aeec6aef1abe398080363be08

Actions #10

Updated by Loïc Dachary almost 8 years ago

The test also hangs somewhere else. In this context the hang happens when the test joins the getters

  for (auto &&i: gts) i.join();
Thread 5 (Thread 0x7f5715ffb700 (LWP 28199)):
#0  0x00007f57291196d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f57288909ec in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f572a04024f in BackoffThrottle::get (this=0x7fff89df19f0, c=c@entry=3) at common/Throttle.cc:381
#3  0x00007f5729fe3e81 in __lambda0::operator() (__closure=0x7f5733796050) at test/common/Throttle.cc:308
#4  0x00007f5728894220 in ?? () from /lib64/libstdc++.so.6
#5  0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f5727ffbced in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5725533700 (LWP 28187)):
#0  0x00007f57291196d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f572a03d0f4 in Wait (mutex=..., this=0x7f571c0008c0) at ./common/Cond.h:56
#2  Throttle::_wait (this=this@entry=0x7f5733796020, c=c@entry=7) at common/Throttle.cc:107
#3  0x00007f572a03dd57 in Throttle::get (this=0x7f5733796020, c=7, m=m@entry=0) at common/Throttle.cc:177
#4  0x00007f5729fe96cd in ThrottleTest::Thread_get::entry (this=0x7f5733795db0) at test/common/Throttle.cc:56
#5  0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f5727ffbced in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5726535700 (LWP 28166)):
#0  0x00007f5729119a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f572a05421a in WaitUntil (when=..., mutex=..., this=0x7f5733794e10) at ./common/Cond.h:72
#2  WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x7f5733794e10) at ./common/Cond.h:81
#3  CephContextServiceThread::entry (this=0x7f5733794d90) at common/ceph_context.cc:99
#4  0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f5727ffbced in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f57274a9700 (LWP 28165)):
#0  0x00007f57291196d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f572a03a2cb in ceph::log::Log::entry (this=0x7f5733745080) at log/Log.cc:459
#2  0x00007f5729115dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f5727ffbced in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5729eb4d00 (LWP 28164)):
#0  0x00007f572911cfcb in raise () from /lib64/libpthread.so.0
#1  0x00007f572a021585 in reraise_fatal (signum=11) at global/signal_handler.cc:69
#2  handle_fatal_signal (signum=11) at global/signal_handler.cc:131
#3  <signal handler called>
#4  0x00007f5729116ef7 in pthread_join () from /lib64/libpthread.so.0
#5  0x00007f5728893fe7 in std::thread::join() () from /lib64/libstdc++.so.6
#6  0x00007f5729fe3518 in test_backoff (low_threshhold=low_threshhold@entry=0.40000000000000002, high_threshhold=high_threshhold@entry=0.59999999999999998, expected_throughput=expected_throughput@entry=1000, high_multiple=high_multiple@entry=2, max_multiple=max_multiple@entry=10, max=max@entry=100, put_delay_per_count=put_delay_per_count@entry=0.001, getters=getters@entry=7, putters=putters@entry=2) at test/common/Throttle.cc:354
#7  0x00007f5729fe4a7d in BackoffThrottle_balanced_Test::TestBody (this=<optimized out>) at test/common/Throttle.cc:394
#8  0x00007f572a012252 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x7f5733748a80, method=&virtual testing::Test::TestBody(), location=0x7f572a0fa2bb "the test body") at ./src/gtest.cc:2078
#9  0x00007f572a00d34c in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x7f5733748a80, method=&virtual testing::Test::TestBody(), location=0x7f572a0fa2bb "the test body") at ./src/gtest.cc:2114
#10 0x00007f5729ff3e3f in testing::Test::Run (this=0x7f5733748a80) at ./src/gtest.cc:2151
#11 0x00007f5729ff4638 in testing::TestInfo::Run (this=0x7f5733740030) at ./src/gtest.cc:2326
#12 0x00007f5729ff4cfc in testing::TestCase::Run (this=0x7f573373fe80) at ./src/gtest.cc:2444
#13 0x00007f5729ffb7c4 in testing::internal::UnitTestImpl::RunAllTests (this=0x7f573373f210) at ./src/gtest.cc:4315
#14 0x00007f572a013638 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x7f573373f210, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f5729ffb52c <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f572a0fab10 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2078
#15 0x00007f572a00e1ae in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x7f573373f210, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f5729ffb52c <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f572a0fab10 "auxiliary test code (environments or event listeners)") at ./src/gtest.cc:2114
#16 0x00007f5729ffa394 in testing::UnitTest::Run (this=0x7f572a3a0da0 <testing::UnitTest::GetInstance()::instance>) at ./src/gtest.cc:3929
#17 0x00007f5729fe111c in RUN_ALL_TESTS () at ../src/gmock/gtest/include/gtest/gtest.h:2288
#18 main (argc=1, argv=0x7fff89df1fb8) at test/common/Throttle.cc:427
Actions #11

Updated by Loïc Dachary almost 8 years ago

The bug can be repeated by running

ulimit -c unlimited ; while : ; do echo -n . ; timeout -s SEGV 300 ./unittest_throttle > /dev/null 2>&1 || break ; done

on a slow machine and/or a machine with a load that varies a lot (not sure what's best). It may take a few hours before it happens.

Actions #13

Updated by Loïc Dachary over 7 years ago

  • Assignee deleted (Loïc Dachary)
Actions #14

Updated by Samuel Just over 7 years ago

  • Status changed from Fix Under Review to Resolved
Actions

Also available in: Atom PDF