Bug #43861
ceph_test_rados_watch_notify hang
0%
Description
[root@smithi130 tmp]# cat ceph_test_rados_watch_notify.log process_1_[25350]: starting. process_1_[25350]: creating pool foo.25320 process_1_[25350]: created object 0... process_1_[25350]: finishing. process_1_[25350]: shutting down. process_3_[25352]: starting. process_3_[25352]: notifying object 0.obj process_3_[25352]: shutting down. process_4_[25353]: starting. process_4_[25353]: deleting pool foo.25320 process_4_[25353]: shutting down. process_2_[25351]: starting. process_2_[25351]: watching object 0.obj process_2_[25351]: shutting down. ******************************* process_5_[26143]: starting. process_5_[26143]: creating pool foo.25320. process_5_[26143]: finishing. process_5_[26143]: shutting down. ******************************* process_7_[26145]: starting. process_7_[26145]: notifying object 0.obj process_7_[26145]: shutting down. ******************************* process_6_[26144]: starting. process_6_[26144]: watching object 0.obj process_6_[26144]: shutting down. ******************************* process_8_[26146]: starting. process_8_[26146]: deleting pool foo.25320. process_8_[26146]: shutting down. ******************************* ******************************* process_9_[26236]: starting. process_9_[26236]: creating pool foo.25320.. process_9_[26236]: created object 0... process_9_[26236]: finishing. process_9_[26236]: shutting down. ******************************* ******************************* process_11_[26238]: starting. process_11_[26238]: notifying object 0.obj process_11_[26238]: file /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/huge/release/15.0.0-9869-g4b944a6/rpm/el8/BUILD/ceph-15.0.0-9869-g4b944a6/src/test/system/st_rados_notify.cc, line 69: expected 0, got -110 process_11_[26238]: shutting down.
that process is the zombie
25320 ? S 0:00 | \_ ceph_test_rados_watch_notify 26237 ? Sl 0:02 | | \_ ceph_test_rados_watch_notify 26238 ? Z 0:00 | | \_ [ceph_test_rados] <defunct> 26239 ? Sl 0:00 | | \_ ceph_test_rados_watch_notify 26240 ? S 0:00 | | \_ ceph_test_rados_watch_notify 26241 ? S 0:00 | | \_ ceph_test_rados_watch_notify 26242 ? Sl 0:00 | | \_ ceph_test_rados_watch_notify
Related issues
History
#1 Updated by Sage Weil over 3 years ago
/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713217
#2 Updated by Sage Weil over 3 years ago
same?
/a/sage-2020-02-01_03:27:35-rados-wip-sage-testing-2020-01-31-1746-distro-basic-smithi/4723146
ceph_test_watch_notify didn't generate any output.
#3 Updated by Brad Hubbard over 3 years ago
- Related to Bug #44062: LibRadosWatchNotify.WatchNotify failure added
#4 Updated by Brad Hubbard over 3 years ago
Almost certainly the same issue as #44062
#5 Updated by Sage Weil over 3 years ago
2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_1_[36775]: creating pool foo.36733 2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_1_[36775]: created object 0... 2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_1_[36775]: finishing. 2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_1_[36775]: shutting down. 2020-02-23T19:20:15.554 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_3_[36777]: starting. 2020-02-23T19:20:15.554 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_3_[36777]: notifying object 0.obj 2020-02-23T19:20:15.562 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_3_[36777]: shutting down. 2020-02-23T19:20:15.562 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_2_[36776]: starting. 2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_2_[36776]: watching object 0.obj 2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: Received fewer notifies than expected: 0 < 1 2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: process_2_[36776]: shutting down. 2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout: watch_notify: test1: got error: run_until_finished: runnable process_2: got error: [36733] returned exit_status (1) Operation not permitted
/a/sage-2020-02-23_16:41:52-rados-wip-sage2-testing-2020-02-23-0755-distro-basic-smithi/4793920
#6 Updated by Sage Weil over 3 years ago
no output at all, like comment 2 above:
/a/sage-2020-03-06_17:29:42-rados-wip-sage4-testing-2020-03-05-1645-distro-basic-smithi/4831763
#7 Updated by Sage Weil over 3 years ago
41404 ? S 0:00 | | \_ bash -o pipefail -exc ceph_test_rados_watch_notify 2>&1 | tee ceph_test_rados_watch_notify.log | sed "s/^/ watch_notify: /" 41437 ? S 0:00 | | \_ ceph_test_rados_watch_notify 41470 ? Sl 0:01 | | | \_ ceph_test_rados_watch_notify 41471 ? Z 0:00 | | | \_ [ceph_test_rados] <defunct> 41472 ? Sl 0:00 | | | \_ ceph_test_rados_watch_notify 41438 ? S 0:00 | | \_ tee ceph_test_rados_watch_notify.log 41439 ? S 0:00 | | \_ sed s/^/ watch_notify: / ... [root@smithi104 tmp]# gdb ceph_test_rados_watch_notify 41437 ... Thread 1 (Thread 0x7f1b496d6640 (LWP 41437)): #0 0x00007f1b48f4fb52 in read () from /lib64/libpthread.so.0 #1 0x00007f1b401f90ef in read (__nbytes=4, __buf=0x7ffca63a5068, __fd=4) at /usr/include/bits/unistd.h:44 #2 safe_read (fd=4, buf=buf@entry=0x7ffca63a5068, count=count@entry=4) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/safe_io.c:30 #3 0x00007f1b401f914d in safe_read_exact (fd=<optimized out>, buf=buf@entry=0x7ffca63a5068, count=count@entry=4) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/safe_io.c:48 #4 0x00005599eb02ceda in Preforker::parent_wait (this=this@entry=0x7ffca63a5e08, err_msg="") at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Preforker.h:82 #5 0x00005599eb02bd55 in SysTestRunnable::join[abi:cxx11]() (this=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:131 #6 0x00005599eb02c789 in SysTestRunnable::run_until_finished[abi:cxx11](std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&) (runnables=std::vector of length 4, capacity 4 = {...}) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:154 #7 0x00005599eb027beb in main (argc=1, argv=0x7ffca63a61e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/rados_watch_notify.cc:82 ... [root@smithi104 tmp]# gdb ceph_test_rados_watch_notify 41470 ... (gdb) thr app all bt Thread 15 (Thread 0x7f1b20ff9700 (LWP 44923)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157948, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed1578e0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86 #3 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #4 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 14 (Thread 0x7f1b217fa700 (LWP 44922)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157770, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157758) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106 #3 0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 13 (Thread 0x7f1b21ffb700 (LWP 44921)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157448, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157430) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106 #3 0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 12 (Thread 0x7f1b227fc700 (LWP 44912)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed156ef8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed156e90) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86 #3 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #4 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 11 (Thread 0x7f1b22ffd700 (LWP 44911)): #0 0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=0x5599ed156de0, mutex=0x5599ed156d68, ts=ts@entry=0x7f1b22ffaf30) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fda160e in ceph::condition_variable_debug::wait_until<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (when=..., lock=..., this=0x5599ed156de0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123 #3 SafeTimer::timer_thread (this=0x5599ed156dc8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:108 #4 0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30 #5 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #6 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 10 (Thread 0x7f1b237fe700 (LWP 44910)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed159360, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fee4031 in DispatchQueue::run_local_delivery (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:117 #3 0x00007f1b3ff89a91 in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:115 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 9 (Thread 0x7f1b23fff700 (LWP 44909)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed1591b8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fee4af8 in DispatchQueue::entry (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:210 #3 0x00007f1b3ff899d1 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:101 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 8 (Thread 0x7f1b38d9d700 (LWP 44908)): #0 0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b4923e7f3 in __gthread_cond_timedwait (__abs_timeout=0x7f1b38d9af80, __mutex=<optimized out>, __cond=0x5599ed1a3a58) at /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:871 #2 std::condition_variable::__wait_until_impl<std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=<synthetic pointer>..., __lock=..., this=0x5599ed1a3a58) at /usr/include/c++/8/condition_variable:178 #3 std::condition_variable::wait_until<ceph::time_detail::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x5599ed1a3a58) at /usr/include/c++/8/condition_variable:119 #4 ceph::timer_detail::timer<ceph::time_detail::coarse_mono_clock>::timer_thread (this=0x5599ed1a39f0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_timer.h:146 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 7 (Thread 0x7f1b3959e700 (LWP 44907)): #0 0x00007f1b3ded5211 in poll () from /lib64/libc.so.6 #1 0x00007f1b3fdce38f in poll (__timeout=-1, __nfds=2, __fds=0x7f1b3959bf60) at /usr/include/bits/poll2.h:46 #2 AdminSocket::entry (this=0x5599ed1174f0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/admin_socket.cc:241 #3 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x7f1b39d9f700 (LWP 44906)): #0 0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=this@entry=0x5599ed1a1810, mutex=0x5599ed1a17b0, ts=ts@entry=0x7f1b39d9cf80) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fdf1fed in ceph::condition_variable_debug::wait_for<unsigned long, std::ratio<1l, 1000000000l> > (awhile=<synthetic pointer>..., lock=..., this=0x5599ed1a1810) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123 #3 ceph::common::CephContextServiceThread::entry (this=0x5599ed1a1790) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_context.cc:210 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7f1b3a5a0700 (LWP 44905)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3e7fd870 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6 #2 0x00007f1b40016047 in ceph::logging::Log::entry (this=0x5599ed0fa5d0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/log/Log.cc:435 --Type <RET> for more, q to quit, c to continue without paging-- #3 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #4 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7f1b3ada1700 (LWP 44896)): #0 0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed15d2d0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123 #2 0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed166680, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3ad9ef28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392 #3 0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1e8, __closure=0x5599ed15d1e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53 #4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7f1b3b5a2700 (LWP 44895)): #0 0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed1650c0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123 #2 0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed15be70, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3b59ff28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392 #3 0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1b8, __closure=0x5599ed15d1b8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53 #4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f1b3bda3700 (LWP 44894)): #0 0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed166a90, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123 #2 0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed19d070, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3bda0f28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392 #3 0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15ca58, __closure=0x5599ed15ca58) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53 #4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f1b496d6640 (LWP 41470)): #0 0x00007f1b48f4ee56 in do_futex_wait.constprop () from /lib64/libpthread.so.0 #1 0x00007f1b48f4ef48 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0 #2 0x00005599eb02b5e2 in CrossProcessSem::wait (this=0x5599ed101750) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/cross_process_sem.cc:65 #3 0x00005599eb02afdd in StRadosWatch::run (this=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/st_rados_watch.cc:84 #4 0x00005599eb02ba5e in systest_runnable_pthread_helper (arg=arg@entry=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:171 #5 0x00005599eb02c414 in SysTestRunnable::start (this=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:97 #6 0x00005599eb02c49a in SysTestRunnable::run_until_finished[abi:cxx11](std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&) (runnables=std::vector of length 4, capacity 4 = {...}) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:142 #7 0x00005599eb027beb in main (argc=1, argv=0x7ffca63a61e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/rados_watch_notify.cc:82 ... [root@smithi104 tmp]# gdb ceph_test_rados_watch_notify 41472 ... (gdb) thr app all bt Thread 15 (Thread 0x7f1b20ff9700 (LWP 41878)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157948, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed1578e0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86 #3 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #4 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 14 (Thread 0x7f1b217fa700 (LWP 41877)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157770, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157758) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106 #3 0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 13 (Thread 0x7f1b21ffb700 (LWP 41876)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157448, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157430) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106 #3 0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 12 (Thread 0x7f1b227fc700 (LWP 41666)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed156ef8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed156e90) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86 #3 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #4 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 11 (Thread 0x7f1b22ffd700 (LWP 41665)): #0 0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=0x5599ed156de0, mutex=0x5599ed156d68, ts=ts@entry=0x7f1b22ffaf30) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fda160e in ceph::condition_variable_debug::wait_until<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (when=..., lock=..., this=0x5599ed156de0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123 #3 SafeTimer::timer_thread (this=0x5599ed156dc8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:108 #4 0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30 #5 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #6 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 10 (Thread 0x7f1b237fe700 (LWP 41664)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed159360, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fee4031 in DispatchQueue::run_local_delivery (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:117 #3 0x00007f1b3ff89a91 in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:115 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 9 (Thread 0x7f1b23fff700 (LWP 41663)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed1591b8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fee4af8 in DispatchQueue::entry (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:210 #3 0x00007f1b3ff899d1 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:101 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 8 (Thread 0x7f1b38d9d700 (LWP 41662)): #0 0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b4923e7f3 in __gthread_cond_timedwait (__abs_timeout=0x7f1b38d9af80, __mutex=<optimized out>, __cond=0x5599ed19bec8) at /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:871 #2 std::condition_variable::__wait_until_impl<std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=<synthetic pointer>..., __lock=..., this=0x5599ed19bec8) at /usr/include/c++/8/condition_variable:178 #3 std::condition_variable::wait_until<ceph::time_detail::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x5599ed19bec8) at /usr/include/c++/8/condition_variable:119 #4 ceph::timer_detail::timer<ceph::time_detail::coarse_mono_clock>::timer_thread (this=0x5599ed19be60) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_timer.h:146 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 7 (Thread 0x7f1b3959e700 (LWP 41661)): #0 0x00007f1b3ded5211 in poll () from /lib64/libc.so.6 #1 0x00007f1b3fdce38f in poll (__timeout=-1, __nfds=2, __fds=0x7f1b3959bf60) at /usr/include/bits/poll2.h:46 #2 AdminSocket::entry (this=0x5599ed1174f0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/admin_socket.cc:241 #3 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x7f1b39d9f700 (LWP 41660)): #0 0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=this@entry=0x5599ed1a1810, mutex=0x5599ed1a17b0, ts=ts@entry=0x7f1b39d9cf80) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147 #2 0x00007f1b3fdf1fed in ceph::condition_variable_debug::wait_for<unsigned long, std::ratio<1l, 1000000000l> > (awhile=<synthetic pointer>..., lock=..., this=0x5599ed1a1810) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123 #3 ceph::common::CephContextServiceThread::entry (this=0x5599ed1a1790) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_context.cc:210 #4 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #5 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7f1b3a5a0700 (LWP 41659)): #0 0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f1b3e7fd870 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6 #2 0x00007f1b40016047 in ceph::logging::Log::entry (this=0x5599ed0fa5d0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/log/Log.cc:435 --Type <RET> for more, q to quit, c to continue without paging-- #3 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #4 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7f1b3ada1700 (LWP 41491)): #0 0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed15d2d0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123 #2 0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed166680, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3ad9ef28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392 #3 0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1e8, __closure=0x5599ed15d1e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53 #4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7f1b3b5a2700 (LWP 41490)): #0 0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed1650c0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123 #2 0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed15be70, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3b59ff28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392 #3 0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1b8, __closure=0x5599ed15d1b8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53 #4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7f1b3bda3700 (LWP 41489)): #0 0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed166a90, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123 #2 0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed19d070, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3bda0f28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392 #3 0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15ca58, __closure=0x5599ed15ca58) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53 #4 std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297 #5 0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6 #6 0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1b3dee0133 in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7f1b496d6640 (LWP 41472)): #0 0x00007f1b48f4ee56 in do_futex_wait.constprop () from /lib64/libpthread.so.0 #1 0x00007f1b48f4ef48 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0 #2 0x00005599eb02b5e2 in CrossProcessSem::wait (this=0x5599ed101750) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/cross_process_sem.cc:65 #3 0x00005599eb02a7c4 in StRadosDeletePool::run (this=0x7ffca63a5b10) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/st_rados_delete_pool.cc:46 #4 0x00005599eb02ba5e in systest_runnable_pthread_helper (arg=arg@entry=0x7ffca63a5b10) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:171 #5 0x00005599eb02c414 in SysTestRunnable::start (this=0x7ffca63a5b10) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:97 #6 0x00005599eb02c49a in SysTestRunnable::run_until_finished[abi:cxx11](std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&) (runnables=std::vector of length 4, capacity 4 = {...}) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:142 #7 0x00005599eb027beb in main (argc=1, argv=0x7ffca63a61e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/rados_watch_notify.cc:82
#8 Updated by Sage Weil over 3 years ago
another one line comment 5 above
2020-03-18T19:53:11.915 INFO:tasks.workunit.client.0.smithi131.stdout: watch_notify: process_3_[24529]: shutting down. 2020-03-18T19:53:11.915 INFO:tasks.workunit.client.0.smithi131.stdout: watch_notify: process_2_[24528]: starting. 2020-03-18T19:53:11.915 INFO:tasks.workunit.client.0.smithi131.stdout: watch_notify: process_2_[24528]: watching object 0.obj 2020-03-18T19:53:11.916 INFO:tasks.workunit.client.0.smithi131.stdout: watch_notify: Received fewer notifies than expected: 0 < 1 2020-03-18T19:53:11.916 INFO:tasks.workunit.client.0.smithi131.stdout: watch_notify: process_2_[24528]: shutting down. 2020-03-18T19:53:11.916 INFO:tasks.workunit.client.0.smithi131.stdout: watch_notify: test1: got error: run_until_finished: runnable process_2: got error: [24506] returned exit_status (1) Operation not permitted
/a/sage-2020-03-18_16:51:46-rados:verify-wip-sage-testing-2020-03-18-0826-distro-basic-smithi/4866547
#9 Updated by Sage Weil over 3 years ago
i think we should consider jsut dropping this test. it's old test code written by colin almost 10 years ago and i'm not sure it's worth the effort to debug it...
#10 Updated by Sage Weil over 3 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 34044
#11 Updated by Sage Weil over 3 years ago
- Status changed from Fix Under Review to Resolved
#12 Updated by Josh Durgin over 3 years ago
- Status changed from Resolved to Pending Backport
- Backport set to octopus, nautilus
Let's remove these tests from the stable branches too.
#13 Updated by Nathan Cutler over 3 years ago
- Copied to Backport #46017: nautilus: ceph_test_rados_watch_notify hang added
#14 Updated by Nathan Cutler over 3 years ago
- Copied to Backport #46018: octopus: ceph_test_rados_watch_notify hang added
#15 Updated by Nathan Cutler about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".