Project

General

Profile

Bug #43861

ceph_test_rados_watch_notify hang

Added by Sage Weil over 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
octopus, nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[root@smithi130 tmp]# cat ceph_test_rados_watch_notify.log
process_1_[25350]: starting.
process_1_[25350]: creating pool foo.25320
process_1_[25350]: created object 0...
process_1_[25350]: finishing.
process_1_[25350]: shutting down.
process_3_[25352]: starting.
process_3_[25352]: notifying object 0.obj
process_3_[25352]: shutting down.
process_4_[25353]: starting.
process_4_[25353]: deleting pool foo.25320
process_4_[25353]: shutting down.
process_2_[25351]: starting.
process_2_[25351]: watching object 0.obj
process_2_[25351]: shutting down.
*******************************
process_5_[26143]: starting.
process_5_[26143]: creating pool foo.25320.
process_5_[26143]: finishing.
process_5_[26143]: shutting down.
*******************************
process_7_[26145]: starting.
process_7_[26145]: notifying object 0.obj
process_7_[26145]: shutting down.
*******************************
process_6_[26144]: starting.
process_6_[26144]: watching object 0.obj
process_6_[26144]: shutting down.
*******************************
process_8_[26146]: starting.
process_8_[26146]: deleting pool foo.25320.
process_8_[26146]: shutting down.
*******************************
*******************************
process_9_[26236]: starting.
process_9_[26236]: creating pool foo.25320..
process_9_[26236]: created object 0...
process_9_[26236]: finishing.
process_9_[26236]: shutting down.
*******************************
*******************************
process_11_[26238]: starting.
process_11_[26238]: notifying object 0.obj
process_11_[26238]: file /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/huge/release/15.0.0-9869-g4b944a6/rpm/el8/BUILD/ceph-15.0.0-9869-g4b944a6/src/test/system/st_rados_notify.cc, line 69: expected 0, got -110
process_11_[26238]: shutting down.

that process is the zombie
  25320 ?        S      0:00  |                   \_ ceph_test_rados_watch_notify
  26237 ?        Sl     0:02  |                   |   \_ ceph_test_rados_watch_notify
  26238 ?        Z      0:00  |                   |   \_ [ceph_test_rados] <defunct>
  26239 ?        Sl     0:00  |                   |   \_ ceph_test_rados_watch_notify
  26240 ?        S      0:00  |                   |   \_ ceph_test_rados_watch_notify
  26241 ?        S      0:00  |                   |   \_ ceph_test_rados_watch_notify
  26242 ?        Sl     0:00  |                   |   \_ ceph_test_rados_watch_notify

Related issues

Related to RADOS - Bug #44062: LibRadosWatchNotify.WatchNotify failure Resolved
Copied to RADOS - Backport #46017: nautilus: ceph_test_rados_watch_notify hang Resolved
Copied to RADOS - Backport #46018: octopus: ceph_test_rados_watch_notify hang Resolved

History

#1 Updated by Sage Weil over 3 years ago

/a/sage-2020-01-28_03:52:05-rados-wip-sage2-testing-2020-01-27-1839-distro-basic-smithi/4713217

#2 Updated by Sage Weil over 3 years ago

same?
/a/sage-2020-02-01_03:27:35-rados-wip-sage-testing-2020-01-31-1746-distro-basic-smithi/4723146

ceph_test_watch_notify didn't generate any output.

#3 Updated by Brad Hubbard over 3 years ago

  • Related to Bug #44062: LibRadosWatchNotify.WatchNotify failure added

#4 Updated by Brad Hubbard over 3 years ago

Almost certainly the same issue as #44062

#5 Updated by Sage Weil over 3 years ago

2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_1_[36775]: creating pool foo.36733
2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_1_[36775]: created object 0...
2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_1_[36775]: finishing.
2020-02-23T19:20:15.553 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_1_[36775]: shutting down.
2020-02-23T19:20:15.554 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_3_[36777]: starting.
2020-02-23T19:20:15.554 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_3_[36777]: notifying object 0.obj
2020-02-23T19:20:15.562 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_3_[36777]: shutting down.
2020-02-23T19:20:15.562 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_2_[36776]: starting.
2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_2_[36776]: watching object 0.obj
2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: Received fewer notifies than expected: 0 < 1
2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: process_2_[36776]: shutting down.
2020-02-23T19:20:15.563 INFO:tasks.workunit.client.0.smithi073.stdout:             watch_notify: test1: got error: run_until_finished: runnable process_2: got error: [36733] returned exit_status (1) Operation not permitted

/a/sage-2020-02-23_16:41:52-rados-wip-sage2-testing-2020-02-23-0755-distro-basic-smithi/4793920

#6 Updated by Sage Weil over 3 years ago

no output at all, like comment 2 above:
/a/sage-2020-03-06_17:29:42-rados-wip-sage4-testing-2020-03-05-1645-distro-basic-smithi/4831763

#7 Updated by Sage Weil over 3 years ago

  41404 ?        S      0:00  |       |       \_ bash -o pipefail -exc ceph_test_rados_watch_notify  2>&1 | tee ceph_test_rados_watch_notify.log | sed "s/^/             watch_notify: /" 
  41437 ?        S      0:00  |       |           \_ ceph_test_rados_watch_notify
  41470 ?        Sl     0:01  |       |           |   \_ ceph_test_rados_watch_notify
  41471 ?        Z      0:00  |       |           |   \_ [ceph_test_rados] <defunct>
  41472 ?        Sl     0:00  |       |           |   \_ ceph_test_rados_watch_notify
  41438 ?        S      0:00  |       |           \_ tee ceph_test_rados_watch_notify.log
  41439 ?        S      0:00  |       |           \_ sed s/^/             watch_notify: /
...
[root@smithi104 tmp]# gdb ceph_test_rados_watch_notify 41437
...
Thread 1 (Thread 0x7f1b496d6640 (LWP 41437)):
#0  0x00007f1b48f4fb52 in read () from /lib64/libpthread.so.0
#1  0x00007f1b401f90ef in read (__nbytes=4, __buf=0x7ffca63a5068, __fd=4) at /usr/include/bits/unistd.h:44
#2  safe_read (fd=4, buf=buf@entry=0x7ffca63a5068, count=count@entry=4) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/safe_io.c:30
#3  0x00007f1b401f914d in safe_read_exact (fd=<optimized out>, buf=buf@entry=0x7ffca63a5068, count=count@entry=4) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/safe_io.c:48
#4  0x00005599eb02ceda in Preforker::parent_wait (this=this@entry=0x7ffca63a5e08, err_msg="") at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Preforker.h:82
#5  0x00005599eb02bd55 in SysTestRunnable::join[abi:cxx11]() (this=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:131
#6  0x00005599eb02c789 in SysTestRunnable::run_until_finished[abi:cxx11](std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&) (runnables=std::vector of length 4, capacity 4 = {...}) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:154
#7  0x00005599eb027beb in main (argc=1, argv=0x7ffca63a61e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/rados_watch_notify.cc:82
...
[root@smithi104 tmp]# gdb ceph_test_rados_watch_notify 41470
...
(gdb) thr app all bt

Thread 15 (Thread 0x7f1b20ff9700 (LWP 44923)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157948, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed1578e0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86
#3  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f1b217fa700 (LWP 44922)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157770, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157758) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106
#3  0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f1b21ffb700 (LWP 44921)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157448, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157430) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106
#3  0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f1b227fc700 (LWP 44912)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed156ef8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed156e90) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86
#3  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f1b22ffd700 (LWP 44911)):
#0  0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=0x5599ed156de0, mutex=0x5599ed156d68, ts=ts@entry=0x7f1b22ffaf30) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fda160e in ceph::condition_variable_debug::wait_until<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (when=..., lock=..., this=0x5599ed156de0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123
#3  SafeTimer::timer_thread (this=0x5599ed156dc8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:108
#4  0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30
#5  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#6  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f1b237fe700 (LWP 44910)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed159360, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fee4031 in DispatchQueue::run_local_delivery (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:117
#3  0x00007f1b3ff89a91 in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:115
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f1b23fff700 (LWP 44909)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed1591b8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fee4af8 in DispatchQueue::entry (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:210
#3  0x00007f1b3ff899d1 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:101
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f1b38d9d700 (LWP 44908)):
#0  0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b4923e7f3 in __gthread_cond_timedwait (__abs_timeout=0x7f1b38d9af80, __mutex=<optimized out>, __cond=0x5599ed1a3a58) at /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:871
#2  std::condition_variable::__wait_until_impl<std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=<synthetic pointer>..., __lock=..., this=0x5599ed1a3a58) at /usr/include/c++/8/condition_variable:178
#3  std::condition_variable::wait_until<ceph::time_detail::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x5599ed1a3a58) at /usr/include/c++/8/condition_variable:119
#4  ceph::timer_detail::timer<ceph::time_detail::coarse_mono_clock>::timer_thread (this=0x5599ed1a39f0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_timer.h:146
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f1b3959e700 (LWP 44907)):
#0  0x00007f1b3ded5211 in poll () from /lib64/libc.so.6
#1  0x00007f1b3fdce38f in poll (__timeout=-1, __nfds=2, __fds=0x7f1b3959bf60) at /usr/include/bits/poll2.h:46
#2  AdminSocket::entry (this=0x5599ed1174f0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/admin_socket.cc:241
#3  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f1b39d9f700 (LWP 44906)):
#0  0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=this@entry=0x5599ed1a1810, mutex=0x5599ed1a17b0, ts=ts@entry=0x7f1b39d9cf80) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fdf1fed in ceph::condition_variable_debug::wait_for<unsigned long, std::ratio<1l, 1000000000l> > (awhile=<synthetic pointer>..., lock=..., this=0x5599ed1a1810) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123
#3  ceph::common::CephContextServiceThread::entry (this=0x5599ed1a1790) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_context.cc:210
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f1b3a5a0700 (LWP 44905)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3e7fd870 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f1b40016047 in ceph::logging::Log::entry (this=0x5599ed0fa5d0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/log/Log.cc:435
--Type <RET> for more, q to quit, c to continue without paging--
#3  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f1b3ada1700 (LWP 44896)):
#0  0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed15d2d0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed166680, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3ad9ef28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392
#3  0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1e8, __closure=0x5599ed15d1e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f1b3b5a2700 (LWP 44895)):
#0  0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed1650c0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed15be70, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3b59ff28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392
#3  0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1b8, __closure=0x5599ed15d1b8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f1b3bda3700 (LWP 44894)):
#0  0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed166a90, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed19d070, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3bda0f28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392
#3  0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15ca58, __closure=0x5599ed15ca58) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f1b496d6640 (LWP 41470)):
#0  0x00007f1b48f4ee56 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f1b48f4ef48 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00005599eb02b5e2 in CrossProcessSem::wait (this=0x5599ed101750) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/cross_process_sem.cc:65
#3  0x00005599eb02afdd in StRadosWatch::run (this=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/st_rados_watch.cc:84
#4  0x00005599eb02ba5e in systest_runnable_pthread_helper (arg=arg@entry=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:171
#5  0x00005599eb02c414 in SysTestRunnable::start (this=0x7ffca63a5df0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:97
#6  0x00005599eb02c49a in SysTestRunnable::run_until_finished[abi:cxx11](std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&) (runnables=std::vector of length 4, capacity 4 = {...}) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:142
#7  0x00005599eb027beb in main (argc=1, argv=0x7ffca63a61e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/rados_watch_notify.cc:82
...

[root@smithi104 tmp]# gdb ceph_test_rados_watch_notify 41472
...
(gdb) thr app all bt

Thread 15 (Thread 0x7f1b20ff9700 (LWP 41878)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157948, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed1578e0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86
#3  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f1b217fa700 (LWP 41877)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157770, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157758) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106
#3  0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f1b21ffb700 (LWP 41876)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed157448, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fda0fd1 in SafeTimer::timer_thread (this=0x5599ed157430) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:106
#3  0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f1b227fc700 (LWP 41666)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed156ef8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fd52b2b in Finisher::finisher_thread_entry (this=0x5599ed156e90) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Finisher.cc:86
#3  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f1b22ffd700 (LWP 41665)):
#0  0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=0x5599ed156de0, mutex=0x5599ed156d68, ts=ts@entry=0x7f1b22ffaf30) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fda160e in ceph::condition_variable_debug::wait_until<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (when=..., lock=..., this=0x5599ed156de0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123
#3  SafeTimer::timer_thread (this=0x5599ed156dc8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:108
#4  0x00007f1b3fda2871 in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/Timer.cc:30
#5  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#6  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f1b237fe700 (LWP 41664)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed159360, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fee4031 in DispatchQueue::run_local_delivery (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:117
#3  0x00007f1b3ff89a91 in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:115
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f1b23fff700 (LWP 41663)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe13f04 in ceph::condition_variable_debug::wait (this=this@entry=0x5599ed1591b8, lock=...) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fee4af8 in DispatchQueue::entry (this=0x5599ed159148) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.cc:210
#3  0x00007f1b3ff899d1 in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/DispatchQueue.h:101
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f1b38d9d700 (LWP 41662)):
#0  0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b4923e7f3 in __gthread_cond_timedwait (__abs_timeout=0x7f1b38d9af80, __mutex=<optimized out>, __cond=0x5599ed19bec8) at /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:871
#2  std::condition_variable::__wait_until_impl<std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=<synthetic pointer>..., __lock=..., this=0x5599ed19bec8) at /usr/include/c++/8/condition_variable:178
#3  std::condition_variable::wait_until<ceph::time_detail::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x5599ed19bec8) at /usr/include/c++/8/condition_variable:119
#4  ceph::timer_detail::timer<ceph::time_detail::coarse_mono_clock>::timer_thread (this=0x5599ed19be60) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_timer.h:146
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f1b3959e700 (LWP 41661)):
#0  0x00007f1b3ded5211 in poll () from /lib64/libc.so.6
#1  0x00007f1b3fdce38f in poll (__timeout=-1, __nfds=2, __fds=0x7f1b3959bf60) at /usr/include/bits/poll2.h:46
#2  AdminSocket::entry (this=0x5599ed1174f0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/admin_socket.cc:241
#3  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f1b39d9f700 (LWP 41660)):
#0  0x00007f1b48f4c7ca in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3fe14124 in ceph::condition_variable_debug::_wait_until (this=this@entry=0x5599ed1a1810, mutex=0x5599ed1a17b0, ts=ts@entry=0x7f1b39d9cf80) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/mutex_debug.h:147
#2  0x00007f1b3fdf1fed in ceph::condition_variable_debug::wait_for<unsigned long, std::ratio<1l, 1000000000l> > (awhile=<synthetic pointer>..., lock=..., this=0x5599ed1a1810) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_time.h:123
#3  ceph::common::CephContextServiceThread::entry (this=0x5599ed1a1790) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/common/ceph_context.cc:210
#4  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#5  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f1b3a5a0700 (LWP 41659)):
#0  0x00007f1b48f4c47c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1b3e7fd870 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2  0x00007f1b40016047 in ceph::logging::Log::entry (this=0x5599ed0fa5d0) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/log/Log.cc:435
--Type <RET> for more, q to quit, c to continue without paging--
#3  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#4  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f1b3ada1700 (LWP 41491)):
#0  0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed15d2d0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed166680, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3ad9ef28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392
#3  0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1e8, __closure=0x5599ed15d1e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f1b3b5a2700 (LWP 41490)):
#0  0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed1650c0, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed15be70, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3b59ff28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392
#3  0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15d1b8, __closure=0x5599ed15d1b8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f1b3bda3700 (LWP 41489)):
#0  0x00007f1b3dee0467 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f1b3ffe2774 in EpollDriver::event_wait (this=0x5599ed166a90, fired_events=std::vector of length 0, capacity 0, tvp=<optimized out>) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/EventEpoll.cc:123
#2  0x00007f1b3ffd433b in EventCenter::process_events (this=this@entry=0x5599ed19d070, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7f1b3bda0f28) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Event.cc:392
#3  0x00007f1b3ffdc77c in NetworkStack::<lambda()>::operator() (__closure=0x5599ed15ca58, __closure=0x5599ed15ca58) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/msg/async/Stack.cc:53
#4  std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#5  0x00007f1b3e803b23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#6  0x00007f1b48f462de in start_thread () from /lib64/libpthread.so.0
#7  0x00007f1b3dee0133 in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f1b496d6640 (LWP 41472)):
#0  0x00007f1b48f4ee56 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007f1b48f4ef48 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00005599eb02b5e2 in CrossProcessSem::wait (this=0x5599ed101750) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/cross_process_sem.cc:65
#3  0x00005599eb02a7c4 in StRadosDeletePool::run (this=0x7ffca63a5b10) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/st_rados_delete_pool.cc:46
#4  0x00005599eb02ba5e in systest_runnable_pthread_helper (arg=arg@entry=0x7ffca63a5b10) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:171
#5  0x00005599eb02c414 in SysTestRunnable::start (this=0x7ffca63a5b10) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:97
#6  0x00005599eb02c49a in SysTestRunnable::run_until_finished[abi:cxx11](std::vector<SysTestRunnable*, std::allocator<SysTestRunnable*> >&) (runnables=std::vector of length 4, capacity 4 = {...}) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/systest_runnable.cc:142
#7  0x00005599eb027beb in main (argc=1, argv=0x7ffca63a61e8) at /usr/src/debug/ceph-15.1.0-1950.g42e5071.el8.x86_64/src/test/system/rados_watch_notify.cc:82

#8 Updated by Sage Weil over 3 years ago

another one line comment 5 above

2020-03-18T19:53:11.915 INFO:tasks.workunit.client.0.smithi131.stdout:             watch_notify: process_3_[24529]: shutting down.
2020-03-18T19:53:11.915 INFO:tasks.workunit.client.0.smithi131.stdout:             watch_notify: process_2_[24528]: starting.
2020-03-18T19:53:11.915 INFO:tasks.workunit.client.0.smithi131.stdout:             watch_notify: process_2_[24528]: watching object 0.obj
2020-03-18T19:53:11.916 INFO:tasks.workunit.client.0.smithi131.stdout:             watch_notify: Received fewer notifies than expected: 0 < 1
2020-03-18T19:53:11.916 INFO:tasks.workunit.client.0.smithi131.stdout:             watch_notify: process_2_[24528]: shutting down.
2020-03-18T19:53:11.916 INFO:tasks.workunit.client.0.smithi131.stdout:             watch_notify: test1: got error: run_until_finished: runnable process_2: got error: [24506] returned exit_status (1) Operation not permitted

/a/sage-2020-03-18_16:51:46-rados:verify-wip-sage-testing-2020-03-18-0826-distro-basic-smithi/4866547

#9 Updated by Sage Weil over 3 years ago

i think we should consider jsut dropping this test. it's old test code written by colin almost 10 years ago and i'm not sure it's worth the effort to debug it...

#10 Updated by Sage Weil over 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 34044

#11 Updated by Sage Weil over 3 years ago

  • Status changed from Fix Under Review to Resolved

#12 Updated by Josh Durgin over 3 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to octopus, nautilus

Let's remove these tests from the stable branches too.

#13 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #46017: nautilus: ceph_test_rados_watch_notify hang added

#14 Updated by Nathan Cutler over 3 years ago

  • Copied to Backport #46018: octopus: ceph_test_rados_watch_notify hang added

#15 Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Also available in: Atom PDF