Bug #53500

open

rte_eal_init failure leaves the caller waiting forever

Added by chunsong feng over 2 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Messenger
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

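When rte_eal_init returns an error, the thread that called it exits without waking the thread waiting in dpdk::eal::init() (Thread 1 below, which is starting the msgr workers via DPDKStack::spawn_worker()). Because that thread waits in "while (!done) cond.wait(l);", it blocks forever, and ceph-osd hangs instead of exiting with an error.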
(gdb) info threads
  Id   Target Id                                                   Frame
* 1    Thread 0xfffcba366ec0 (LWP 2408807) "ceph-osd"         0x0000fffcbaecda8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
  2    Thread 0xfffcba0bb680 (LWP 2408832) "log"              0x0000fffcbaecda8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
  3    Thread 0xfffcb984b680 (LWP 2408846) "io_context_pool"  0x0000fffcbaecda8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
  4    Thread 0xfffcb642b680 (LWP 2408897) "eal-intr-thread"  0x0000fffcba671900 in epoll_pwait () from /lib64/libc.so.6
  5    Thread 0xfffcb5c1b680 (LWP 2409130) "lcore-worker-25"  0x0000fffcbaed1538 in read () from /lib64/libpthread.so.0
  6    Thread 0xfffcb540b680 (LWP 2409131) "lcore-worker-26"  0x0000fffcbaed1538 in read () from /lib64/libpthread.so.0
  7    Thread 0xfffcb4bfb680 (LWP 2409132) "lcore-worker-27"  0x0000fffcbaed1538 in read () from /lib64/libpthread.so.0
(gdb) thread apply all bt

Thread 7 (Thread 0xfffcb4bfb680 (LWP 2409132)):
#0 0x0000fffcbaed1538 in read () from /lib64/libpthread.so.0
#1 0x0000fffcbad151d4 in eal_thread_loop (arg=0x0) at ../lib/librte_eal/linux/eal_thread.c:107
#2 0x0000fffcbaec7800 in start_thread () from /lib64/libpthread.so.0
#3 0x0000fffcba6717dc in thread_start () from /lib64/libc.so.6

Thread 6 (Thread 0xfffcb540b680 (LWP 2409131)):
#0 0x0000fffcbaed1538 in read () from /lib64/libpthread.so.0
#1 0x0000fffcbad151d4 in eal_thread_loop (arg=0x0) at ../lib/librte_eal/linux/eal_thread.c:107
#2 0x0000fffcbaec7800 in start_thread () from /lib64/libpthread.so.0
#3 0x0000fffcba6717dc in thread_start () from /lib64/libc.so.6

Thread 5 (Thread 0xfffcb5c1b680 (LWP 2409130)):
#0 0x0000fffcbaed1538 in read () from /lib64/libpthread.so.0
#1 0x0000fffcbad151d4 in eal_thread_loop (arg=0x0) at ../lib/librte_eal/linux/eal_thread.c:107
#2 0x0000fffcbaec7800 in start_thread () from /lib64/libpthread.so.0
#3 0x0000fffcba6717dc in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0xfffcb642b680 (LWP 2408897)):
#0 0x0000fffcba671900 in epoll_pwait () from /lib64/libc.so.6
#1 0x0000fffcbad0bc5c in eal_intr_handle_interrupts (pfd=9, totalfds=4) at ../lib/librte_eal/linux/eal_interrupts.c:1045
#2 0x0000fffcbad0beb4 in eal_intr_thread_main (arg=0x0) at ../lib/librte_eal/linux/eal_interrupts.c:1130
#3 0x0000fffcbacf1768 in ctrl_thread_init (arg=0xaaadb1ad6db0) at ../lib/librte_eal/common/eal_common_thread.c:193
#4 0x0000fffcbaec7800 in start_thread () from /lib64/libpthread.so.0
#5 0x0000fffcba6717dc in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0xfffcb984b680 (LWP 2408846)):
#0 0x0000fffcbaecda8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000aaad68094240 in boost::asio::detail::posix_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock> (lock=..., this=<optimized out>)
at /home/rpmbuild/BUILD/ceph-16.2.6/build/boost/include/boost/asio/detail/conditionally_enabled_mutex.hpp:98
#2 boost::asio::detail::conditionally_enabled_event::wait (lock=..., this=0xaaadb1bf0e70) at /home/rpmbuild/BUILD/ceph-16.2.6/build/boost/include/boost/asio/detail/conditionally_enabled_event.hpp:89
#3 boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=..., this=0xaaadb1bf0e00) at /home/rpmbuild/BUILD/ceph-16.2.6/build/boost/include/boost/asio/detail/impl/scheduler.ipp:455
#4 boost::asio::detail::scheduler::run (this=0xaaadb1bf0e00, ec=...) at /home/rpmbuild/BUILD/ceph-16.2.6/build/boost/include/boost/asio/detail/impl/scheduler.ipp:200
#5 0x0000aaad68097a4c in boost::asio::io_context::run (this=<optimized out>) at /home/rpmbuild/BUILD/ceph-16.2.6/build/boost/include/boost/asio/impl/io_context.ipp:63
#6 ceph::async::io_context_pool::start(short)::{lambda()#1}::operator()() const (__closure=0xaaadb1c44178) at /home/rpmbuild/BUILD/ceph-16.2.6/src/common/async/context_pool.h:68
#7 std::__invoke_impl<void, ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_other, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__f=...) at /usr/include/c++/8/bits/invoke.h:60
#8 std::__invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_result&&, (ceph::async::io_context_pool::start(short)::{lambda()#1}&&)...) (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#9 std::invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::invoke_result&&, (ceph::async::io_context_pool::start(short)::{lambda()#1}&&)...) (__fn=...) at /usr/include/c++/8/functional:81
#10 make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}::operator()<{lambda()#1}> (fun=..., this=0xaaadb1c44180) at /home/rpmbuild/BUILD/ceph-16.2.6/src/common/Thread.h:79
#11 std::__invoke_impl<void, make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}>(std::__invoke_other, make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}&&, {lambda()#1}&&) (__f=...) at /usr/include/c++/8/bits/invoke.h:60
#12 std::__invoke<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}>(make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}&&) (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#13 std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=0xaaadb1c44178) at /usr/include/c++/8/thread:244
#14 std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}> >::operator()() (this=0xaaadb1c44178) at /usr/include/c++/8/thread:253
#15 std::thread::_State_impl<std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1, auto:2&&)#1}, {lambda()#1}> > >::_M_run() (this=0xaaadb1c44170) at /usr/include/c++/8/thread:196
#16 0x0000fffcba8eca3c in execute_native_thread_routine () from /lib64/libstdc++.so.6
#17 0x0000fffcbaec7800 in start_thread () from /lib64/libpthread.so.0
#18 0x0000fffcba6717dc in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0xfffcba0bb680 (LWP 2408832)):
#0 0x0000fffcbaecda8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000fffcba8e6580 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2 0x0000aaad688a8940 in ceph::logging::Log::entry (this=0xaaadb1be0b40) at /home/rpmbuild/BUILD/ceph-16.2.6/src/log/Log.cc:439
#3 0x0000fffcbaec7800 in start_thread () from /lib64/libpthread.so.0
#4 0x0000fffcba6717dc in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0xfffcba366ec0 (LWP 2408807)):
#0 0x0000fffcbaecda8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000fffcba8e6580 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2 0x0000aaad68daad54 in dpdk::eal::init (this=this@entry=0xaaadb1b01548) at /home/rpmbuild/BUILD/ceph-16.2.6/src/msg/async/dpdk/dpdk_rte.cc:176
#3 0x0000aaad68d79138 in DPDKStack::spawn_worker(unsigned int, std::function<void ()>&&) (this=0xaaadb1b01340, i=0, func=...) at /home/rpmbuild/BUILD/ceph-16.2.6/src/msg/async/dpdk/DPDKStack.cc:324
#4 0x0000aaad688881b4 in NetworkStack::start (this=0xaaadb1b01340) at /home/rpmbuild/BUILD/ceph-16.2.6/src/msg/async/Stack.cc:227
#5 0x0000aaad6887d5c0 in AsyncMessenger::AsyncMessenger (this=0xaaadb2910000, cct=<optimized out>, name=..., type=..., mname=..., _nonce=4818251162118563060) at /home/rpmbuild/BUILD/ceph-16.2.6/src/msg/async/AsyncMessenger.cc:296
#6 0x0000aaad688707d4 in Messenger::create (cct=cct@entry=0xaaadb1e30000, type="async+dpdk", name=..., lname="", nonce=4818251162118563060) at /usr/include/c++/8/bits/char_traits.h:287
#7 0x0000aaad68870bfc in Messenger::create_client_messenger (cct=cct@entry=0xaaadb1e30000, lname="") at /usr/include/c++/8/bits/char_traits.h:287
#8 0x0000aaad688ca9dc in MonClient::get_monmap_and_config (this=this@entry=0xfffffbd794e8) at /usr/include/c++/8/ext/new_allocator.h:79
#9 0x0000aaad686b7aa8 in global_init (defaults=<optimized out>, args=..., module_type=<optimized out>, code_env=CODE_ENVIRONMENT_DAEMON, flags=0, run_pre_init=<optimized out>)
at /home/rpmbuild/BUILD/ceph-16.2.6/src/global/global_init.cc:363
#10 0x0000aaad68038738 in main (argc=<optimized out>, argv=<optimized out>) at /home/rpmbuild/BUILD/ceph-16.2.6/src/ceph_osd.cc:137
(gdb) f 2
#2 0x0000aaad68daad54 in dpdk::eal::init (this=this@entry=0xaaadb1b01548) at /home/rpmbuild/BUILD/ceph-16.2.6/src/msg/async/dpdk/dpdk_rte.cc:176
176 cond.wait(l);
(gdb) l
171 }
172 }
173 });
174 std::unique_lock<std::mutex> l(lock);
175 while (!done)
176 cond.wait(l);
177 return 0;
178 }
179
180 size_t eal::mem_size(int num_cpus)
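A minimal sketch of one possible fix, assuming the goal is for eal::init() to report the rte_eal_init() result instead of hanging: the init thread publishes its result and sets done on both the success and the failure path, then always notifies the waiter. EalInitSketch and init_fn are hypothetical names; init_fn stands in for the rte_eal_init() call so the sketch builds without DPDK. This is not the actual Ceph patch.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical model of the init/wait handshake in dpdk::eal::init(),
// with the failure path fixed so the waiter is always woken.
class EalInitSketch {
 public:
  // Spawns the init thread and blocks until it reports a result.
  // Returns 0 on success, or the (negative) init error on failure.
  int init(std::function<int()> init_fn) {
    assert(!worker_.joinable());  // one init per object in this sketch
    worker_ = std::thread([this, init_fn] {
      int ret = init_fn();  // rte_eal_init(argc, argv) in the real code
      {
        std::lock_guard<std::mutex> l(lock_);
        result_ = ret;
        done_ = true;  // published on success AND on failure
      }
      cond_.notify_all();  // the waiter is always woken
      if (ret < 0)
        return;  // safe to exit: the waiter already has the error code
      // On success the real thread would go on to run queued work here.
    });
    std::unique_lock<std::mutex> l(lock_);
    // Same as "while (!done) cond.wait(l);", written with a predicate.
    cond_.wait(l, [this] { return done_; });
    return result_;
  }

  ~EalInitSketch() {
    if (worker_.joinable())
      worker_.join();
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  bool done_ = false;
  int result_ = 0;
  std::thread worker_;
};
```

For example, EalInitSketch().init([] { return -1; }) returns -1 instead of blocking. In Ceph, the caller (DPDKStack::spawn_worker() in the trace above) would presumably also need to check that value and abort startup with an error rather than continue.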

#1

Updated by chunsong feng over 2 years ago

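The hang also reproduces on a 17.0.0 development build (ceph-17.0.0-9454-g23e8f062a2e) with ceph_perf_msgr_client: rte_eal_init fails with "Cannot init plugins" because libmlx4.so.1 is missing, the EAL thread exits, and Thread 1 stays blocked in dpdk::eal::start() until it is interrupted with SIGINT.

(gdb) r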
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/aarch64-redhat-linux-gnu/bin/ceph_perf_msgr_client 172.19.36.152:4567 1 1 10000 1 41943
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0xfffff5d9c8c0 (LWP 3676911)]
[New Thread 0xfffff558c8c0 (LWP 3676912)]
[New Thread 0xfffff4d7c8c0 (LWP 3676913)]
using ms-public-type async+dpdk
server ip:port 172.19.36.152:4567
numjobs 1
concurrency 1
ios 10000
thinktime(us) 1
message data bytes 41943
[New Thread 0xfffff456c8c0 (LWP 3676915)]
EAL: Detected 96 lcore(s)
EAL: Detected 4 NUMA nodes
EAL: libmlx4.so.1: cannot open shared object file: No such file or directory
EAL: FATAL: Cannot init plugins
EAL: Cannot init plugins
[Thread 0xfffff456c8c0 (LWP 3676915) exited]

Thread 1 "ceph_perf_msgr_" received signal SIGINT, Interrupt.
0x0000fffff7f4da8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
(gdb) info threads
  Id   Target Id                                                   Frame
* 1    Thread 0xfffff6066fc0 (LWP 3676910) "ceph_perf_msgr_"  0x0000fffff7f4da8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
  2    Thread 0xfffff5d9c8c0 (LWP 3676911) "log"              0x0000fffff7f4da8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
  3    Thread 0xfffff558c8c0 (LWP 3676912) "service"          0x0000fffff7f4dda4 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
  4    Thread 0xfffff4d7c8c0 (LWP 3676913) "admin_socket"     0x0000fffff62e7dd8 in poll () from /lib64/libc.so.6
(gdb) thread apply all bt

Thread 4 (Thread 0xfffff4d7c8c0 (LWP 3676913)):
#0 0x0000fffff62e7dd8 in poll () from /lib64/libc.so.6
#1 0x0000fffff7356bb4 in AdminSocket::entry (this=0xaaaaab440580) at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/common/admin_socket.cc:254
#2 0x0000fffff656cd2c in execute_native_thread_routine () from /lib64/libstdc++.so.6
#3 0x0000fffff7f47800 in start_thread () from /lib64/libpthread.so.0
#4 0x0000fffff62f17dc in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0xfffff558c8c0 (LWP 3676912)):
#0 0x0000fffff7f4dda4 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000fffff736eddc in __gthread_cond_timedwait (__abs_timeout=0xfffff558c088, __mutex=<optimized out>, __cond=0xaaaaab471c98) at /usr/include/c++/8/aarch64-redhat-linux/bits/gthr-default.h:871
#2 std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0xaaaaab471c98) at /usr/include/c++/8/condition_variable:178
#3 std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0xaaaaab471c98) at /usr/include/c++/8/condition_variable:106
#4 std::condition_variable::wait_for<unsigned long, std::ratio<1l, 1000000000l> > (__rtime=<synthetic pointer>..., __lock=..., this=0xaaaaab471c98) at /usr/include/c++/8/condition_variable:143
#5 ceph::common::CephContextServiceThread::entry (this=0xaaaaab471c30) at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/common/ceph_context.cc:221
#6 0x0000fffff7f47800 in start_thread () from /lib64/libpthread.so.0
#7 0x0000fffff62f17dc in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0xfffff5d9c8c0 (LWP 3676911)):
#0 0x0000fffff7f4da8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000fffff6566770 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2 0x0000fffff75eef74 in ceph::logging::Log::entry (this=0xaaaaab520d80) at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/log/Log.cc:473
#3 0x0000fffff7f47800 in start_thread () from /lib64/libpthread.so.0
#4 0x0000fffff62f17dc in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0xfffff6066fc0 (LWP 3676910)):
#0 0x0000fffff7f4da8c in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000fffff6566770 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#2 0x0000fffff789ecbc in dpdk::eal::start (this=this@entry=0xaaaaab4e0d68) at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/msg/async/dpdk/dpdk_rte.cc:179
#3 0x0000fffff7868024 in DPDKStack::spawn_worker(std::function<void ()>&&) (this=this@entry=0xaaaaab4e0d00, func=...) at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/msg/async/dpdk/DPDKStack.cc:265
#4 0x0000fffff759cd00 in NetworkStack::start (this=0xaaaaab4e0d00) at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/msg/async/Stack.cc:126
#5 0x0000fffff7545650 in AsyncMessenger::AsyncMessenger (this=0xaaaaac2a0900, cct=<optimized out>, name=..., type=..., mname=..., _nonce=3676910)
at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/msg/async/AsyncMessenger.cc:296
#6 0x0000fffff752ba90 in Messenger::create (cct=0xaaaaab570000, type="async+dpdk", name=..., lname=..., nonce=3676910) at /usr/include/c++/8/bits/char_traits.h:287
#7 0x0000aaaaaaac0c5c in MessengerClient::ready (this=0xffffffffe640, c=<optimized out>, jobs=<optimized out>, ops=<optimized out>, msg_len=<optimized out>) at /usr/include/c++/8/bits/basic_string.h:256
#8 0x0000aaaaaaaba200 in main (argc=<optimized out>, argv=<optimized out>) at /home/rpmbuild/BUILD/ceph-17.0.0-9454-g23e8f062a2e/src/test/msgr/perf_msgr_client.cc:211
(gdb)
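The hang above could only be broken with SIGINT. As a defensive measure (a different technique from fixing the missing notification), the waiter could bound its wait with std::condition_variable::wait_for and treat a missing result as an init failure. The sketch below assumes a hypothetical init_with_deadline() helper whose report_failure flag toggles between the behaviour described in this report (failure returns without notifying) and the fixed one; it shows that a bounded wait turns the silent hang into a detectable timeout.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>

// Hypothetical helper: runs init_fn on its own thread and waits at most
// `timeout` for it to report. Returns the init result, or std::nullopt if
// no result arrived in time (the hang seen in this bug).
std::optional<int> init_with_deadline(std::function<int()> init_fn,
                                      bool report_failure,
                                      std::chrono::milliseconds timeout) {
  assert(timeout.count() > 0);
  std::mutex lock;
  std::condition_variable cond;
  bool done = false;
  int result = 0;

  std::thread worker([&] {
    int ret = init_fn();  // stands in for rte_eal_init()
    if (ret < 0 && !report_failure)
      return;  // behaviour from the report: the waiter is never woken
    {
      std::lock_guard<std::mutex> l(lock);
      result = ret;
      done = true;
    }
    cond.notify_all();
  });

  std::optional<int> out;
  {
    std::unique_lock<std::mutex> l(lock);
    // wait_for with a predicate returns the predicate's value on exit,
    // so false means the deadline passed without a result.
    if (cond.wait_for(l, timeout, [&] { return done; }))
      out = result;
  }
  worker.join();  // the worker has exited (or will shortly) in every case
  return out;
}
```

With report_failure set to false and an init_fn that returns -1, init_with_deadline() returns an empty std::optional after the timeout rather than blocking forever. A real deadline would need to be chosen generously, since one that is too short would report a slow but successful EAL startup as a failure.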
