Bug #43073

open

ceph_perf_msgr_client crashed

Added by yong xing over 4 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
AsyncMessenger
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Messengers
Backport:
v14.2.4
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph_perf_msgr_client crashes with the following backtrace:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fffec59f801 in __GI_abort () at abort.c:79
#2 0x00007fffee6f1757 in ceph::__ceph_assert_fail (assertion=0x7fffef04d50b "messenger->auth_client", file=0x7fffef04c6b0 "/root/code/ceph/src/msg/async/ProtocolV2.cc", line=1705,
func=0x7fffef051680 <ProtocolV2::send_auth_request(std::vector<unsigned int, std::allocator<unsigned int> >&)::__PRETTY_FUNCTION__> "Ct<ProtocolV2>* ProtocolV2::send_auth_request(std::vector<unsigned int>&)") at /root/code/ceph/src/common/assert.cc:73
#3 0x00007fffee6f180a in ceph::__ceph_assert_fail (ctx=...) at /root/code/ceph/src/common/assert.cc:78
#4 0x00007fffeea89137 in ProtocolV2::send_auth_request (this=0x555556cbf080, allowed_methods=std::vector of length 0, capacity 0) at /root/code/ceph/src/msg/async/ProtocolV2.cc:1705
#5 0x00007fffeea9b029 in ProtocolV2::send_auth_request (this=0x555556cbf080) at /root/code/ceph/src/msg/async/ProtocolV2.h:215
#6 0x00007fffeea88ede in ProtocolV2::post_client_banner_exchange (this=0x555556cbf080) at /root/code/ceph/src/msg/async/ProtocolV2.cc:1699
#7 0x00007fffeeab0e64 in CtFun<ProtocolV2>::_call<>(ProtocolV2*, std::integer_sequence<unsigned long>) const (this=0x555556cbf540, foo=0x555556cbf080)
at /root/code/ceph/src/msg/async/Protocol.h:37
#8 0x00007fffeeab0df6 in CtFun<ProtocolV2>::call (this=0x555556cbf540, foo=0x555556cbf080) at /root/code/ceph/src/msg/async/Protocol.h:45
#9 0x00007fffeea76c1c in ProtocolV2::run_continuation (this=0x555556cbf080, continuation=...) at /root/code/ceph/src/msg/async/ProtocolV2.cc:45
#10 0x00007fffeea7dbaa in ProtocolV2::<lambda(char*, int)>::operator()(char *, int) const (__closure=0x555556026c98, buffer=0x555556032140 "ceph v2\n\020", r=0)
at /root/code/ceph/src/msg/async/ProtocolV2.cc:719
#11 0x00007fffeea98b54 in std::_Function_handler<void(char*, long int), ProtocolV2::read(CONTINUATION_RXBPTR_TYPE<ProtocolV2>&, rx_buffer_t&&)::<lambda(char*, int)> >::_M_invoke(const std::_Any_data &, char *&&, long &&) (__functor=..., __args#0=@0x7fffe73b3590: 0x555556032140 "ceph v2\n\020", __args#1=@0x7fffe73b3588: 0) at /usr/include/c++/7/bits/std_function.h:316
#12 0x00007fffeea3afbd in std::function<void (char*, long)>::operator()(char*, long) const (this=0x555556026c98, __args#0=0x555556032140 "ceph v2\n\020", __args#1=0)
at /usr/include/c++/7/bits/std_function.h:706
#13 0x00007fffeea36119 in AsyncConnection::process (this=0x555556026880) at /root/code/ceph/src/msg/async/AsyncConnection.cc:450
#14 0x00007fffeea3a1b2 in C_handle_read::do_request (this=0x55555600c4e0, fd_or_id=12) at /root/code/ceph/src/msg/async/AsyncConnection.cc:71
#15 0x00007fffeeab464e in EventCenter::process_events (this=0x55555600ebc0, timeout_microseconds=10000000, working_dur=0x7fffe73b3948) at /root/code/ceph/src/msg/async/Event.cc:415
#16 0x00007fffeeac1d88 in NetworkStack::<lambda()>::operator()(void) const (__closure=0x5555560c0458) at /root/code/ceph/src/msg/async/Stack.cc:53
#17 0x00007fffeeac325b in std::_Function_handler<void(), NetworkStack::add_thread(unsigned int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
at /usr/include/c++/7/bits/std_function.h:316
#18 0x00007fffeeac05de in std::function<void ()>::operator()() const (this=0x5555560c0458) at /usr/include/c++/7/bits/std_function.h:706
#19 0x00007fffeeabfe7b in std::__invoke_impl<void, std::function<void ()>>(std::__invoke_other, std::function<void ()>&&) (__f=...) at /usr/include/c++/7/bits/invoke.h:60
#20 0x00007fffeeabf608 in std::__invoke<std::function<void ()>>(std::function<void ()>&&) (__fn=...) at /usr/include/c++/7/bits/invoke.h:95
#21 0x00007fffeeac153e in std::thread::_Invoker<std::tuple<std::function<void ()> > >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x5555560c0458) at /usr/include/c++/7/thread:234
#22 0x00007fffeeac150f in std::thread::_Invoker<std::tuple<std::function<void ()> > >::operator()() (this=0x5555560c0458) at /usr/include/c++/7/thread:243
#23 0x00007fffeeac14ee in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void ()> > > >::_M_run() (this=0x5555560c0450) at /usr/include/c++/7/thread:186
#24 0x00007fffecc2557f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#25 0x00007ffff7bbd6db in start_thread (arg=0x7fffe73b5700) at pthread_create.c:463
#26 0x00007fffec68088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) f 4
#4 0x00007fffeea89137 in ProtocolV2::send_auth_request (this=0x555556cbf080, allowed_methods=std::vector of length 0, capacity 0) at /root/code/ceph/src/msg/async/ProtocolV2.cc:1705
1705 ceph_assert(messenger->auth_client);
(gdb) p messenger
$2 = (AsyncMessenger *) 0x555556cce000
(gdb) p messenger.auth_client
$3 = (AuthClient *) 0x0

ceph/src/msg/async/ProtocolV2.cc

1702 CtPtr ProtocolV2::send_auth_request(std::vector<uint32_t> &allowed_methods) {
1703 ldout(cct, 20) << __func__ << " peer_type " << (int)connection->peer_type
1704 << " auth_client " << messenger->auth_client << dendl;
1705 ceph_assert(messenger->auth_client);
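The gdb session above shows that messenger->auth_client is a null pointer when send_auth_request runs, so the assertion at line 1705 aborts the process. The following is a minimal sketch of that failure mode, not Ceph code: ToyMessenger, ToyAuthClient, can_send_auth_request and send_auth_request_or_abort are hypothetical stand-ins introduced only for illustration. It assumes the crash comes from a client that never installs an auth client on its messenger before connecting.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT the Ceph classes.
struct ToyAuthClient {
  std::string name;
};

struct ToyMessenger {
  // Stays null unless the caller installs an auth client explicitly.
  ToyAuthClient* auth_client = nullptr;
  void set_auth_client(ToyAuthClient* ac) { auth_client = ac; }
};

// Same precondition as the check at ProtocolV2.cc:1705, but reported to the
// caller instead of aborting the process.
bool can_send_auth_request(const ToyMessenger& m) {
  return m.auth_client != nullptr;
}

// The precondition expressed as a hard assertion, as seen in the backtrace.
void send_auth_request_or_abort(const ToyMessenger& m) {
  assert(m.auth_client && "messenger->auth_client");
}
```

In this sketch, calling send_auth_request_or_abort on a freshly constructed ToyMessenger aborts, while calling it after set_auth_client succeeds. Whether the real client should install an auth client or the protocol should tolerate its absence is not settled in this report.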

Actions #1

Updated by chunsong feng over 2 years ago

When ceph_test_async_networkstack is linked against the static DPDK libraries, it cannot find an Ethernet port:
[root@ceph1 aarch64-openEuler-linux-gnu]# cat ./src/test/msgr/CMakeFiles/ceph_test_async_networkstack.dir/link.txt
/usr/bin/c++ -rdynamic -pie CMakeFiles/ceph_test_async_networkstack.dir/test_async_networkstack.cc.o ../CMakeFiles/unit-main.dir/unit.cc.o -o ../../../bin/ceph_test_async_networkstack -Wl,-rpath,/home/rpmbuild/BUILD/ceph-17.0.0-8082-gedae9200104/aarch64-openEuler-linux-gnu/lib: ../../../lib/libglobal.a /usr/lib64/libcrypto.so /usr/lib64/libblkid.so -ldl ../../../lib/libgmock_main.a ../../../lib/libgmock.a ../../../lib/libgtest.a -lpthread -ldl ../../../lib/libceph-common.so.2 /usr/lib64/libcrypto.so /usr/lib64/libblkid.so ../../../lib/libjson_spirit.a ../../../lib/libcommon_utf8.a ../../../lib/liberasure_code.a -ldl ../../../lib/libcrc32.a ../../../lib/libarch.a ../../../boost/lib/libboost_thread.a ../../../boost/lib/libboost_chrono.a ../../../boost/lib/libboost_atomic.a ../../../boost/lib/libboost_system.a ../../../boost/lib/libboost_random.a ../../../boost/lib/libboost_program_options.a ../../../boost/lib/libboost_date_time.a ../../../boost/lib/libboost_iostreams.a ../../../boost/lib/libboost_regex.a /usr/local/lib64/libfmt.a /usr/lib64/libudev.so /usr/lib64/libibverbs.so /usr/lib64/librdmacm.so /usr/lib64/libz.so ../../../lib/libcommon_async_dpdk.a /usr/lib64/librte_bus_pci.a /usr/lib64/librte_bus_vdev.a /usr/lib64/librte_cfgfile.a /usr/lib64/librte_cmdline.a /usr/lib64/librte_eal.a /usr/lib64/librte_ethdev.a /usr/lib64/librte_hash.a /usr/lib64/librte_kvargs.a /usr/lib64/librte_mbuf.a /usr/lib64/librte_mempool.a /usr/lib64/librte_mempool_ring.a /usr/lib64/librte_net.a /usr/lib64/librte_pmd_hns3.a /usr/lib64/librte_pmd_hinic.a /usr/lib64/librte_pmd_af_packet.a /usr/lib64/librte_pmd_bnxt.a /usr/lib64/librte_pmd_bond.a /usr/lib64/librte_pmd_cxgbe.a /usr/lib64/librte_pmd_e1000.a /usr/lib64/librte_pmd_ena.a /usr/lib64/librte_pmd_enic.a /usr/lib64/librte_pmd_i40e.a /usr/lib64/librte_pmd_ixgbe.a /usr/lib64/librte_pmd_nfp.a /usr/lib64/librte_pmd_qede.a /usr/lib64/librte_pmd_ring.a /usr/lib64/librte_pmd_vmxnet3_uio.a /usr/lib64/librte_pci.a /usr/lib64/librte_ring.a 
/usr/lib64/librte_timer.a -lnuma /usr/lib64/librt.so -lresolv -lpthread
[root@ceph1 aarch64-openEuler-linux-gnu]# cd bin/
[root@ceph1 bin]# CEPH_CONF=/home/ceph.conf ./ceph_test_async_networkstack
[==========] Running 12 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 12 tests from NetworkStack/NetworkWorkerTest
[ RUN ] NetworkStack/NetworkWorkerTest.SimpleTest/0
SetUp start set up dpdk
EAL: Detected 128 lcore(s)
EAL: Detected 4 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte_client.admin/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Error - exiting with code: 1
Cause: 2021-10-14T20:01:30.610+0800 fffcd2fddb40 2 Event(0xaaad49450188 nevent=5000 time_id=1).set_owner center_id=0 owner=281461336693568No Ethernet ports - bye

Actions #2

Updated by chunsong feng over 2 years ago

When linked against the shared DPDK libraries, it finds the Ethernet ports and the test passes:
[root@ceph1 aarch64-openEuler-linux-gnu]# cat ./src/test/msgr/CMakeFiles/ceph_test_async_networkstack.dir/link.txt
/usr/bin/c++ -rdynamic -pie CMakeFiles/ceph_test_async_networkstack.dir/test_async_networkstack.cc.o ../CMakeFiles/unit-main.dir/unit.cc.o -o ../../../bin/ceph_test_async_networkstack -Wl,-rpath,/home/rpmbuild/BUILD/ceph-17.0.0-8082-gedae9200104/aarch64-openEuler-linux-gnu/lib: ../../../lib/libglobal.a /usr/lib64/libcrypto.so /usr/lib64/libblkid.so -ldl ../../../lib/libgmock_main.a ../../../lib/libgmock.a ../../../lib/libgtest.a -lpthread -ldl ../../../lib/libceph-common.so.2 /usr/lib64/libcrypto.so /usr/lib64/libblkid.so ../../../lib/libjson_spirit.a ../../../lib/libcommon_utf8.a ../../../lib/liberasure_code.a -ldl ../../../lib/libcrc32.a ../../../lib/libarch.a ../../../boost/lib/libboost_thread.a ../../../boost/lib/libboost_chrono.a ../../../boost/lib/libboost_atomic.a ../../../boost/lib/libboost_system.a ../../../boost/lib/libboost_random.a ../../../boost/lib/libboost_program_options.a ../../../boost/lib/libboost_date_time.a ../../../boost/lib/libboost_iostreams.a ../../../boost/lib/libboost_regex.a /usr/local/lib64/libfmt.a /usr/lib64/libudev.so /usr/lib64/libibverbs.so /usr/lib64/librdmacm.so /usr/lib64/libz.so ../../../lib/libcommon_async_dpdk.a /usr/lib64/librte_bus_pci.so /usr/lib64/librte_bus_vdev.so /usr/lib64/librte_cfgfile.so /usr/lib64/librte_cmdline.so /usr/lib64/librte_eal.so /usr/lib64/librte_ethdev.so /usr/lib64/librte_hash.so /usr/lib64/librte_kvargs.so /usr/lib64/librte_mbuf.so /usr/lib64/librte_mempool.so /usr/lib64/librte_mempool_ring.so /usr/lib64/librte_net.so /usr/lib64/librte_pmd_hns3.so /usr/lib64/librte_pmd_hinic.so /usr/lib64/librte_pmd_af_packet.so /usr/lib64/librte_pmd_bnxt.so /usr/lib64/librte_pmd_bond.so /usr/lib64/librte_pmd_cxgbe.so /usr/lib64/librte_pmd_e1000.so /usr/lib64/librte_pmd_ena.so /usr/lib64/librte_pmd_enic.so /usr/lib64/librte_pmd_i40e.so /usr/lib64/librte_pmd_ixgbe.so /usr/lib64/librte_pmd_nfp.so /usr/lib64/librte_pmd_qede.so /usr/lib64/librte_pmd_ring.so /usr/lib64/librte_pmd_vmxnet3_uio.so /usr/lib64/librte_pci.so 
/usr/lib64/librte_ring.so /usr/lib64/librte_timer.so -lnuma /usr/lib64/librt.so -lresolv -lpthread

[root@ceph1 bin]# CEPH_CONF=/home/ceph.conf ./ceph_test_async_networkstack
[==========] Running 12 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 12 tests from NetworkStack/NetworkWorkerTest
[ RUN ] NetworkStack/NetworkWorkerTest.SimpleTest/0
SetUp start set up dpdk
EAL: Detected 128 lcore(s)
EAL: Detected 4 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte_client.admin/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:08:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:7d:00.0 on NUMA socket 0
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:7d:00.1 on NUMA socket 0
EAL: probe driver: 19e5:a221 net_hns3
EAL: PCI device 0000:7d:00.2 on NUMA socket 0
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:7d:00.3 on NUMA socket 0
EAL: probe driver: 19e5:a221 net_hns3
EAL: PCI device 0000:7d:01.0 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:01.1 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:01.2 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: using IOMMU type 8 (No-IOMMU)
EAL: PCI device 0000:7d:01.3 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:02.6 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:02.7 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:03.0 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:03.1 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:bd:00.0 on NUMA socket 2
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:bd:00.1 on NUMA socket 2
EAL: probe driver: 19e5:a221 net_hns3
EAL: PCI device 0000:bd:00.2 on NUMA socket 2
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:bd:00.3 on NUMA socket 2
EAL: probe driver: 19e5:a221 net_hns3
2021-10-14T19:52:54.314+0800 fffc8afddb30 2 Event(0xaaae018934e8 nevent=5000 time_id=1).set_owner center_id=0 owner=281460128734000
2021-10-14T19:52:54.314+0800 fffc8a7cdb30 2 Event(0xaaae0189a648 nevent=5000 time_id=1).set_owner center_id=1 owner=281460120279856
2021-10-14T19:52:54.314+0800 fffc8a7cdb30 10 stack operator() starting
2021-10-14T19:52:54.314+0800 fffc8afddb30 10 stack operator() starting
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start LRO is off
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start RX checksum offload supported
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start TX ip checksum offload supported
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start TX TCP checksum offload supported
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start Port 0 init ...
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start done.
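
One possible explanation for the difference between the two link lines above (an assumption, not confirmed in this report): DPDK poll-mode drivers register themselves from constructor functions, and a member of a static archive that nothing else references can be left out by the linker, so its constructor never runs and no port is probed. With GNU ld, static archives of this kind are commonly wrapped in -Wl,--whole-archive ... -Wl,--no-whole-archive to keep every member. The sketch below illustrates only the self-registration pattern; driver_registry, DriverRegistrar, toy_net_driver and driver_available are hypothetical names, not DPDK APIs.

```cpp
#include <string>
#include <vector>

// Hypothetical registry standing in for a driver list; not a DPDK API.
std::vector<std::string>& driver_registry() {
  // Function-local static, so it is constructed on first use regardless of
  // the initialization order of other static objects.
  static std::vector<std::string> drivers;
  return drivers;
}

// An object whose constructor registers a driver name as a side effect.
struct DriverRegistrar {
  explicit DriverRegistrar(const std::string& name) {
    driver_registry().push_back(name);
  }
};

// Nothing else in the program refers to this object by name. In a single
// translation unit it is constructed before main() runs. If it lived in an
// otherwise unreferenced object file inside a static archive, the linker
// could omit that object file and the registration would never happen.
static DriverRegistrar toy_net_driver("toy_net");

// Returns true if a driver with the given name has registered itself.
bool driver_available(const std::string& name) {
  for (const auto& d : driver_registry()) {
    if (d == name) return true;
  }
  return false;
}
```

Compiled as a single file, driver_available("toy_net") returns true because the registrar object is always linked in. The static-archive case cannot be reproduced in one translation unit; it only appears when the registrar sits in a separate archive member that is not pulled in by any symbol reference.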
