Bug #42861

Libceph-common.so needs to use private link attribute when including dpdk static library

Added by chunsong feng over 4 years ago. Updated over 2 years ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

libceph-common.so does not specify a link attribute when it links the DPDK library,
so DPDK's global variables and functions end up with one copy in libceph-common.so and another in the application.
This causes DPDK initialization to fail:

EAL: No available hugepages reported in hugepages-32768kB
EAL: No available hugepages reported in hugepages-64kB
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
[New Thread 0xffffb339dad0 (LWP 653944)]
[New Thread 0xffffb2b9cad0 (LWP 653945)]
EAL: Error - exiting with code: 1
Cause: No Ethernet ports - bye

$16 = (struct vfio_config (*)[64]) 0xffffbf589700 <vfio_cfgs>
(gdb) p *vfio_cfgs
$17 = {vfio_enabled = 1, vfio_container_fd = 9, vfio_active_groups = 0, vfio_iommu_type = 0x0, vfio_groups = {{group_num = -1,
fd = -1, devices = 0} <repeats 64 times>}, mem_maps = {lock = {sl = {locked = 0}, user = -1, count = 0}, n_maps = 0,
maps = {{addr = 0, iova = 0, len = 0} <repeats 256 times>}}}
(gdb) c
Continuing.
[New Thread 0xffffb339dad0 (LWP 653241)]
[New Thread 0xffffb2b9cad0 (LWP 653242)]
[Switching to Thread 0xffffb339dad0 (LWP 653241)]

Thread 6 "msgr-worker-0" hit Breakpoint 7, rte_exit (exit_code=1, format=0xffffb71afb38 "No Ethernet ports - bye\n")
at /home/chunsong/ceph/src/spdk/dpdk/lib/librte_eal/linux/eal/eal_debug.c:71
71 {
(gdb) p vfio_cfgs
$18 = {vfio_enabled = 0, vfio_container_fd = 0, vfio_active_groups = 0, vfio_iommu_type = 0x0, vfio_groups = {{group_num = 0,
fd = 0, devices = 0} <repeats 64 times>}, mem_maps = {lock = {sl = {locked = 0}, user = 0, count = 0}, n_maps = 0,
maps = {{addr = 0, iova = 0, len = 0} <repeats 256 times>}}}
(gdb) p &vfio_cfgs
$19 = (struct vfio_config (*)[64]) 0xaaaaaad22220 <vfio_cfgs>

(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       hw watchpoint  keep y                      vfio_cfgs[0].vfio_enabled
2       breakpoint     keep y   <MULTIPLE>
        breakpoint already hit 1 time
2.1                         y   0x0000aaaaaab98398 in rte_vfio_enable
                                                   at /home/chunsong/ceph/src/spdk/dpdk/lib/librte_eal/linux/eal/eal_vfio.c:961
2.2                         y   0x0000ffffb7050680 in rte_vfio_enable
                                                   at /home/chunsong/ceph/src/spdk/dpdk/lib/librte_eal/linux/eal/eal_vfio.c:961
3       breakpoint     keep y   <MULTIPLE>
3.1                         y   0x0000aaaaaab98550 in rte_vfio_enable
                                                   at /home/chunsong/ceph/src/spdk/dpdk/lib/librte_eal/linux/eal/eal_vfio.c:1008
3.2                         y   0x0000ffffb7050838 in rte_vfio_enable
                                                   at /home/chunsong/ceph/src/spdk/dpdk/lib/librte_eal/linux/eal/eal_vfio.c:1008

eal_repeat_error.rar (127 KB) chunsong feng, 11/18/2019 07:14 AM

History

#1 Updated by Greg Farnum over 4 years ago

  • Project changed from Ceph to RADOS

#2 Updated by chunsong feng about 4 years ago

The DPDK library initializes the EAL through constructors and global
variables, and it cannot be initialized twice. Both the test application
and libceph-common.so contain the DPDK constructors, which causes repeated
initialization. Linking the test application against the static ceph-common
library leaves only one set of constructors, in the test application.

#3 Updated by Kefu Chai about 4 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 31877

#4 Updated by chunsong feng over 3 years ago

The common_async_dpdk library depends on DPDK's rte_eal.a. When common_async_dpdk is built as a static library, this dependency is passed on to libceph-common. libceph-common links its dependencies with the PUBLIC keyword by default, so both libceph-common and any application that uses libceph-common link against rte_eal.a. As a result, the global variables in rte_eal.a exist in two copies. The fix is to change common_async_dpdk to a shared library and link it to rte_eal.a with PRIVATE, so that only common_async_dpdk links against rte_eal.a.
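The difference between the two keywords can be sketched in CMake as follows. This is a minimal illustration, not the patch under review: the target names `async_dpdk_demo` and `demo_app` and the source file names are hypothetical, and `dpdk::dpdk` is assumed to be an imported target that resolves to the static DPDK archives. The effect relies on the wrapper being a SHARED library; for a STATIC library, CMake still passes PRIVATE link dependencies on to whatever links it.

```cmake
cmake_minimum_required(VERSION 3.10)
project(dpdk_link_demo CXX)

# Assumption: dpdk::dpdk is an imported target defined elsewhere
# (for example by a find module) and points at static archives such as rte_eal.a.

# Hypothetical wrapper library. Because it is SHARED, the static DPDK
# archives are linked into this one shared object.
add_library(async_dpdk_demo SHARED demo_stack.cc)

# PRIVATE: dpdk::dpdk is used to link async_dpdk_demo itself, but it is not
# added to the library's link interface, so consumers do not link
# rte_eal.a a second time.
target_link_libraries(async_dpdk_demo PRIVATE dpdk::dpdk)

# Hypothetical application. Its link line names the shared wrapper only.
add_executable(demo_app main.cc)
target_link_libraries(demo_app PRIVATE async_dpdk_demo)

# With PUBLIC instead of PRIVATE above, dpdk::dpdk would also appear on
# demo_app's link line, and the executable could end up with its own copy
# of the DPDK globals.
```

One way to check the result is to inspect the generated link.txt of the application and confirm that no librte_*.a archive appears on its link line.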

#5 Updated by chunsong feng over 3 years ago

The common_async_dpdk library depends on DPDK's rte_eal.a. When common_async_dpdk is built as a static library, this dependency is passed on to libceph-common. libceph-common links its dependencies with the PUBLIC keyword by default, so both libceph-common and any application that uses libceph-common link against rte_eal.a. As a result, the global variables in rte_eal.a exist in two copies.
The DPDK asynchronous messenger module now provides both a static and a shared library. libcommon.a links to the static library and libceph-common.so links to the shared library. This avoids the two copies when ceph-common is used.
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 86b4e0a..3fd06fa 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -437,10 +437,6 @@ if(HAVE_QATZIP)
   list(APPEND ceph_common_deps ${QATZIP_LIBRARIES})
 endif()

-if(WITH_DPDK)
-  list(APPEND ceph_common_deps common_async_dpdk)
-endif()
-
 if(WIN32)
   list(APPEND ceph_common_deps ws2_32 mswsock bcrypt)
   list(APPEND ceph_common_deps dlfcn_win32)
@@ -465,6 +461,12 @@ target_link_libraries(common ${ceph_common_deps})

 add_library(ceph-common SHARED ${ceph_common_objs})
 target_link_libraries(ceph-common ${ceph_common_deps})

+if(WITH_DPDK)
+  target_link_libraries(common common_async_dpdk_static)
+  target_link_libraries(ceph-common common_async_dpdk)
+endif()
 # appease dpkg-shlibdeps
 set_target_properties(ceph-common PROPERTIES
   SOVERSION 2
diff --git a/src/msg/CMakeLists.txt b/src/msg/CMakeLists.txt
index fada39b..7578884 100644
--- a/src/msg/CMakeLists.txt
+++ b/src/msg/CMakeLists.txt
@@ -55,11 +55,45 @@ if(WITH_DPDK)
     async/dpdk/TCP.cc
     async/dpdk/UserspaceEvent.cc
     async/dpdk/ethernet.cc)
-  add_library(common_async_dpdk STATIC
+  add_library(common_async_dpdk_static STATIC
+    ${async_dpdk_srcs})
+  target_link_libraries(common_async_dpdk_static PRIVATE
+    dpdk::dpdk)
+
+  add_library(common_async_dpdk SHARED
     ${async_dpdk_srcs})
   target_link_libraries(common_async_dpdk PRIVATE
     dpdk::dpdk)
   # Stack.cc includes DPDKStack.h, which includes rte_config.h indirectly
   target_include_directories(common-msg-objs PRIVATE
     $<TARGET_PROPERTY:dpdk::dpdk,INTERFACE_INCLUDE_DIRECTORIES>)
+  # appease dpkg-shlibdeps
+  set_target_properties(common_async_dpdk PROPERTIES
+    SOVERSION 1
+    SKIP_RPATH TRUE)
+  if(NOT APPLE AND NOT FREEBSD)
+    # Apple uses Mach-O, not ELF. so this option does not apply to APPLE.
+    #
+    # prefer the local symbol definitions when binding references to global
+    # symbols. otherwise we could reference the symbols defined by the application
+    # with the same name, instead of using the one defined in libcommon_async_dpdk.
+    # in other words, we require libcommon_async_dpdk to use local symbols, even if redefined
+    # in application".
+    set_property(
+      TARGET common_async_dpdk
+      APPEND APPEND_STRING
+      PROPERTY LINK_FLAGS "-Wl,-Bsymbolic -Wl,-Bsymbolic-functions")
+  endif()
+
+  if(MINGW)
+    install(
+      TARGETS common_async_dpdk
+      RUNTIME
+      DESTINATION ${CEPH_INSTALL_PKGLIBDIR})
+  else()
+    install(
+      TARGETS common_async_dpdk
+      LIBRARY
+      DESTINATION ${CEPH_INSTALL_PKGLIBDIR})
+  endif()
 endif(WITH_DPDK)

#6 Updated by chunsong feng over 2 years ago

When linked with the static DPDK libraries, the test cannot find any Ethernet port:
[root@ceph1 aarch64-openEuler-linux-gnu]# cat ./src/test/msgr/CMakeFiles/ceph_test_async_networkstack.dir/link.txt
/usr/bin/c++ -rdynamic -pie CMakeFiles/ceph_test_async_networkstack.dir/test_async_networkstack.cc.o ../CMakeFiles/unit-main.dir/unit.cc.o -o ../../../bin/ceph_test_async_networkstack -Wl,-rpath,/home/rpmbuild/BUILD/ceph-17.0.0-8082-gedae9200104/aarch64-openEuler-linux-gnu/lib: ../../../lib/libglobal.a /usr/lib64/libcrypto.so /usr/lib64/libblkid.so -ldl ../../../lib/libgmock_main.a ../../../lib/libgmock.a ../../../lib/libgtest.a -lpthread -ldl ../../../lib/libceph-common.so.2 /usr/lib64/libcrypto.so /usr/lib64/libblkid.so ../../../lib/libjson_spirit.a ../../../lib/libcommon_utf8.a ../../../lib/liberasure_code.a -ldl ../../../lib/libcrc32.a ../../../lib/libarch.a ../../../boost/lib/libboost_thread.a ../../../boost/lib/libboost_chrono.a ../../../boost/lib/libboost_atomic.a ../../../boost/lib/libboost_system.a ../../../boost/lib/libboost_random.a ../../../boost/lib/libboost_program_options.a ../../../boost/lib/libboost_date_time.a ../../../boost/lib/libboost_iostreams.a ../../../boost/lib/libboost_regex.a /usr/local/lib64/libfmt.a /usr/lib64/libudev.so /usr/lib64/libibverbs.so /usr/lib64/librdmacm.so /usr/lib64/libz.so ../../../lib/libcommon_async_dpdk.a /usr/lib64/librte_bus_pci.a /usr/lib64/librte_bus_vdev.a /usr/lib64/librte_cfgfile.a /usr/lib64/librte_cmdline.a /usr/lib64/librte_eal.a /usr/lib64/librte_ethdev.a /usr/lib64/librte_hash.a /usr/lib64/librte_kvargs.a /usr/lib64/librte_mbuf.a /usr/lib64/librte_mempool.a /usr/lib64/librte_mempool_ring.a /usr/lib64/librte_net.a /usr/lib64/librte_pmd_hns3.a /usr/lib64/librte_pmd_hinic.a /usr/lib64/librte_pmd_af_packet.a /usr/lib64/librte_pmd_bnxt.a /usr/lib64/librte_pmd_bond.a /usr/lib64/librte_pmd_cxgbe.a /usr/lib64/librte_pmd_e1000.a /usr/lib64/librte_pmd_ena.a /usr/lib64/librte_pmd_enic.a /usr/lib64/librte_pmd_i40e.a /usr/lib64/librte_pmd_ixgbe.a /usr/lib64/librte_pmd_nfp.a /usr/lib64/librte_pmd_qede.a /usr/lib64/librte_pmd_ring.a /usr/lib64/librte_pmd_vmxnet3_uio.a /usr/lib64/librte_pci.a /usr/lib64/librte_ring.a
/usr/lib64/librte_timer.a -lnuma /usr/lib64/librt.so -lresolv -lpthread
[root@ceph1 aarch64-openEuler-linux-gnu]# cd bin/
[root@ceph1 bin]# CEPH_CONF=/home/ceph.conf ./ceph_test_async_networkstack
[==========] Running 12 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 12 tests from NetworkStack/NetworkWorkerTest
[ RUN ] NetworkStack/NetworkWorkerTest.SimpleTest/0
SetUp start set up dpdk
EAL: Detected 128 lcore(s)
EAL: Detected 4 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte_client.admin/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Error - exiting with code: 1
Cause: 2021-10-14T20:01:30.610+0800 fffcd2fddb40 2 Event(0xaaad49450188 nevent=5000 time_id=1).set_owner center_id=0 owner=281461336693568No Ethernet ports - bye

#7 Updated by chunsong feng over 2 years ago

When linked with the shared DPDK libraries, the test finds the Ethernet ports and passes:
[root@ceph1 aarch64-openEuler-linux-gnu]# cat ./src/test/msgr/CMakeFiles/ceph_test_async_networkstack.dir/link.txt
/usr/bin/c++ -rdynamic -pie CMakeFiles/ceph_test_async_networkstack.dir/test_async_networkstack.cc.o ../CMakeFiles/unit-main.dir/unit.cc.o -o ../../../bin/ceph_test_async_networkstack -Wl,-rpath,/home/rpmbuild/BUILD/ceph-17.0.0-8082-gedae9200104/aarch64-openEuler-linux-gnu/lib: ../../../lib/libglobal.a /usr/lib64/libcrypto.so /usr/lib64/libblkid.so -ldl ../../../lib/libgmock_main.a ../../../lib/libgmock.a ../../../lib/libgtest.a -lpthread -ldl ../../../lib/libceph-common.so.2 /usr/lib64/libcrypto.so /usr/lib64/libblkid.so ../../../lib/libjson_spirit.a ../../../lib/libcommon_utf8.a ../../../lib/liberasure_code.a -ldl ../../../lib/libcrc32.a ../../../lib/libarch.a ../../../boost/lib/libboost_thread.a ../../../boost/lib/libboost_chrono.a ../../../boost/lib/libboost_atomic.a ../../../boost/lib/libboost_system.a ../../../boost/lib/libboost_random.a ../../../boost/lib/libboost_program_options.a ../../../boost/lib/libboost_date_time.a ../../../boost/lib/libboost_iostreams.a ../../../boost/lib/libboost_regex.a /usr/local/lib64/libfmt.a /usr/lib64/libudev.so /usr/lib64/libibverbs.so /usr/lib64/librdmacm.so /usr/lib64/libz.so ../../../lib/libcommon_async_dpdk.a /usr/lib64/librte_bus_pci.so /usr/lib64/librte_bus_vdev.so /usr/lib64/librte_cfgfile.so /usr/lib64/librte_cmdline.so /usr/lib64/librte_eal.so /usr/lib64/librte_ethdev.so /usr/lib64/librte_hash.so /usr/lib64/librte_kvargs.so /usr/lib64/librte_mbuf.so /usr/lib64/librte_mempool.so /usr/lib64/librte_mempool_ring.so /usr/lib64/librte_net.so /usr/lib64/librte_pmd_hns3.so /usr/lib64/librte_pmd_hinic.so /usr/lib64/librte_pmd_af_packet.so /usr/lib64/librte_pmd_bnxt.so /usr/lib64/librte_pmd_bond.so /usr/lib64/librte_pmd_cxgbe.so /usr/lib64/librte_pmd_e1000.so /usr/lib64/librte_pmd_ena.so /usr/lib64/librte_pmd_enic.so /usr/lib64/librte_pmd_i40e.so /usr/lib64/librte_pmd_ixgbe.so /usr/lib64/librte_pmd_nfp.so /usr/lib64/librte_pmd_qede.so /usr/lib64/librte_pmd_ring.so /usr/lib64/librte_pmd_vmxnet3_uio.so /usr/lib64/librte_pci.so 
/usr/lib64/librte_ring.so /usr/lib64/librte_timer.so -lnuma /usr/lib64/librt.so -lresolv -lpthread

[root@ceph1 bin]# CEPH_CONF=/home/ceph.conf ./ceph_test_async_networkstack
[==========] Running 12 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 12 tests from NetworkStack/NetworkWorkerTest
[ RUN ] NetworkStack/NetworkWorkerTest.SimpleTest/0
SetUp start set up dpdk
EAL: Detected 128 lcore(s)
EAL: Detected 4 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte_client.admin/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:07:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:08:00.0 on NUMA socket 0
EAL: probe driver: 19e5:1822 net_hinic
EAL: PCI device 0000:7d:00.0 on NUMA socket 0
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:7d:00.1 on NUMA socket 0
EAL: probe driver: 19e5:a221 net_hns3
EAL: PCI device 0000:7d:00.2 on NUMA socket 0
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:7d:00.3 on NUMA socket 0
EAL: probe driver: 19e5:a221 net_hns3
EAL: PCI device 0000:7d:01.0 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:01.1 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:01.2 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: using IOMMU type 8 (No-IOMMU)
EAL: PCI device 0000:7d:01.3 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:02.6 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:02.7 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:03.0 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:7d:03.1 on NUMA socket 0
EAL: probe driver: 19e5:a22f net_hns3_vf
EAL: PCI device 0000:bd:00.0 on NUMA socket 2
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:bd:00.1 on NUMA socket 2
EAL: probe driver: 19e5:a221 net_hns3
EAL: PCI device 0000:bd:00.2 on NUMA socket 2
EAL: probe driver: 19e5:a222 net_hns3
EAL: PCI device 0000:bd:00.3 on NUMA socket 2
EAL: probe driver: 19e5:a221 net_hns3
2021-10-14T19:52:54.314+0800 fffc8afddb30 2 Event(0xaaae018934e8 nevent=5000 time_id=1).set_owner center_id=0 owner=281460128734000
2021-10-14T19:52:54.314+0800 fffc8a7cdb30 2 Event(0xaaae0189a648 nevent=5000 time_id=1).set_owner center_id=1 owner=281460120279856
2021-10-14T19:52:54.314+0800 fffc8a7cdb30 10 stack operator() starting
2021-10-14T19:52:54.314+0800 fffc8afddb30 10 stack operator() starting
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start LRO is off
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start RX checksum offload supported
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start TX ip checksum offload supported
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start TX TCP checksum offload supported
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start Port 0 init ...
2021-10-14T19:52:54.314+0800 fffc8afddb30 1 dpdk init_port_start done.
