Bug #64184

test_bn.py -v -a kafka_test: Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed

Added by Casey Bodley 3 months ago. Updated about 1 month ago.

Status: New
Priority: Urgent
Target version: -
% Done: 0%
Source:
Tags: notifications kafka
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2024-01-25T03:01:00.979 INFO:tasks.notification_tests:Running bucket-notifications-tests...
2024-01-25T03:01:00.979 DEBUG:teuthology.orchestra.run.smithi046:bucket notification tests against different endpoints> BNTESTS_CONF=/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/bn-tests.client.0.conf /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/virtualenv/bin/python -m nose -s /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py -v -a kafka_test
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout:*** Caught signal (Aborted) **
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: in thread 7f69ab974640 thread_name:kafka_manager
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: ceph version 19.0.0-814-g1a8bb77b (1a8bb77be00267ce596e60f3b1141a4463aab767) squid (dev)
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: 1: /lib64/libc.so.6(+0x54db0) [0x7f6add654db0]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 2: /lib64/libc.so.6(+0xa154c) [0x7f6add6a154c]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 3: raise()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 4: abort()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 5: /lib64/libc.so.6(+0x29130) [0x7f6add629130]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 6: /lib64/libc.so.6(+0x4daf7) [0x7f6add64daf7]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 7: /lib64/libc.so.6(+0xa7d18) [0x7f6add6a7d18]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 8: (std::_Function_handler<void (int), RGWPubSubKafkaEndpoint::send_to_completion_async(ceph::common::CephContext*, rgw_pubsub_s3_event const&, optional_yield)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&)+0x95) [0x558c16fa84e5]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 9: (rgw::kafka::message_callback(rd_kafka_s*, rd_kafka_message_s const*, void*)+0xef) [0x558c170cc1ff]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 10: /lib64/librdkafka.so.1(+0x256ef) [0x7f6ade4776ef]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 11: /lib64/librdkafka.so.1(+0x5b862) [0x7f6ade4ad862]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 12: rd_kafka_poll()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 13: (rgw::kafka::Manager::run()+0x350) [0x558c170ce5a0]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 14: /lib64/libstdc++.so.6(+0xdb924) [0x7f6addadb924]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 15: /lib64/libc.so.6(+0x9f802) [0x7f6add69f802]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 16: /lib64/libc.so.6(+0x3f450) [0x7f6add63f450]
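In this trace the abort happens on the kafka_manager thread: rd_kafka_poll() (frame 12) runs rgw::kafka::message_callback (frame 9), which invokes the completion lambda created in RGWPubSubKafkaEndpoint::send_to_completion_async (frame 8), and the glibc assertion fires in libc code called from that lambda (frames 1-7). The tpp.c assertion sits on the priority-protect mutex path; in code that only uses ordinary mutexes it is commonly reported as a symptom of locking or unlocking a mutex that lives in freed or corrupted memory. One plausible, unconfirmed reading is that the callback touched completion state whose owner had already gone away. The sketch below is a minimal, hypothetical illustration of the usual safeguard for that pattern, not Ceph's code: `Waiter`, `make_completion` and `demo` are invented names, and a plain std::thread stands in for the librdkafka poll thread.

```cpp
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical completion state shared by the request side and the
// thread that delivers the result (the "kafka_manager" role here).
struct Waiter {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  int result = 0;

  // Called from the delivery thread.
  void complete(int r) {
    std::lock_guard<std::mutex> lock(m);
    result = r;
    done = true;
    cv.notify_one();
  }

  // Called from the request thread; blocks until complete() runs.
  int wait() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [this] { return done; });
    return result;
  }
};

// The callback captures a shared_ptr, so the Waiter (and the mutex it
// locks) stays alive until both the request side and the callback are
// done with it, whichever finishes last.
std::function<void(int)> make_completion(const std::shared_ptr<Waiter>& w) {
  return [w](int r) { w->complete(r); };
}

// Simulates one publish whose result is delivered on another thread.
int demo(int status) {
  auto waiter = std::make_shared<Waiter>();
  auto cb = make_completion(waiter);
  std::thread poller([cb = std::move(cb), status]() { cb(status); });
  int r = waiter->wait();
  poller.join();
  return r;
}
```

Here demo(0) returns 0 once the simulated poll thread has delivered the result. If the lambda instead captured a reference to a stack-allocated Waiter, and the request side could return (for example on a timeout) before the callback ran, the callback would lock a destroyed mutex, which is the kind of use-after-free that can surface as the assertion above.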

Related issues: 1 (1 open, 0 closed)

Related to rgw - Bug #63314: kafka crashed during message callback in teuthology (Pending Backport, Yuval Lifshitz)

#1

Updated by Casey Bodley 3 months ago

  • Related to Bug #63314: kafka crashed during message callback in teuthology added
#3

Updated by Casey Bodley 2 months ago

@Yuval maybe it would make sense to split the rgw/notifications suite into two separate jobs for kafka and amqp, so we can hopefully get clean runs from amqp?

#4

Updated by Yuval Lifshitz 2 months ago

Another crash trace from the kafka test:

#0  0x00007f45788a154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007f4578854d06 in raise () from /lib64/libc.so.6
#2  0x00007f45788287f3 in abort () from /lib64/libc.so.6
#3  0x00007f457ba161f4 in tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem) [clone .cold] () from /lib64/libtcmalloc.so.4
#4  0x00007f457ba1a7e3 in (anonymous namespace)::InvalidFree(void*) () from /lib64/libtcmalloc.so.4
#5  0x000055b1f503a2d2 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55b1f80cdf10) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#6  0x000055b1f53ef08b in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#7  std::__shared_ptr<std::__detail::_NFA<std::__cxx11::regex_traits<char> > const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#8  std::shared_ptr<std::__detail::_NFA<std::__cxx11::regex_traits<char> > const>::~shared_ptr (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr.h:122
#9  std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::~basic_regex (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/regex.h:535
#10 rgw::parse_url_authority (url=..., host="localhost", user="", password="") at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_url.cc:33
#11 0x000055b1f553e8d9 in rgw::kafka::Manager::connect (this=0x55b1f71623c0, broker="localhost", url=..., use_ssl=<optimized out>, verify_ssl=<optimized out>, ca_location=..., mechanism=...)
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_kafka.cc:558
#12 0x000055b1f541b763 in rgw::kafka::connect (mechanism=..., ca_location=..., verify_ssl=true, use_ssl=<optimized out>, url="kafka://localhost", broker="localhost")
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_kafka.cc:692
#13 RGWPubSubKafkaEndpoint::RGWPubSubKafkaEndpoint (_cct=<optimized out>, args=..., _topic=..., _endpoint="kafka://localhost", this=0x55b203d51c80)
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_pubsub_push.cc:306
#14 RGWPubSubEndpoint::create (endpoint="kafka://localhost", topic=..., args=..., cct=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_pubsub_push.cc:394
#15 0x000055b1f540c0dc in rgw::notify::publish_commit (obj=0x55b1fe586f00, size=1476742, mtime=..., etag="6ba994fd40c8086e7ecff9fc89b29dfb", version="", event_type=rgw::notify::ObjectRemovedDelete, res=..., dpp=<optimized out>)
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_notify.cc:1129
#16 0x000055b1f54b85ef in rgw::sal::RadosNotification::publish_commit (this=this@entry=0x55b204a14360, dpp=dpp@entry=0x55b200d52480, size=size@entry=1476742, mtime=..., etag="6ba994fd40c8086e7ecff9fc89b29dfb", version="")
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_sal_rados.cc:2850
#17 0x000055b1f529250f in RGWDeleteObj::execute (this=<optimized out>, y=...) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_op.cc:5398
#18 0x000055b1f512bca2 in rgw_process_authenticated (handler=<optimized out>, op=@0x7f4445fb93e0: 0x55b200d52480, req=<optimized out>, s=<optimized out>, y=..., driver=0x55b1f7f91c40, skip_retarget=false)
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_process.cc:255
#19 0x000055b1f512ec4d in process_request (penv=..., req=0x7f4445fba4a0, frontend_prefix=..., client_io=0x7f4445fba550, yield=..., scheduler=0x55b1f82a2458, user=0x7f4445fba870, latency=0x7f4445fba478, http_ret=0x7f4445fba474)
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_process.cc:389
#20 0x000055b1f59b9280 in (anonymous namespace)::handle_connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >(boost::asio::io_context&, RGWProcessEnv&, boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>&, rgw::basic_timeout_timer<ceph::coarse_mono_clock, boost::asio::any_io_executor, (anonymous namespace)::Connection>&, unsigned long, boost::beast::flat_static_buffer<65536ul>&, bool, ceph::async::SharedMutex<boost::asio::any_io_executor>&, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::system::error_code&, spawn::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::any_io_executor> >) [clone .constprop.0] (context=..., env=..., stream=..., timeout=..., header_limit=16384, buffer=..., pause_mutex=..., scheduler=0x55b1f82a2458, uri_prefix="", ec=..., yield=..., 
    is_ssl=false) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_asio_frontend.cc:290
#21 0x000055b1f5091fc4 in operator() (yield=..., __closure=0x55b20394c458) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_asio_frontend.cc:1061
#22 operator() (c=..., __closure=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/spawn/include/spawn/impl/spawn.hpp:390
#23 std::__invoke_impl<boost::context::continuation, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#24 std::__invoke<spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__fn=...) at /usr/include/c++/11/bits/invoke.h:97
#25 std::invoke<spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__fn=...) at /usr/include/c++/11/functional:98
#26 boost::context::detail::record<boost::context::continuation, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits>, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)> >::run (fctx=<optimized out>, this=<optimized out>)
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/redhat-linux-build/boost/include/boost/context/continuation_fcontext.hpp:160
#27 boost::context::detail::context_entry<boost::context::detail::record<boost::context::continuation, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits>, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)> > >(boost::context::detail::transfer_t) (t=...)
    at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/redhat-linux-build/boost/include/boost/context/continuation_fcontext.hpp:97
#28 0x000055b1f5a2d52f in make_fcontext ()
#29 0x0000000000000000 in ?? ()
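This trace fails in a different place: an RGW request thread handling a delete (frames #17 down to #11) is constructing the kafka endpoint, and tcmalloc aborts with InvalidFree while the std::regex local inside rgw::parse_url_authority (rgw_url.cc:33) is being destroyed. Destroying a regex is an ordinary free, so a rejected free there usually points to heap corruption that happened earlier, possibly elsewhere; the URL parsing itself is more likely a bystander than the cause. For readers unfamiliar with frame #10, the sketch below shows what it computes for the values in the trace (url "kafka://localhost" yielding host "localhost" and empty user and password). It is a hypothetical, regex-free stand-in, not the implementation in rgw_url.cc; `Authority` and `parse_authority` are invented names, and keeping any ":port" as part of the host is an assumption.

```cpp
#include <string>

// Hypothetical result type; not Ceph's API.
struct Authority {
  std::string host;
  std::string user;
  std::string password;
};

// Splits "scheme://[user[:password]@]host[:port][/path]" and fills in the
// authority parts. Returns false if the URL has no "://" separator.
bool parse_authority(const std::string& url, Authority& out) {
  const auto scheme_end = url.find("://");
  if (scheme_end == std::string::npos) {
    return false;
  }
  std::string rest = url.substr(scheme_end + 3);

  // Drop any path component after the authority.
  const auto slash = rest.find('/');
  if (slash != std::string::npos) {
    rest.erase(slash);
  }

  out = Authority{};
  // Optional credentials precede the last '@'.
  const auto at = rest.rfind('@');
  if (at != std::string::npos) {
    const std::string creds = rest.substr(0, at);
    rest.erase(0, at + 1);
    const auto colon = creds.find(':');
    out.user = creds.substr(0, colon);  // whole string if no ':'
    if (colon != std::string::npos) {
      out.password = creds.substr(colon + 1);
    }
  }
  out.host = rest;  // assumption: any ":port" stays in the host string
  return true;
}
```

Under these assumptions, "kafka://alice:secret@broker:9093" would yield user "alice", password "secret" and host "broker:9093".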

#5

Updated by Yuval Lifshitz 2 months ago

Casey Bodley wrote:

> @Yuval maybe it would make sense to split the rgw/notifications suite into two separate jobs for kafka and amqp, so we can hopefully get clean runs from amqp?

The amqp tests also have some unexplained failures. I have a PR, https://github.com/ceph/ceph/pull/55666, that runs the http and basic tests before the kafka and amqp tests, so that when we hit the expected failures we know the rest of the tests were passing.

#6

Updated by Yuval Lifshitz about 1 month ago

A similar crash, but with "Attempt to free invalid pointer" reported by tcmalloc:

2024-03-13T12:46:17.456 INFO:tasks.rgw.client.0.smithi007.stdout:src/tcmalloc.cc:333] Attempt to free invalid pointer 0x564200000000
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout:*** Caught signal (Aborted) **
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout: in thread 7f9fb6d43640 thread_name:io_context_pool
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
