Bug #64184
opentest_bn.py -v -a kafka_test: Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed
% Done: 0%
Source:
Tags: notifications kafka
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2024-01-25T03:01:00.979 INFO:tasks.notification_tests:Running bucket-notifications-tests...
2024-01-25T03:01:00.979 DEBUG:teuthology.orchestra.run.smithi046:bucket notification tests against different endpoints> BNTESTS_CONF=/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/bn-tests.client.0.conf /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/virtualenv/bin/python -m nose -s /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py -v -a kafka_test
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout:*** Caught signal (Aborted) **
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: in thread 7f69ab974640 thread_name:kafka_manager
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: ceph version 19.0.0-814-g1a8bb77b (1a8bb77be00267ce596e60f3b1141a4463aab767) squid (dev)
2024-01-25T03:01:11.896 INFO:tasks.rgw.client.0.smithi046.stdout: 1: /lib64/libc.so.6(+0x54db0) [0x7f6add654db0]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 2: /lib64/libc.so.6(+0xa154c) [0x7f6add6a154c]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 3: raise()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 4: abort()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 5: /lib64/libc.so.6(+0x29130) [0x7f6add629130]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 6: /lib64/libc.so.6(+0x4daf7) [0x7f6add64daf7]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 7: /lib64/libc.so.6(+0xa7d18) [0x7f6add6a7d18]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 8: (std::_Function_handler<void (int), RGWPubSubKafkaEndpoint::send_to_completion_async(ceph::common::CephContext*, rgw_pubsub_s3_event const&, optional_yield)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&)+0x95) [0x558c16fa84e5]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 9: (rgw::kafka::message_callback(rd_kafka_s*, rd_kafka_message_s const*, void*)+0xef) [0x558c170cc1ff]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 10: /lib64/librdkafka.so.1(+0x256ef) [0x7f6ade4776ef]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 11: /lib64/librdkafka.so.1(+0x5b862) [0x7f6ade4ad862]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 12: rd_kafka_poll()
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 13: (rgw::kafka::Manager::run()+0x350) [0x558c170ce5a0]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 14: /lib64/libstdc++.so.6(+0xdb924) [0x7f6addadb924]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 15: /lib64/libc.so.6(+0x9f802) [0x7f6add69f802]
2024-01-25T03:01:11.897 INFO:tasks.rgw.client.0.smithi046.stdout: 16: /lib64/libc.so.6(+0x3f450) [0x7f6add63f450]
Updated by Casey Bodley 3 months ago
- Related to Bug #63314: kafka crashed during message callback in teuthology added
Updated by Casey Bodley 3 months ago
Updated by Casey Bodley 2 months ago
@Yuval maybe it would make sense to split the rgw/notifications suite into two separate jobs for kafka and amqp, so we can hopefully get clean runs from amqp?
Updated by Yuval Lifshitz 2 months ago
Another crash trace from the kafka test:
#0 0x00007f45788a154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007f4578854d06 in raise () from /lib64/libc.so.6
#2 0x00007f45788287f3 in abort () from /lib64/libc.so.6
#3 0x00007f457ba161f4 in tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem) [clone .cold] () from /lib64/libtcmalloc.so.4
#4 0x00007f457ba1a7e3 in (anonymous namespace)::InvalidFree(void*) () from /lib64/libtcmalloc.so.4
#5 0x000055b1f503a2d2 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55b1f80cdf10) at /usr/include/c++/11/bits/shared_ptr_base.h:168
#6 0x000055b1f53ef08b in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#7 std::__shared_ptr<std::__detail::_NFA<std::__cxx11::regex_traits<char> > const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#8 std::shared_ptr<std::__detail::_NFA<std::__cxx11::regex_traits<char> > const>::~shared_ptr (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr.h:122
#9 std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::~basic_regex (this=<optimized out>, this=<optimized out>) at /usr/include/c++/11/bits/regex.h:535
#10 rgw::parse_url_authority (url=..., host="localhost", user="", password="") at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_url.cc:33
#11 0x000055b1f553e8d9 in rgw::kafka::Manager::connect (this=0x55b1f71623c0, broker="localhost", url=..., use_ssl=<optimized out>, verify_ssl=<optimized out>, ca_location=..., mechanism=...) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_kafka.cc:558
#12 0x000055b1f541b763 in rgw::kafka::connect (mechanism=..., ca_location=..., verify_ssl=true, use_ssl=<optimized out>, url="kafka://localhost", broker="localhost") at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_kafka.cc:692
#13 RGWPubSubKafkaEndpoint::RGWPubSubKafkaEndpoint (_cct=<optimized out>, args=..., _topic=..., _endpoint="kafka://localhost", this=0x55b203d51c80) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_pubsub_push.cc:306
#14 RGWPubSubEndpoint::create (endpoint="kafka://localhost", topic=..., args=..., cct=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_pubsub_push.cc:394
#15 0x000055b1f540c0dc in rgw::notify::publish_commit (obj=0x55b1fe586f00, size=1476742, mtime=..., etag="6ba994fd40c8086e7ecff9fc89b29dfb", version="", event_type=rgw::notify::ObjectRemovedDelete, res=..., dpp=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_notify.cc:1129
#16 0x000055b1f54b85ef in rgw::sal::RadosNotification::publish_commit (this=this@entry=0x55b204a14360, dpp=dpp@entry=0x55b200d52480, size=size@entry=1476742, mtime=..., etag="6ba994fd40c8086e7ecff9fc89b29dfb", version="") at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/driver/rados/rgw_sal_rados.cc:2850
#17 0x000055b1f529250f in RGWDeleteObj::execute (this=<optimized out>, y=...) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_op.cc:5398
#18 0x000055b1f512bca2 in rgw_process_authenticated (handler=<optimized out>, op=@0x7f4445fb93e0: 0x55b200d52480, req=<optimized out>, s=<optimized out>, y=..., driver=0x55b1f7f91c40, skip_retarget=false) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_process.cc:255
#19 0x000055b1f512ec4d in process_request (penv=..., req=0x7f4445fba4a0, frontend_prefix=..., client_io=0x7f4445fba550, yield=..., scheduler=0x55b1f82a2458, user=0x7f4445fba870, latency=0x7f4445fba478, http_ret=0x7f4445fba474) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_process.cc:389
#20 0x000055b1f59b9280 in (anonymous namespace)::handle_connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor> >(boost::asio::io_context&, RGWProcessEnv&, boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>&, rgw::basic_timeout_timer<ceph::coarse_mono_clock, boost::asio::any_io_executor, (anonymous namespace)::Connection>&, unsigned long, boost::beast::flat_static_buffer<65536ul>&, bool, ceph::async::SharedMutex<boost::asio::any_io_executor>&, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::system::error_code&, spawn::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::any_io_executor> >) [clone .constprop.0] (context=..., env=..., stream=..., timeout=..., header_limit=16384, buffer=..., pause_mutex=..., scheduler=0x55b1f82a2458, uri_prefix="", ec=..., yield=..., is_ssl=false) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_asio_frontend.cc:290
#21 0x000055b1f5091fc4 in operator() (yield=..., __closure=0x55b20394c458) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/rgw/rgw_asio_frontend.cc:1061
#22 operator() (c=..., __closure=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/src/spawn/include/spawn/impl/spawn.hpp:390
#23 std::__invoke_impl<boost::context::continuation, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#24 std::__invoke<spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__fn=...) at /usr/include/c++/11/bits/invoke.h:97
#25 std::invoke<spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)>&, boost::context::continuation> (__fn=...) at /usr/include/c++/11/functional:98
#26 boost::context::detail::record<boost::context::continuation, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits>, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)> >::run (fctx=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/redhat-linux-build/boost/include/boost/context/continuation_fcontext.hpp:160
#27 boost::context::detail::context_entry<boost::context::detail::record<boost::context::continuation, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits>, spawn::detail::spawn_helper<boost::asio::executor_binder<void (*)(), boost::asio::strand<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0> > >, (anonymous namespace)::AsioFrontend::accept((anonymous namespace)::AsioFrontend::Listener&, boost::system::error_code)::<lambda(spawn::yield_context)>, boost::context::basic_protected_fixedsize_stack<boost::context::stack_traits> >::operator()()::<lambda(boost::context::continuation&&)> > >(boost::context::detail::transfer_t) (t=...) at /usr/src/debug/ceph-19.0.0-1576.gc66cb5ee.el9.x86_64/redhat-linux-build/boost/include/boost/context/continuation_fcontext.hpp:97
#28 0x000055b1f5a2d52f in make_fcontext ()
#29 0x0000000000000000 in ?? ()
Updated by Yuval Lifshitz 2 months ago
Casey Bodley wrote:
@Yuval maybe it would make sense to split the rgw/notifications suite into two separate jobs for kafka and amqp, so we can hopefully get clean runs from amqp?
amqp also has some unexplained failures. I have this PR: https://github.com/ceph/ceph/pull/55666
It runs the http and basic tests before kafka and amqp, so that when we hit the expected failures we know the rest of the tests passed.
Updated by Yuval Lifshitz about 1 month ago
Similar crash, but with "Attempt to free invalid pointer" in tcmalloc:
2024-03-13T12:46:17.456 INFO:tasks.rgw.client.0.smithi007.stdout:src/tcmalloc.cc:333] Attempt to free invalid pointer 0x564200000000
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout:*** Caught signal (Aborted) **
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout: in thread 7f9fb6d43640 thread_name:io_context_pool
2024-03-13T12:46:17.457 INFO:tasks.rgw.client.0.smithi007.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)