Project

General

Profile

Actions

Bug #63314

open

kafka crashed during message callback in teuthology

Added by Yuval Lifshitz 6 months ago. Updated 3 months ago.

Status:
Pending Backport
Priority:
High
Target version:
-
% Done:

0%

Source:
Tags:
kafka, amqp backport_processed
Backport:
reef, quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

test: http://pulpito.front.sepia.ceph.com/yuvalif-2023-10-25_06:07:27-rgw:notifications-wip-topic-policy-distro-default-smithi/
backtrace:

Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
*** Caught signal (Aborted) **
in thread 7fdf852cc640 thread_name:kafka_manager
ceph version 18.0.0-6835-ge5c0f496 (e5c0f4966b3da3fed8fcdae150335c900609e4d9) reef (dev)
1: /lib64/libc.so.6(+0x54db0) [0x7fe0b8054db0]
2: /lib64/libc.so.6(+0xa154c) [0x7fe0b80a154c]
3: raise()
4: abort()
5: /lib64/libc.so.6(+0x29130) [0x7fe0b8029130]
6: /lib64/libc.so.6(+0x4daf7) [0x7fe0b804daf7]
7: /lib64/libc.so.6(+0xa7d18) [0x7fe0b80a7d18]
8: (RGWPubSubKafkaEndpoint::Waiter::finish(int)+0xbe) [0x55dc941187ee]
9: (rgw::kafka::message_callback(rd_kafka_s*, rd_kafka_message_s const*, void*)+0xef) [0x55dc94245c8f]
10: /lib64/librdkafka.so.1(+0x256ef) [0x7fe0b90776ef]
11: /lib64/librdkafka.so.1(+0x5b862) [0x7fe0b90ad862]
12: rd_kafka_poll()
13: (rgw::kafka::Manager::run()+0x328) [0x55dc94248008]
14: /lib64/libstdc++.so.6(+0xdb924) [0x7fe0b84db924]
15: /lib64/libc.so.6(+0x9f802) [0x7fe0b809f802]
16: /lib64/libc.so.6(+0x3f450) [0x7fe0b803f450]


Related issues 3 (3 open0 closed)

Related to rgw - Bug #64184: test_bn.py -v -a kafka_test: Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failedNewYuval Lifshitz

Actions
Copied to rgw - Backport #63701: reef: kafka crashed during message callback in teuthologyIn ProgressYuval LifshitzActions
Copied to rgw - Backport #63702: quincy: kafka crashed during message callback in teuthologyIn ProgressYuval LifshitzActions
Actions #1

Updated by Yuval Lifshitz 6 months ago

found this issue in the librdkafka repo: https://github.com/confluentinc/librdkafka/issues/3688
that seems related

Actions #2

Updated by Casey Bodley 6 months ago

  • Status changed from New to Triaged
Actions #3

Updated by Yuval Lifshitz 6 months ago

will attempts to fix the issue by using librdkafka as a git submodule.
using a version that has the fix above.

Actions #4

Updated by Yuval Lifshitz 6 months ago

according to the log, centos9 has this version: librdkafka-1.6.1.
and according to the git log, this version has the fix to the crash:

commit 7915f2a9adf30ca5938945c607b35c9c5a5a63e3
Author: Magnus Edenhill <magnus@edenhill.se>
Date:   Mon Apr 6 12:52:25 2020 +0200

    Align bundled c11 threads (tinycthreads) constants to glibc and musl (#2681)


this means that there is no need for a newer version of the library via a submodule

Actions #5

Updated by Yuval Lifshitz 5 months ago

crash is happening in the dtor if the unique_lock defined here (when it goes out of scope): https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_pubsub_push.cc#L359

backtrace from gdb:

#0  0x00007fe48c4a154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fe48c454d06 in raise () from /lib64/libc.so.6
#2  0x00007fe48c4287f3 in abort () from /lib64/libc.so.6
#3  0x00007fe48c429130 in __libc_message.cold () from /lib64/libc.so.6
#4  0x00007fe48c44daf7 in __libc_assert_fail () from /lib64/libc.so.6
#5  0x00007fe48c4a7d18 in __pthread_tpp_change_priority () from /lib64/libc.so.6
#6  0x000055be80a0ef0e in RGWPubSubKafkaEndpoint::Waiter::finish(int) () at /usr/include/c++/11/bits/unique_lock.h:103
#7  0x000055be80b37b2f in std::function<void (int)>::operator()(int) const (__args#0=0, this=0x55be87ea6c88) at /usr/include/c++/11/bits/std_function.h:590
#8  rgw::kafka::message_callback (rk=<optimized out>, rkmessage=<optimized out>, opaque=0x55be93d3c5a0) at /usr/src/debug/ceph-18.0.0-7517.gcdddd29f.el9.x86_64/src/rgw/rgw_kafka.cc:168
#9  0x00007fe48d2776ef in rd_kafka_poll_cb () from /lib64/librdkafka.so.1
#10 0x00007fe48d2ad862 in rd_kafka_q_serve.localalias () from /lib64/librdkafka.so.1
#11 0x00007fe48d2724b8 in rd_kafka_poll () from /lib64/librdkafka.so.1
#12 0x000055be80b39ea8 in rgw::kafka::Manager::run (this=0x55be84213a20) at /usr/src/debug/ceph-18.0.0-7517.gcdddd29f.el9.x86_64/src/rgw/rgw_kafka.cc:487
#13 0x00007fe48c8db924 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#14 0x00007fe48c49f802 in start_thread () from /lib64/libc.so.6
#15 0x00007fe48c43f450 in clone3 () from /lib64/libc.so.6
Actions #6

Updated by Yuval Lifshitz 5 months ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 54697
Actions #7

Updated by Yuval Lifshitz 5 months ago

  • Tags changed from kafka to kafka, amqp
  • Backport set to reef, quincy
Actions #8

Updated by Yuval Lifshitz 5 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #9

Updated by Backport Bot 5 months ago

  • Copied to Backport #63701: reef: kafka crashed during message callback in teuthology added
Actions #10

Updated by Backport Bot 5 months ago

  • Copied to Backport #63702: quincy: kafka crashed during message callback in teuthology added
Actions #11

Updated by Backport Bot 5 months ago

  • Tags changed from kafka, amqp to kafka, amqp backport_processed
Actions #12

Updated by Casey Bodley 3 months ago

@Yuval, this has still been crashing on main: http://qa-proxy.ceph.com/teuthology/suriarte-2024-01-22_16:24:20-rgw-rgw_asio_frontend_asserts-distro-default-smithi/7527259/teuthology.log

2024-01-24T10:35:50.574 DEBUG:teuthology.orchestra.run.smithi045:bucket notification tests against different endpoints> BNTESTS_CONF=/home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/bn-tests.client.0.conf /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/virtualenv/bin/python -m nose -s /home/ubuntu/cephtest/ceph/src/test/rgw/bucket_notification/test_bn.py -v -a kafka_test
2024-01-24T10:36:01.500 INFO:tasks.rgw.client.0.smithi045.stdout:Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed: previous_prio == -1 || (previous_prio >= fifo_min_prio && previous_prio <= fifo_max_prio)
2024-01-24T10:36:01.501 INFO:tasks.rgw.client.0.smithi045.stdout:*** Caught signal (Aborted) **
2024-01-24T10:36:01.501 INFO:tasks.rgw.client.0.smithi045.stdout: in thread 7f087c57e640 thread_name:kafka_manager
2024-01-24T10:36:01.501 INFO:tasks.rgw.client.0.smithi045.stdout: ceph version 19.0.0-717-gee50056b (ee50056b4421500d187e28edc49461ee9569002e) squid (dev)
2024-01-24T10:36:01.501 INFO:tasks.rgw.client.0.smithi045.stdout: 1: /lib64/libc.so.6(+0x54db0) [0x7f09abc54db0]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 2: /lib64/libc.so.6(+0xa154c) [0x7f09abca154c]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 3: raise()
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 4: abort()
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 5: /lib64/libc.so.6(+0x29130) [0x7f09abc29130]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 6: /lib64/libc.so.6(+0x4daf7) [0x7f09abc4daf7]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 7: /lib64/libc.so.6(+0xa7d18) [0x7f09abca7d18]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 8: (std::_Function_handler<void (int), RGWPubSubKafkaEndpoint::send_to_completion_async(ceph::common::CephContext*, rgw_pubsub_s3_event const&, optional_yield)::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int&&)+0x95) [0x564f77e7fa75]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 9: (rgw::kafka::message_callback(rd_kafka_s*, rd_kafka_message_s const*, void*)+0xef) [0x564f77fabaaf]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 10: /lib64/librdkafka.so.1(+0x256ef) [0x7f09aca776ef]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 11: /lib64/librdkafka.so.1(+0x5b862) [0x7f09acaad862]
2024-01-24T10:36:01.502 INFO:tasks.rgw.client.0.smithi045.stdout: 12: rd_kafka_poll()
2024-01-24T10:36:01.503 INFO:tasks.rgw.client.0.smithi045.stdout: 13: (rgw::kafka::Manager::run()+0x350) [0x564f77fade50]
2024-01-24T10:36:01.503 INFO:tasks.rgw.client.0.smithi045.stdout: 14: /lib64/libstdc++.so.6(+0xdb924) [0x7f09ac0db924]
2024-01-24T10:36:01.503 INFO:tasks.rgw.client.0.smithi045.stdout: 15: /lib64/libc.so.6(+0x9f802) [0x7f09abc9f802]
2024-01-24T10:36:01.503 INFO:tasks.rgw.client.0.smithi045.stdout: 16: /lib64/libc.so.6(+0x3f450) [0x7f09abc3f450]
Actions #13

Updated by Casey Bodley 3 months ago

  • Related to Bug #64184: test_bn.py -v -a kafka_test: Fatal glibc error: tpp.c:87 (__pthread_tpp_change_priority): assertion failed added
Actions

Also available in: Atom PDF