Bug #52189

open

crash in AsyncConnection::maybe_start_delay_thread()

Added by Telemetry Bot over 2 years ago. Updated about 2 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Telemetry
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

c537c3a739b3832f3cf3d7a68e9f86f7a427da20f94aae35e88365599c707f2b
f416301151f8db40b0181db80a708201072f5cb6c351e5aabf7146a0680160cc
3cda30dbfdc44a946a86df2b0704721d0bd534db9c52bf51f2123d787fc2c293
8e6a3a727d0e2a1ca4b31d02ddb3527a3ec3ac281bd786bfd4e08e3a9cb64303


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=f416301151f8db40b0181db80a708201072f5cb6c351e5aabf7146a0680160cc

Sanitized backtrace:

    /lib64/libpthread.so.0(
    /lib64/libstdc
    /lib64/libstdc
    /lib64/libstdc
    /lib64/libstdc
    AsyncConnection::maybe_start_delay_thread()
    ProtocolV2::ready()
    ProtocolV2::handle_server_ident(ceph::buffer::v15_2_0::list&)
    ProtocolV2::handle_frame_payload()
    ProtocolV2::handle_read_frame_dispatch()
    ProtocolV2::_handle_read_frame_epilogue_main()
    ProtocolV2::_handle_read_frame_segment()
    ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)
    ProtocolV2::run_continuation(Ct<ProtocolV2>&)
    AsyncConnection::process()
    EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)
    /usr/lib64/ceph/libceph-common.so.2(
    /lib64/libstdc
    /lib64/libpthread.so.0(
    clone()

Crash dump sample:
{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7fee74e91b20]",
        "gsignal()",
        "abort()",
        "/lib64/libstdc++.so.6(+0x9009b) [0x7fee744aa09b]",
        "/lib64/libstdc++.so.6(+0x9653c) [0x7fee744b053c]",
        "/lib64/libstdc++.so.6(+0x96597) [0x7fee744b0597]",
        "/lib64/libstdc++.so.6(+0x967f8) [0x7fee744b07f8]",
        "(AsyncConnection::maybe_start_delay_thread()+0x135) [0x7fee776723b5]",
        "(ProtocolV2::ready()+0xc4) [0x7fee776af724]",
        "(ProtocolV2::handle_server_ident(ceph::buffer::v15_2_0::list&)+0xa14) [0x7fee776b8684]",
        "(ProtocolV2::handle_frame_payload()+0x21b) [0x7fee776c2d0b]",
        "(ProtocolV2::handle_read_frame_dispatch()+0x160) [0x7fee776c2f80]",
        "(ProtocolV2::_handle_read_frame_epilogue_main()+0x95) [0x7fee776c3175]",
        "(ProtocolV2::_handle_read_frame_segment()+0x92) [0x7fee776c3222]",
        "(ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x201) [0x7fee776c4361]",
        "(ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3c) [0x7fee776ac5ac]",
        "(AsyncConnection::process()+0x789) [0x7fee77674ac9]",
        "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x7fee776cee37]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x5b434c) [0x7fee776d534c]",
        "/lib64/libstdc++.so.6(+0xc2ba3) [0x7fee744dcba3]",
        "/lib64/libpthread.so.0(+0x814a) [0x7fee74e8714a]",
        "clone()" 
    ],
    "ceph_version": "16.2.5",
    "crash_id": "2021-08-05T10:24:24.617917Z_69dfe6f7-9ca5-4d32-9e36-139112fde8fb",
    "entity_name": "mon.0fd3d58b4fa66013dde52112983e0e0fe9dd6e45",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mon",
    "stack_sig": "c537c3a739b3832f3cf3d7a68e9f86f7a427da20f94aae35e88365599c707f2b",
    "timestamp": "2021-08-05T10:24:24.617917Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-80-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021" 
}
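
Comparing the sanitized backtrace with the raw "backtrace" array above suggests how the two relate: each frame appears to lose its leading parenthesis and everything from the first "+" onward (offset and address), and the gsignal() and abort() frames are absent. The following Python sketch reproduces that mapping for the frames in this report. It is inferred only from the two listings here and is not taken from the Ceph telemetry code; the function names and the SIGNAL_FRAMES set are hypothetical.

```python
# Sketch only: the rule below is inferred from the listings in this report,
# not from the Ceph source. All names here are hypothetical.

# Assumption: these frames are dropped, since they are missing from the
# sanitized listing but present in the raw crash dump.
SIGNAL_FRAMES = {"gsignal()", "abort()"}


def sanitize_frame(frame: str) -> str:
    """Strip the offset and address from one raw backtrace frame."""
    frame = frame.strip()
    # Symbolized frames are wrapped as "(symbol+0xOFF) [0xADDR]".
    if frame.startswith("("):
        frame = frame[1:]
    # Keep only the text before the first '+'. This also truncates
    # "libstdc++.so.6" to "libstdc", as seen in the sanitized listing.
    return frame.split("+", 1)[0]


def sanitize_backtrace(frames):
    """Sanitize every frame and drop the signal-handling frames."""
    sanitized = (sanitize_frame(f) for f in frames)
    return [s for s in sanitized if s not in SIGNAL_FRAMES]


if __name__ == "__main__":
    raw = [
        "/lib64/libpthread.so.0(+0x12b20) [0x7fee74e91b20]",
        "gsignal()",
        "abort()",
        "/lib64/libstdc++.so.6(+0x9009b) [0x7fee744aa09b]",
        "(AsyncConnection::maybe_start_delay_thread()+0x135) [0x7fee776723b5]",
        "clone()",
    ]
    for line in sanitize_backtrace(raw):
        print(line)
```

Stripping addresses and offsets in this way would make frames from different builds and hosts comparable, which is presumably why the sanitized form is the one shown for grouping crashes.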


Actions #1

Updated by Telemetry Bot over 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.2.0, v16.2.1, v16.2.4, v16.2.5 added

Actions #2

Updated by Neha Ojha over 2 years ago

  • Subject changed from "crash: /lib64/libpthread.so.0(" to "crash in AsyncConnection::maybe_start_delay_thread()"
  • Status changed from New to Need More Info
  • Backport set to pacific

We'll need more information to debug a crash like this.

Actions #3

Updated by Christian Rohmann over 2 years ago

Neha Ojha wrote:

We'll need more information to debug a crash like this.

@Neha, we observed another one of these; see the crash info:

{
    "backtrace": [
        "(()+0x12980) [0x7f1e545be980]",
        "(AsyncConnection::_stop()+0x9c) [0x559e8b8f4dfc]",
        "(ProtocolV2::stop()+0x8b) [0x559e8b91e28b]",
        "(ProtocolV2::handle_existing_connection(boost::intrusive_ptr<AsyncConnection> const&)+0x49e) [0x559e8b93306e]",
        "(ProtocolV2::handle_client_ident(ceph::buffer::v15_2_0::list&)+0xdc5) [0x559e8b934965]",
        "(ProtocolV2::handle_frame_payload()+0x12b) [0x559e8b934f1b]",
        "(ProtocolV2::handle_read_frame_dispatch()+0x150) [0x559e8b935260]",
        "(ProtocolV2::_handle_read_frame_epilogue_main()+0x69) [0x559e8b9353c9]",
        "(ProtocolV2::_handle_read_frame_segment()+0x121) [0x559e8b935781]",
        "(ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0xd7) [0x559e8b936967]",
        "(ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x34) [0x559e8b91ef84]",
        "(AsyncConnection::process()+0x5fc) [0x559e8b8f7f2c]",
        "(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x7dd) [0x559e8b74818d]",
        "(()+0x11d4d08) [0x559e8b74dd08]",
        "(()+0xbd6df) [0x7f1e53c8e6df]",
        "(()+0x76db) [0x7f1e545b36db]",
        "(clone()+0x3f) [0x7f1e5334b71f]" 
    ],
    "ceph_version": "15.2.15",
    "crash_id": "2021-12-01T12:04:20.192379Z_0c456565-fece-4179-a466-b3441319f763",
    "entity_name": "osd.29",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "18.04.6 LTS (Bionic Beaver)",
    "os_version_id": "18.04",
    "process_name": "ceph-osd",
    "stack_sig": "7e7de2d589b31653ad17cf6e70fe108ec1de0f0122d16f5b47e3b1885574e4da",
    "timestamp": "2021-12-01T12:04:20.192379Z",
    "utsname_hostname": "REDACTED",
    "utsname_machine": "x86_64",
    "utsname_release": "4.15.0-159-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#167-Ubuntu SMP Tue Sep 21 08:55:05 UTC 2021" 
}

Actions #4

Updated by Christian Rohmann over 2 years ago

We observed a few more of those crashes. Six of them were just seconds or minutes apart, and even on different OSDs / hosts.

Actions #5

Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)

Actions #6

Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated (diff)
  • Affected Versions v15.2.5, v15.2.6, v15.2.8, v16.2.6, v16.2.7 added