Bug #50743

*: crash in pthread_getname_np

Added by Yaarit Hatuka 5 months ago. Updated 4 months ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

7a4dc07de98f9fe5951207ab4b4b599270f9729af0b338a4212d3bf9335cf310

Crash signature (v2):

8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039

Sanitized backtrace:

    /lib64/libpthread.so.0(
    pthread_getname_np()
    ceph::logging::Log::dump_recent()
    MDSDaemon::respawn()
    Context::complete(int)
    MDSRank::respawn()
    MDSRank::handle_write_error(int)
    /usr/bin/ceph-mds(
    Context::complete(int)
    Finisher::finisher_thread_entry()
    /lib64/libpthread.so.0(
    clone()

Crash dump sample:
{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7fca48244b20]",
        "pthread_getname_np()",
        "(ceph::logging::Log::dump_recent()+0x4b3) [0x7fca4980c643]",
        "(MDSDaemon::respawn()+0x15b) [0x55ca473547fb]",
        "(Context::complete(int)+0xd) [0x55ca4735c9ed]",
        "(MDSRank::respawn()+0x1c) [0x55ca4736289c]",
        "(MDSRank::handle_write_error(int)+0x1a6) [0x55ca47366a26]",
        "/usr/bin/ceph-mds(+0x1bef20) [0x55ca47366f20]",
        "(Context::complete(int)+0xd) [0x55ca4735c9ed]",
        "(Finisher::finisher_thread_entry()+0x1a5) [0x7fca495380b5]",
        "/lib64/libpthread.so.0(+0x814a) [0x7fca4823a14a]",
        "clone()" 
    ],
    "ceph_version": "16.2.1",
    "crash_id": "2021-05-08T02:06:04.479049Z_c595a994-0782-4309-9f1b-e2f7dedaf821",
    "entity_name": "mds.20297e33b547c2f168d55f24c5d7328709e9b647",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mds",
    "stack_sig": "7a4dc07de98f9fe5951207ab4b4b599270f9729af0b338a4212d3bf9335cf310",
    "timestamp": "2021-05-08T02:06:04.479049Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-70-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021" 
}
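
For readers comparing the two listings above, here is a minimal sketch of how a raw backtrace frame could be reduced to its sanitized form. The rules are inferred only from this one sample (drop the trailing ` [0x…]` address, drop the `+0x…)` offset, drop a leading parenthesis). The helper name `sanitize_frame` is hypothetical, and this is not the telemetry pipeline's actual code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reduce one raw backtrace frame to the form shown in
// the "Sanitized backtrace" listing. The rules are inferred from the sample
// in this report, not taken from the telemetry implementation.
std::string sanitize_frame(const std::string& frame) {
    // Trailing absolute address, e.g. " [0x7fca48244b20]".
    static const std::regex address(R"( \[0x[0-9a-fA-F]+\]$)");
    // Trailing offset and closing parenthesis, e.g. "+0x4b3)".
    static const std::regex offset(R"(\+0x[0-9a-fA-F]+\)$)");

    std::string out = std::regex_replace(frame, address, "");
    out = std::regex_replace(out, offset, "");

    // Symbol frames are wrapped as "(symbol+off)"; drop the opening parenthesis.
    if (!out.empty() && out.front() == '(') {
        out.erase(0, 1);
    }
    return out;
}
```

Under these rules, `(MDSDaemon::respawn()+0x15b) [0x55ca473547fb]` becomes `MDSDaemon::respawn()`, and a frame with no symbol such as `/usr/bin/ceph-mds(+0x1bef20) [0x55ca47366f20]` becomes `/usr/bin/ceph-mds(`, matching the listing.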

History

#1 Updated by Yaarit Hatuka 5 months ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.2.1 added

#2 Updated by Neha Ojha 4 months ago

  • Project changed from RADOS to CephFS

#3 Updated by Patrick Donnelly 4 months ago

  • Subject changed from crash: /lib64/libpthread.so.0( to mds: crash in pthread_getname_np
  • Status changed from New to Need More Info

How do we know what the signal number was? Not clear to me what to do with this. I don't see anything obviously wrong with the call to pthread_getname_np in Log::dump_recent.
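
For reference, a minimal sketch of the precondition that `pthread_getname_np` relies on: the `pthread_t` passed in must refer to a thread whose lifetime has not ended. POSIX leaves the use of a thread ID after that point undefined. Whether that is what happens in `Log::dump_recent` is not established by this report. The function name `read_live_thread_name` is hypothetical, and the example assumes Linux with glibc.

```cpp
#include <pthread.h>

#include <future>
#include <string>
#include <thread>

// Hypothetical demo: name a worker thread and read the name back while the
// worker is guaranteed to be alive. Assumes Linux/glibc, where thread names
// are limited to 16 bytes including the terminating NUL.
std::string read_live_thread_name(const char* name) {
    std::promise<void> release;
    std::future<void> released = release.get_future();

    // The worker blocks until released, so its pthread_t stays valid below.
    std::thread worker([&released] { released.wait(); });

    char buf[16] = {0};
    int rc = pthread_setname_np(worker.native_handle(), name);
    if (rc == 0) {
        rc = pthread_getname_np(worker.native_handle(), buf, sizeof(buf));
    }

    // Only now let the worker finish. After join() the handle must not be
    // passed to pthread_getname_np again.
    release.set_value();
    worker.join();

    return rc == 0 ? std::string(buf) : std::string();
}
```

A design that records each thread's name at the time a log entry is created, rather than looking it up by thread ID when the log is dumped, would not depend on this precondition.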

#4 Updated by Yaarit Hatuka 4 months ago

Hi Patrick,

We don't have the signal number yet in the telemetry crash reports.

You can see other crash events with the same signature here:
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039

and in the Crashes table:
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?viewPanel=26&orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039

If it seems like this is not a bug after all, you can change the status to Rejected or Won't Fix.

#5 Updated by Patrick Donnelly 4 months ago

Yaarit Hatuka wrote:

If it seems like this is not a bug after all, you can change the status to Rejected or Won't Fix.

Sorry, how do you conclude it's not a bug after all?

#6 Updated by Yaarit Hatuka 4 months ago

Oh, I meant in general, not necessarily in this case.

This issue was opened automatically by a telemetry-to-redmine bot I'm working on, which will import telemetry crash reports into Redmine. It was opened under my user, but future issues will be opened under the telemetry bot user.

#7 Updated by Patrick Donnelly 4 months ago

  • Project changed from CephFS to Ceph
  • Subject changed from mds: crash in pthread_getname_np to *: crash in pthread_getname_np
  • Target version set to v17.0.0
  • Source changed from Telemetry to Q/A
  • Backport set to pacific

2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr:*** Caught signal (Segmentation fault) **
2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: in thread 7f819c376700 thread_name:ceph-mon
2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: ceph version 16.2.4-225-gf9084200 (f908420004cc81a30edb2b252b4d92f50c526280) pacific (stable)
2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7f8191010dc0]
2021-06-01T05:31:28.918 INFO:tasks.ceph.mon.a.smithi053.stderr: 2: pthread_getname_np()
2021-06-01T05:31:28.918 INFO:tasks.ceph.mon.a.smithi053.stderr: 3: (ceph::logging::Log::dump_recent()+0x4b3) [0x7f81938845a3]
2021-06-01T05:31:28.918 INFO:tasks.ceph.mon.a.smithi053.stderr: 4: ceph-mon(+0x53110b) [0x5594d213110b]
2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 5: /lib64/libpthread.so.0(+0x12dc0) [0x7f8191010dc0]
2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 6: gsignal()
2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 7: abort()
2021-06-01T05:31:28.920 INFO:tasks.ceph.mon.a.smithi053.stderr: 8: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f819351059d]
2021-06-01T05:31:28.920 INFO:tasks.ceph.mon.a.smithi053.stderr: 9: /usr/lib64/ceph/libceph-common.so.2(+0x276766) [0x7f8193510766]
2021-06-01T05:31:28.920 INFO:tasks.ceph.mon.a.smithi053.stderr: 10: (Monitor::~Monitor()+0xb35) [0x5594d1ee0995]
2021-06-01T05:31:28.921 INFO:tasks.ceph.mon.a.smithi053.stderr: 11: (Monitor::~Monitor()+0xd) [0x5594d1ee09ed]
2021-06-01T05:31:28.921 INFO:tasks.ceph.mon.a.smithi053.stderr: 12: main()
2021-06-01T05:31:28.921 INFO:tasks.ceph.mon.a.smithi053.stderr: 13: __libc_start_main()
2021-06-01T05:31:28.922 INFO:tasks.ceph.mon.a.smithi053.stderr: 14: _start()
2021-06-01T05:31:28.980 INFO:tasks.ceph.mgr.z.smithi179.stderr:daemon-helper: command crashed with signal 15

From: /ceph/teuthology-archive/teuthology-2021-06-01_04:17:03-fs-pacific-distro-basic-smithi/6144511/teuthology.log

The test failed for other reasons, but this crash is finally showing up in QA.
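
On the earlier question about the signal: the telemetry reports do not carry it, but the QA log above prints the signal name in its `*** Caught signal (...) **` line. Here is a minimal sketch of pulling that name out of a teuthology log line. The pattern is inferred from the single line quoted in this comment, and the function name `caught_signal_name` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the signal name from a log line of the form
// "... *** Caught signal (Segmentation fault) **", or an empty string if the
// line does not contain that marker.
std::string caught_signal_name(const std::string& line) {
    static const std::regex marker(R"(\*\*\* Caught signal \(([^)]+)\) \*\*)");
    std::smatch match;
    if (std::regex_search(line, match, marker)) {
        return match[1].str();
    }
    return std::string();
}
```

Applied to the first line of the log excerpt, this returns `Segmentation fault`. The other lines of the excerpt yield an empty string.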

#8 Updated by Neha Ojha 4 months ago

  • Project changed from Ceph to RADOS
