Bug #50743

*: crash in pthread_getname_np

Added by Yaarit Hatuka almost 3 years ago. Updated almost 2 years ago.

Status: Need More Info
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

7a4dc07de98f9fe5951207ab4b4b599270f9729af0b338a4212d3bf9335cf310
8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039
25c187a52a0bd6185eea9df828445b7bd639a28947da47ae5869697eb9e1ec89
280c1c6704dab0acc19e650eadfb3ccff4f9a44af5cd64b22521731c6c34b09c
ae2a86a8e0ea3bdd90d99e99346e48bfc31a9b6d07ce9185847944499a5c5e86
c557eb5113a5ee72a6fd4463538af6975ade7e57697b33b27f877f3595275368


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039

Sanitized backtrace:

    /lib64/libpthread.so.0(
    pthread_getname_np()
    ceph::logging::Log::dump_recent()
    MDSDaemon::respawn()
    Context::complete(int)
    MDSRank::respawn()
    MDSRank::handle_write_error(int)
    /usr/bin/ceph-mds(
    Context::complete(int)
    Finisher::finisher_thread_entry()
    /lib64/libpthread.so.0(
    clone()

Crash dump sample:
{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7fca48244b20]",
        "pthread_getname_np()",
        "(ceph::logging::Log::dump_recent()+0x4b3) [0x7fca4980c643]",
        "(MDSDaemon::respawn()+0x15b) [0x55ca473547fb]",
        "(Context::complete(int)+0xd) [0x55ca4735c9ed]",
        "(MDSRank::respawn()+0x1c) [0x55ca4736289c]",
        "(MDSRank::handle_write_error(int)+0x1a6) [0x55ca47366a26]",
        "/usr/bin/ceph-mds(+0x1bef20) [0x55ca47366f20]",
        "(Context::complete(int)+0xd) [0x55ca4735c9ed]",
        "(Finisher::finisher_thread_entry()+0x1a5) [0x7fca495380b5]",
        "/lib64/libpthread.so.0(+0x814a) [0x7fca4823a14a]",
        "clone()" 
    ],
    "ceph_version": "16.2.1",
    "crash_id": "2021-05-08T02:06:04.479049Z_c595a994-0782-4309-9f1b-e2f7dedaf821",
    "entity_name": "mds.20297e33b547c2f168d55f24c5d7328709e9b647",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mds",
    "stack_sig": "7a4dc07de98f9fe5951207ab4b4b599270f9729af0b338a4212d3bf9335cf310",
    "timestamp": "2021-05-08T02:06:04.479049Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-70-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021" 
}
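
The sanitized backtrace above looks like the raw `backtrace` frames with load offsets and absolute addresses stripped. The following is a minimal sketch of that normalization, inferred only from the sample in this report. It is not the telemetry module's actual code, and `sanitize_frame` and the regular expressions are hypothetical names and patterns chosen for illustration.

```python
import re

# Trailing absolute address, e.g. " [0x7fca48244b20]".
_ADDR = re.compile(r"\s*\[0x[0-9a-fA-F]+\]\s*$")
# Symbolized frame, e.g. "(MDSDaemon::respawn()+0x15b)".
_SYMBOL = re.compile(r"^\((?P<sym>.+)\+0x[0-9a-fA-F]+\)$")
# Unsymbolized frame inside a binary, e.g. "/usr/bin/ceph-mds(+0x1bef20)".
_MODULE = re.compile(r"^(?P<path>[^()\s]+)\(\+0x[0-9a-fA-F]+\)$")


def sanitize_frame(frame: str) -> str:
    """Drop offsets and addresses so frames compare equal across runs."""
    frame = _ADDR.sub("", frame.strip())
    symbol = _SYMBOL.match(frame)
    if symbol:
        return symbol.group("sym")
    module = _MODULE.match(frame)
    if module:
        # Keep the module path plus the opening parenthesis, as in the report.
        return module.group("path") + "("
    # Bare symbols such as "clone()" are already address-free.
    return frame


# Frames copied from the crash dump sample in this report.
RAW_FRAMES = [
    "/lib64/libpthread.so.0(+0x12b20) [0x7fca48244b20]",
    "pthread_getname_np()",
    "(ceph::logging::Log::dump_recent()+0x4b3) [0x7fca4980c643]",
    "(MDSDaemon::respawn()+0x15b) [0x55ca473547fb]",
    "(Context::complete(int)+0xd) [0x55ca4735c9ed]",
    "(MDSRank::respawn()+0x1c) [0x55ca4736289c]",
    "(MDSRank::handle_write_error(int)+0x1a6) [0x55ca47366a26]",
    "/usr/bin/ceph-mds(+0x1bef20) [0x55ca47366f20]",
    "(Context::complete(int)+0xd) [0x55ca4735c9ed]",
    "(Finisher::finisher_thread_entry()+0x1a5) [0x7fca495380b5]",
    "/lib64/libpthread.so.0(+0x814a) [0x7fca4823a14a]",
    "clone()",
]

SANITIZED = [sanitize_frame(f) for f in RAW_FRAMES]

if __name__ == "__main__":
    print("\n".join(SANITIZED))
```

Applied to the sample crash dump, this yields the same twelve frames listed under "Sanitized backtrace". How the signature hashes are derived from those frames is not shown in this report and is not modeled here.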

History

#1 Updated by Yaarit Hatuka almost 3 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.2.1 added

#2 Updated by Neha Ojha almost 3 years ago

  • Project changed from RADOS to CephFS

#3 Updated by Patrick Donnelly almost 3 years ago

  • Subject changed from crash: /lib64/libpthread.so.0( to mds: crash in pthread_getname_np
  • Status changed from New to Need More Info

How do we know what the signal number was? Not clear to me what to do with this. I don't see anything obviously wrong with the call to pthread_getname_np in Log::dump_recent.

#4 Updated by Yaarit Hatuka almost 3 years ago

Hi Patrick,

We don't have the signal number yet in the telemetry crash reports.

You can see other crash events with the same signature here:
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039

in the Crashes table:
http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?viewPanel=26&orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039

If it seems like this is not a bug after all, you can change the status to Rejected or Won't Fix.

#5 Updated by Patrick Donnelly almost 3 years ago

Yaarit Hatuka wrote:

> Hi Patrick,
>
> We don't have the signal number yet in the telemetry crash reports.
>
> You can see other crash events with the same signature here:
> http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039
>
> in the Crashes table:
> http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?viewPanel=26&orgId=1&var-sig_v2=8032fa5f1f2107af12b68e6fe2586cc5fe2c1ad99a09d4b1bbfa47a813852039
>
> If it seems like this is not a bug after all, you can change the status to Rejected or Won't Fix.
Sorry, how do you conclude it's not a bug after all?

#6 Updated by Yaarit Hatuka almost 3 years ago

Oh, I mean in general, not necessarily in this case.

This issue was opened automatically by a telemetry-to-redmine bot that I'm working on, which will import the telemetry crash reports into Redmine. It was opened under my user, but future issues will be opened by the telemetry bot user.

#7 Updated by Patrick Donnelly almost 3 years ago

  • Project changed from CephFS to Ceph
  • Subject changed from mds: crash in pthread_getname_np to *: crash in pthread_getname_np
  • Target version set to v17.0.0
  • Source changed from Telemetry to Q/A
  • Backport set to pacific

2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr:*** Caught signal (Segmentation fault) **
2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: in thread 7f819c376700 thread_name:ceph-mon
2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: ceph version 16.2.4-225-gf9084200 (f908420004cc81a30edb2b252b4d92f50c526280) pacific (stable)
2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7f8191010dc0]
2021-06-01T05:31:28.918 INFO:tasks.ceph.mon.a.smithi053.stderr: 2: pthread_getname_np()
2021-06-01T05:31:28.918 INFO:tasks.ceph.mon.a.smithi053.stderr: 3: (ceph::logging::Log::dump_recent()+0x4b3) [0x7f81938845a3]
2021-06-01T05:31:28.918 INFO:tasks.ceph.mon.a.smithi053.stderr: 4: ceph-mon(+0x53110b) [0x5594d213110b]
2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 5: /lib64/libpthread.so.0(+0x12dc0) [0x7f8191010dc0]
2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 6: gsignal()
2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 7: abort()
2021-06-01T05:31:28.920 INFO:tasks.ceph.mon.a.smithi053.stderr: 8: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f819351059d]
2021-06-01T05:31:28.920 INFO:tasks.ceph.mon.a.smithi053.stderr: 9: /usr/lib64/ceph/libceph-common.so.2(+0x276766) [0x7f8193510766]
2021-06-01T05:31:28.920 INFO:tasks.ceph.mon.a.smithi053.stderr: 10: (Monitor::~Monitor()+0xb35) [0x5594d1ee0995]
2021-06-01T05:31:28.921 INFO:tasks.ceph.mon.a.smithi053.stderr: 11: (Monitor::~Monitor()+0xd) [0x5594d1ee09ed]
2021-06-01T05:31:28.921 INFO:tasks.ceph.mon.a.smithi053.stderr: 12: main()
2021-06-01T05:31:28.921 INFO:tasks.ceph.mon.a.smithi053.stderr: 13: __libc_start_main()
2021-06-01T05:31:28.922 INFO:tasks.ceph.mon.a.smithi053.stderr: 14: _start()
2021-06-01T05:31:28.980 INFO:tasks.ceph.mgr.z.smithi179.stderr:daemon-helper: command crashed with signal 15

From: /ceph/teuthology-archive/teuthology-2021-06-01_04:17:03-fs-pacific-distro-basic-smithi/6144511/teuthology.log

Test failed for other reasons but we finally have this showing up in QA.
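
On the earlier question about the signal: unlike the telemetry report, the teuthology log names it in the `*** Caught signal (...) **` line. Frames 5 to 8 (`gsignal()`, `abort()`, `ceph::__ceph_assert_fail`) also suggest that the segmentation fault happened while an assertion abort was already being handled and recent log entries were being dumped, although that reading is an inference from the trace and is not confirmed in this thread. Below is a minimal sketch for pulling those fields out of such log lines, assuming the line layout seen above. `parse_crash` and the `_SIGNALS` table are hypothetical, and the "Aborted" entry in particular is an assumption.

```python
import re
import signal

# Assumed mapping from the handler's description text to a signal.
# Only "Segmentation fault" appears in this report; "Aborted" is an assumption.
_SIGNALS = {
    "Segmentation fault": signal.SIGSEGV,
    "Aborted": signal.SIGABRT,
}

_CAUGHT = re.compile(r"\*\*\* Caught signal \((?P<desc>[^)]+)\) \*\*")
_THREAD = re.compile(r"in thread (?P<tid>[0-9a-f]+) thread_name:(?P<name>\S+)")
_FRAME = re.compile(r"stderr:\s+\d+: (?P<frame>.+)$")


def parse_crash(lines):
    """Collect signal, crashing thread, and frames from daemon stderr lines."""
    report = {"signal": None, "signum": None, "thread": None, "frames": []}
    for line in lines:
        line = line.rstrip()
        caught = _CAUGHT.search(line)
        if caught:
            desc = caught.group("desc")
            report["signal"] = desc
            # Unknown descriptions keep signum as None instead of guessing.
            report["signum"] = int(_SIGNALS[desc]) if desc in _SIGNALS else None
            continue
        thread = _THREAD.search(line)
        if thread:
            report["thread"] = (thread.group("tid"), thread.group("name"))
            continue
        frame = _FRAME.search(line)
        if frame:
            report["frames"].append(frame.group("frame"))
    return report


# A subset of the log lines quoted in this comment.
SAMPLE = [
    "2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr:*** Caught signal (Segmentation fault) **",
    "2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: in thread 7f819c376700 thread_name:ceph-mon",
    "2021-06-01T05:31:28.917 INFO:tasks.ceph.mon.a.smithi053.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7f8191010dc0]",
    "2021-06-01T05:31:28.918 INFO:tasks.ceph.mon.a.smithi053.stderr: 2: pthread_getname_np()",
    "2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 6: gsignal()",
    "2021-06-01T05:31:28.919 INFO:tasks.ceph.mon.a.smithi053.stderr: 7: abort()",
    "2021-06-01T05:31:28.980 INFO:tasks.ceph.mgr.z.smithi179.stderr:daemon-helper: command crashed with signal 15",
]

report = parse_crash(SAMPLE)

if __name__ == "__main__":
    print(report["signal"], report["signum"], report["thread"])
```

The `daemon-helper` line belongs to a different daemon (`mgr.z`) and is ignored by the frame pattern, so its "signal 15" is not mixed into the monitor's report.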

#8 Updated by Neha Ojha almost 3 years ago

  • Project changed from Ceph to RADOS

#9 Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)

#10 Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated (diff)
  • Affected Versions v15.2.10, v15.2.11, v15.2.12, v15.2.13, v15.2.15, v15.2.4, v15.2.5, v15.2.7, v15.2.8, v16.2.4, v16.2.5, v16.2.6, v16.2.7 added

#11 Updated by Radoslaw Zarzynski almost 2 years ago

  • Crash signature (v1) updated (diff)

Still Need More Info, as the logs are no longer available after all these months.
