Bug #62992

open

Heartbeat crash in reset_timeout and clear_timeout

Added by Laura Flores 7 months ago. Updated 9 days ago.

Status: Pending Backport
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/lflores-2023-09-08_18:08:19-rados-wip-lflores-testing-2023-09-08-1504-reef-distro-default-smithi/7391228

{
    "crash_id": "2023-09-08T19:46:54.998286Z_6b0fe488-8ceb-4ba3-bae3-92b5ba65667c",
    "timestamp": "2023-09-08T19:46:54.998286Z",
    "process_name": "memcheck-amd64-",
    "entity_name": "osd.7",
    "ceph_version": "18.2.0-400-g3a47b8b8",
    "utsname_hostname": "smithi154",
    "utsname_sysname": "Linux",
    "utsname_release": "5.14.0-363.el9.x86_64",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Sep 5 18:30:19 UTC 2023",
    "utsname_machine": "x86_64",
    "os_name": "CentOS Stream",
    "os_id": "centos",
    "os_version_id": "9",
    "os_version": "9",
    "backtrace": [
        "/lib64/libc.so.6(+0x54db0) [0x549fdb0]",
        "/lib64/libc.so.6(+0xa154c) [0x54ec54c]",
        "(ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x226) [0xb9ac76]",
        "(ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x70) [0xb9ad70]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x248) [0xbaace8]",
        "ceph-osd(+0xaa3264) [0xbab264]",
        "/lib64/libc.so.6(+0x9f802) [0x54ea802]",
        "clone()" 
    ]
}

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?var-sig_v2=e037b11ac9f7a2e20c20fcf65d4439bf3fe294cbd5fae43cf74d9e92710d56e6&orgId=1


Related issues 2 (2 open, 0 closed)

Related to RADOS - Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd (New)

Copied to RADOS - Backport #63559: reef: Heartbeat crash in osd (In Progress, assigned to Matan Breizman)
Actions #2

Updated by Laura Flores 7 months ago

/a/lflores-2023-09-08_18:08:19-rados-wip-lflores-testing-2023-09-08-1504-reef-distro-default-smithi/7391156

Actions #3

Updated by Laura Flores 7 months ago

/a/lflores-2023-09-08_18:08:19-rados-wip-lflores-testing-2023-09-08-1504-reef-distro-default-smithi/7391363

Actions #4

Updated by Radoslaw Zarzynski 7 months ago

Starvation on the test env?

Actions #5

Updated by Aishwarya Mathuria 7 months ago

/a/yuriw-2023-10-06_22:29:11-rados-wip-yuri7-testing-2023-10-04-1350-reef-distro-default-smithi/7415952/

Actions #6

Updated by Aishwarya Mathuria 7 months ago

/a/yuriw-2023-10-16_21:58:41-rados-wip-yuri-testing-2023-10-16-1247-reef-distro-default-smithi/7430399/

Actions #7

Updated by Matan Breizman 7 months ago

  • Backport set to reef

/a/yuriw-2023-10-11_14:08:36-rados-wip-yuri11-testing-2023-10-10-1226-reef-distro-default-smithi/7421647/
/a/yuriw-2023-10-11_14:08:36-rados-wip-yuri11-testing-2023-10-10-1226-reef-distro-default-smithi/7421575/

Actions #8

Updated by Matan Breizman 7 months ago

All the mentioned jobs are Reef with `validater/valgrind`.

2023-10-11T21:48:28.620+0000 2a697640  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x2a697640' had timed out after 15.000000954s
2023-10-11T21:48:28.632+0000 29e96640  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x29e96640' had timed out after 15.000000954s
2023-10-11T21:48:28.643+0000 2b699640  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x2b699640' had timed out after 15.000000954s
2023-10-11T21:48:28.648+0000 9c7b640  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s
2023-10-11T21:48:28.650+0000 9c7b640  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s
2023-10-11T21:48:28.650+0000 9c7b640  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s
2023-10-11T21:48:28.650+0000 9c7b640  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s
2023-10-11T21:48:28.651+0000 9c7b640  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s
2023-10-11T21:48:28.651+0000 9c7b640  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s
2023-10-11T21:48:28.652+0000 9c7b640  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s
2023-10-11T21:48:28.666+0000 29695640  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x29695640' had timed out after 15.000000954s

Should we increase osd_op_thread_timeout and osd_op_thread_suicide_timeout with valgrind runs?

See also:

2023-10-11T20:59:25.995+0000 1be7a640 10 osd.5 32 maybe_update_heartbeat_peers forcing update after 80.455274 seconds
2023-10-11T21:00:46.222+0000 1be7a640 10 osd.5 67 maybe_update_heartbeat_peers forcing update after 80.227428 seconds
2023-10-11T21:02:06.863+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.640751 seconds
2023-10-11T21:03:27.868+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 81.005320 seconds
2023-10-11T21:04:48.654+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.785916 seconds
2023-10-11T21:06:09.339+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.683903 seconds
2023-10-11T21:07:29.363+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.024916 seconds
2023-10-11T21:08:50.272+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.909281 seconds
2023-10-11T21:10:10.486+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.213215 seconds
2023-10-11T21:11:31.495+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 81.009558 seconds
2023-10-11T21:12:51.513+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.018505 seconds
2023-10-11T21:14:11.673+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.160123 seconds
2023-10-11T21:15:31.784+0000 d657640 10 osd.5 88 maybe_update_heartbeat_peers forcing update after 80.110673 seconds
2023-10-11T21:16:52.735+0000 d657640 10 osd.5 97 maybe_update_heartbeat_peers forcing update after 80.951408 seconds
2023-10-11T21:18:13.484+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.747420 seconds
2023-10-11T21:19:33.882+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.399638 seconds
2023-10-11T21:20:54.821+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.938116 seconds
2023-10-11T21:22:14.939+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.118154 seconds
2023-10-11T21:23:35.820+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.880716 seconds
2023-10-11T21:24:55.847+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.027235 seconds
2023-10-11T21:26:16.750+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.903677 seconds
2023-10-11T21:27:37.351+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.600174 seconds
2023-10-11T21:28:57.414+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.062907 seconds
2023-10-11T21:30:18.336+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.922363 seconds
2023-10-11T21:31:39.061+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.725610 seconds
2023-10-11T21:32:59.738+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.676204 seconds
2023-10-11T21:34:20.320+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.582562 seconds
2023-10-11T21:35:41.021+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.700942 seconds
2023-10-11T21:37:01.051+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.028715 seconds
2023-10-11T21:38:22.033+0000 d657640 10 osd.5 99 maybe_update_heartbeat_peers forcing update after 80.982889 seconds
2023-10-11T21:39:43.010+0000 d657640 10 osd.5 104 maybe_update_heartbeat_peers forcing update after 80.977477 seconds
2023-10-11T21:41:03.153+0000 d657640 10 osd.5 108 maybe_update_heartbeat_peers forcing update after 80.142961 seconds
2023-10-11T21:42:23.269+0000 d657640 10 osd.5 116 maybe_update_heartbeat_peers forcing update after 80.115488 seconds
2023-10-11T21:43:43.537+0000 d657640 10 osd.5 121 maybe_update_heartbeat_peers forcing update after 80.268765 seconds
2023-10-11T21:45:03.764+0000 d657640 10 osd.5 148 maybe_update_heartbeat_peers forcing update after 80.225319 seconds
2023-10-11T21:46:24.492+0000 d657640 10 osd.5 196 maybe_update_heartbeat_peers forcing update after 80.729479 seconds
2023-10-11T21:48:28.705+0000 d657640 10 osd.5 230 maybe_update_heartbeat_peers forcing update after 124.212376 seconds
2023-10-11T21:49:49.544+0000 d657640 10 osd.5 249 maybe_update_heartbeat_peers forcing update after 80.839358 seconds
2023-10-11T21:51:10.038+0000 d657640 10 osd.5 271 maybe_update_heartbeat_peers forcing update after 80.493641 seconds
2023-10-11T21:52:30.460+0000 1be7a640 10 osd.5 309 maybe_update_heartbeat_peers forcing update after 80.422639 seconds
2023-10-11T21:53:50.890+0000 d657640 10 osd.5 327 maybe_update_heartbeat_peers forcing update after 80.429886 seconds
2023-10-11T21:55:11.124+0000 d657640 10 osd.5 329 maybe_update_heartbeat_peers forcing update after 80.234130 seconds
2023-10-11T21:56:31.534+0000 d657640 10 osd.5 329 maybe_update_heartbeat_peers forcing update after 80.409623 seconds
2023-10-11T21:57:52.332+0000 d657640 10 osd.5 329 maybe_update_heartbeat_peers forcing update after 80.797740 seconds
2023-10-11T21:59:12.746+0000 d657640 10 osd.5 329 maybe_update_heartbeat_peers forcing update after 80.413588 seconds
2023-10-11T22:00:33.304+0000 d657640 10 osd.5 329 maybe_update_heartbeat_peers forcing update after 80.558928 seconds
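The bump asked about above could be expressed as a teuthology override along these lines. This is a hypothetical sketch: the option names come from the comment, while the values (10x the stock defaults of 15s and 150s) are purely illustrative and not necessarily what the eventual fix used.

```yaml
# Hypothetical valgrind-only override -- values are illustrative.
overrides:
  ceph:
    conf:
      osd:
        osd op thread timeout: 150           # default 15
        osd op thread suicide timeout: 1500  # default 150
```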

Actions #9

Updated by Laura Flores 6 months ago

/a/yuriw-2023-11-02_14:20:05-rados-wip-yuri6-testing-2023-11-01-0745-reef-distro-default-smithi/7444593

Actions #10

Updated by Laura Flores 6 months ago

/a/yuriw-2023-11-01_21:37:41-rados-wip-yuri6-testing-2023-11-01-0745-reef-distro-default-smithi/7443819
/a/yuriw-2023-11-01_21:37:41-rados-wip-yuri6-testing-2023-11-01-0745-reef-distro-default-smithi/7443888

Actions #11

Updated by Radoslaw Zarzynski 6 months ago

I talked with Matan about comment #8. I agree we should bump the valgrind timeout and re-verify.

Actions #12

Updated by Laura Flores 6 months ago

/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448450
/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448245
/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448519

Actions #13

Updated by Radoslaw Zarzynski 6 months ago

From /a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448450/teuthology.log:

2023-11-05T23:27:35.898 DEBUG:tasks.ceph_manager:running osd.6 under valgrind with args ['cd', '/home/ubuntu/cephtest', Raw('&&'), 'sudo', 'adjust-ulimits', 'ceph-coverage', '/home/ubuntu/cephtest/archive/coverage', 'daemon-helper', 'term', 'env', 'OPENSSL_ia32cap=~0x1000000000000000', 'valgrind', '--trace-children=no', '--child-silent-after-fork=yes', '--soname-synonyms=somalloc=*tcmalloc*', '--num-callers=50', '--suppressions=/home/ubuntu/cephtest/valgrind.supp', '--xml=yes', '--xml-file=/var/log/ceph/valgrind/osd.6.log', '--time-stamp=yes', '--vgdb=yes', '--exit-on-first-error=yes', '--error-exitcode=42', '--tool=memcheck']
...
2023-11-05T23:34:28.492 INFO:tasks.ceph.osd.6.smithi155.stderr:*** Caught signal (Aborted) **
2023-11-05T23:34:28.492 INFO:tasks.ceph.osd.6.smithi155.stderr: in thread 28330640 thread_name:tp_osd_tp
2023-11-05T23:34:28.543 INFO:tasks.ceph.osd.6.smithi155.stderr: ceph version 18.2.0-1181-gf7e9f9af (f7e9f9af51acf98e0ccc3037487018e4ae40caa5) reef (stable)
2023-11-05T23:34:28.543 INFO:tasks.ceph.osd.6.smithi155.stderr: 1: /lib64/libc.so.6(+0x54db0) [0x549fdb0]
2023-11-05T23:34:28.543 INFO:tasks.ceph.osd.6.smithi155.stderr: 2: /lib64/libc.so.6(+0xa154c) [0x54ec54c]
2023-11-05T23:34:28.543 INFO:tasks.ceph.osd.6.smithi155.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x226) [0xb9a736]
2023-11-05T23:34:28.544 INFO:tasks.ceph.osd.6.smithi155.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x70) [0xb9a830]
2023-11-05T23:34:28.544 INFO:tasks.ceph.osd.6.smithi155.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x248) [0xbaadb8]
2023-11-05T23:34:28.544 INFO:tasks.ceph.osd.6.smithi155.stderr: 6: ceph-osd(+0xaa3334) [0xbab334]
2023-11-05T23:34:28.544 INFO:tasks.ceph.osd.6.smithi155.stderr: 7: /lib64/libc.so.6(+0x9f802) [0x54ea802]
2023-11-05T23:34:28.544 INFO:tasks.ceph.osd.6.smithi155.stderr: 8: clone()
2023-11-05T23:34:28.609 INFO:tasks.ceph.osd.6.smithi155.stderr:2023-11-05T23:34:28.596+0000 28330640 -1 *** Caught signal (Aborted) **
2023-11-05T23:34:28.610 INFO:tasks.ceph.osd.6.smithi155.stderr: in thread 28330640 thread_name:tp_osd_tp
2023-11-05T23:34:28.610 INFO:tasks.ceph.osd.6.smithi155.stderr:
2023-11-05T23:34:28.610 INFO:tasks.ceph.osd.6.smithi155.stderr: ceph version 18.2.0-1181-gf7e9f9af (f7e9f9af51acf98e0ccc3037487018e4ae40caa5) reef (stable)
2023-11-05T23:34:28.611 INFO:tasks.ceph.osd.6.smithi155.stderr: 1: /lib64/libc.so.6(+0x54db0) [0x549fdb0]
2023-11-05T23:34:28.611 INFO:tasks.ceph.osd.6.smithi155.stderr: 2: /lib64/libc.so.6(+0xa154c) [0x54ec54c]
2023-11-05T23:34:28.611 INFO:tasks.ceph.osd.6.smithi155.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x226) [0xb9a736]
2023-11-05T23:34:28.611 INFO:tasks.ceph.osd.6.smithi155.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x70) [0xb9a830]
2023-11-05T23:34:28.611 INFO:tasks.ceph.osd.6.smithi155.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x248) [0xbaadb8]
2023-11-05T23:34:28.612 INFO:tasks.ceph.osd.6.smithi155.stderr: 6: ceph-osd(+0xaa3334) [0xbab334]
2023-11-05T23:34:28.612 INFO:tasks.ceph.osd.6.smithi155.stderr: 7: /lib64/libc.so.6(+0x9f802) [0x54ea802]
2023-11-05T23:34:28.612 INFO:tasks.ceph.osd.6.smithi155.stderr: 8: clone()
2023-11-05T23:34:28.612 INFO:tasks.ceph.osd.6.smithi155.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2023-11-05T23:34:28.612 INFO:tasks.ceph.osd.6.smithi155.stderr:

Same story with 7448519 and 7448450:

rzarzynski@teuthology:/a/yuriw-2023-11-05_15:32:58-rados-reef-release-distro-default-smithi/7448519$ less teuthology.log
...
2023-11-06T00:10:49.350 DEBUG:tasks.ceph_manager:running osd.6 under valgrind with args ['cd', '/home/ubuntu/cephtest', Raw('&&'), 'sudo', 'adjust-ulimits', 'ceph-coverage', '/home/ubuntu/cephtest/archive/coverage', 'daemon-helper', 'term', 'env', 'OPENSSL_ia32cap=~0x1000000000000000', 'valgrind', '--trace-children=no', '--child-silent-after-fork=yes', '--soname-synonyms=somalloc=*tcmalloc*', '--num-callers=50', '--suppressions=/home/ubuntu/cephtest/valgrind.supp', '--xml=yes', '--xml-file=/var/log/ceph/valgrind/osd.6.log', '--time-stamp=yes', '--vgdb=yes', '--exit-on-first-error=yes', '--error-exitcode=42', '--tool=memcheck']
...
2023-11-06T00:37:15.354 INFO:tasks.ceph.osd.6.smithi042.stderr:*** Caught signal (Aborted) **
2023-11-06T00:37:15.355 INFO:tasks.ceph.osd.6.smithi042.stderr: in thread 2ba99640 thread_name:tp_osd_tp
2023-11-06T00:37:15.414 INFO:tasks.ceph.osd.6.smithi042.stderr: ceph version 18.2.0-1181-gf7e9f9af (f7e9f9af51acf98e0ccc3037487018e4ae40caa5) reef (stable)
2023-11-06T00:37:15.415 INFO:tasks.ceph.osd.6.smithi042.stderr: 1: /lib64/libc.so.6(+0x54db0) [0x549fdb0]
2023-11-06T00:37:15.415 INFO:tasks.ceph.osd.6.smithi042.stderr: 2: /lib64/libc.so.6(+0xa154c) [0x54ec54c]
2023-11-06T00:37:15.415 INFO:tasks.ceph.osd.6.smithi042.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x226) [0xb9a736]
2023-11-06T00:37:15.416 INFO:tasks.ceph.osd.6.smithi042.stderr: 4: (ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x70) [0xb9a830]
2023-11-06T00:37:15.416 INFO:tasks.ceph.osd.6.smithi042.stderr: 5: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x248) [0xbaadb8]
2023-11-06T00:37:15.416 INFO:tasks.ceph.osd.6.smithi042.stderr: 6: ceph-osd(+0xaa3334) [0xbab334]
2023-11-06T00:37:15.416 INFO:tasks.ceph.osd.6.smithi042.stderr: 7: /lib64/libc.so.6(+0x9f802) [0x54ea802]
2023-11-06T00:37:15.417 INFO:tasks.ceph.osd.6.smithi042.stderr: 8: clone()
Actions #14

Updated by Radoslaw Zarzynski 6 months ago

  • Assignee set to Matan Breizman

Hi Matan! How about sending a PR with the timeout bump?

Actions #15

Updated by Matan Breizman 6 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 54492
Actions #16

Updated by Matan Breizman 6 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #17

Updated by Backport Bot 6 months ago

Actions #18

Updated by Backport Bot 6 months ago

  • Tags set to backport_processed
Actions #19

Updated by Laura Flores 5 months ago

/a/yuriw-2023-12-05_18:59:03-rados-wip-yuri4-testing-2023-12-04-1129-reef-distro-default-smithi/7478311

2023-12-05T21:49:20.940 INFO:tasks.ceph.osd.7.smithi151.stderr:*** Caught signal (Aborted) **
2023-12-05T21:49:20.940 INFO:tasks.ceph.osd.7.smithi151.stderr: in thread 2572a640 thread_name:tp_osd_tp
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: ceph version 18.2.0-1333-g244b703b (244b703b22e9d7c48e37291bfeaf4b15a97cc628) reef (stable)
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 1: /lib64/libc.so.6(+0x54db0) [0x549fdb0]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 2: /lib64/libc.so.6(+0xa154c) [0x54ec54c]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x226) [0xb9a3e6]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 4: (ceph::HeartbeatMap::clear_timeout(ceph::heartbeat_handle_d*)+0x5f) [0xb9a67f]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x183) [0x6f8cd3]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x25b) [0xbaaa7b]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 7: ceph-osd(+0xaa2fe4) [0xbaafe4]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 8: /lib64/libc.so.6(+0x9f802) [0x54ea802]
2023-12-05T21:49:21.004 INFO:tasks.ceph.osd.7.smithi151.stderr: 9: clone()

Actions #20

Updated by Laura Flores 5 months ago

  • Subject changed from Heartbeat crash in osd to Heartbeat crash in reset_timeout and clear_timeout
Actions #21

Updated by Aishwarya Mathuria 3 months ago

/a/yuriw-2024-02-13_15:50:02-rados-wip-yuri2-testing-2024-02-12-0808-reef-distro-default-smithi/7558344

Actions #22

Updated by Laura Flores 2 months ago

/a/yuriw-2024-02-28_22:39:54-rados-wip-yuri8-testing-2024-02-22-0734-reef-distro-default-smithi/7576292

Actions #23

Updated by Laura Flores 2 months ago

  • Related to Bug #64637: LeakPossiblyLost in BlueStore::_do_write_small() in osd added
Actions #24

Updated by Laura Flores 2 months ago

/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576298

Actions #25

Updated by Laura Flores 2 months ago

/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576314

Actions #26

Updated by Laura Flores 2 months ago

/a/yuriw-2024-02-28_22:53:11-rados-wip-yuri2-testing-2024-02-16-0829-reef-distro-default-smithi/7576311

Actions #27

Updated by Laura Flores about 2 months ago

/a/yuriw-2024-03-04_20:52:58-rados-reef-release-distro-default-smithi/7581448

Actions #29

Updated by Laura Flores 18 days ago · Edited

/a/yuriw-2024-04-09_01:16:20-rados-reef-release-distro-default-smithi/7647721
/a/yuriw-2024-04-09_01:16:20-rados-reef-release-distro-default-smithi/7647859
/a/yuriw-2024-04-09_01:16:20-rados-reef-release-distro-default-smithi/7647481

Actions #30

Updated by Laura Flores 11 days ago

/a/yuriw-2024-04-22_18:19:58-rados-wip-yuri2-testing-2024-04-17-0823-reef-distro-default-smithi/7668452

Actions #31

Updated by Matan Breizman 9 days ago

Reef backport is in QA
