Bug #49240

terminate called after throwing an instance of 'std::bad_alloc'

Added by Neha Ojha 3 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Immediate
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: terminate called after throwing an instance of 'std::bad_alloc'
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:   what():  std::bad_alloc
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: *** Caught signal (Aborted) **
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  in thread 7f67682a8700 thread_name:ms_dispatch
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  ceph version 17.0.0-638-g8bf6cf6e (8bf6cf6ec50e9b5f8323a7750b68c287b546028c) quincy (dev)
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  1: /lib64/libpthread.so.0(+0x12b20) [0x7f67706bab20]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  2: gsignal()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  3: abort()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  4: /lib64/libstdc++.so.6(+0x9009b) [0x7f676fac109b]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  5: /lib64/libstdc++.so.6(+0x9653c) [0x7f676fac753c]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  6: /lib64/libstdc++.so.6(+0x95559) [0x7f676fac6559]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  7: __gxx_personality_v0()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  8: /lib64/libgcc_s.so.1(+0x10b13) [0x7f676f4a7b13]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  9: _Unwind_Resume()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  10: /usr/lib64/ceph/libceph-common.so.2(+0x294eb1) [0x7f6771aedeb1]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6771db52b1]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  12: (Thread::_entry_func(void*)+0xd) [0x7f6771bb602d]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  13: /lib64/libpthread.so.0(+0x814a) [0x7f67706b014a]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]:  14: clone()

/a/yuriw-2021-02-09_22:48:58-rados-wip-yuri8-testing-2021-02-08-0950-distro-basic-smithi/5872139

rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size 1-install/nautilus-v1only backoff/normal ceph clusters/{openstack three-plus-one} d-balancer/crush-compat distro$/{ubuntu_18.04} mon_election/classic msgr-failures/fastclose rados thrashers/mapgap thrashosds-health workloads/cache-snaps}
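
Frame 10 in the trace above is reported only as a module plus offset, so the code that let the exception escape is not named. A minimal sketch of how such frames could be pulled out of a log and handed to addr2line follows. The regular expression and the helper names (unresolved_frames, addr2line_command) are assumptions made for illustration, based only on the line shapes in this report, and resolving the offset would additionally need matching debug symbols for the exact build.

```python
import re

# Matches a frame that has no symbol name, for example
#   "10: /usr/lib64/ceph/libceph-common.so.2(+0x294eb1) [0x7f6771aedeb1]"
# The pattern is an assumption derived from the lines in this report.
FRAME_RE = re.compile(r"(\d+): (\S+)\(\+(0x[0-9a-f]+)\) \[0x[0-9a-f]+\]\s*$")


def unresolved_frames(log_text):
    """Return (frame_number, module, offset) for frames without a symbol."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def addr2line_command(module, offset):
    """Build an addr2line argument list for a module-relative offset."""
    # -C demangles C++ names, -f prints the function name, -e selects the file.
    return ["addr2line", "-C", "-f", "-e", module, offset]


# Two frames copied from the description: one unresolved, one already named.
sample = """\
 10: /usr/lib64/ceph/libceph-common.so.2(+0x294eb1) [0x7f6771aedeb1]
 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6771db52b1]
"""
for number, module, offset in unresolved_frames(sample):
    print(number, " ".join(addr2line_command(module, offset)))
```

Only frame 10 is selected here; frame 11 already carries a symbol name and is skipped.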


Related issues

Related to rgw - Bug #49387: several crashes from bad_alloc exceptions Resolved
Related to bluestore - Bug #49394: another terminate called after throwing an instance of 'std::bad_alloc' Resolved

History

#1 Updated by Neha Ojha 3 months ago

This one is in the osd.

2021-02-13T05:11:50.695 INFO:tasks.ceph.osd.1.smithi165.stderr:terminate called after throwing an instance of 'std::bad_alloc'
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr:  what():  std::bad_alloc
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr:*** Caught signal (Aborted) **
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr: in thread 7f2a69df7700 thread_name:tp_osd_tp
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: ceph version 17.0.0-743-g27a6c46f (27a6c46f8accb618f19d0e3136f48cb72da295f8) quincy (dev)
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7f2a8ecc1dc0]
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: 2: gsignal()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 3: abort()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 4: /lib64/libstdc++.so.6(+0x9006b) [0x7f2a8e2e406b]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 5: /lib64/libstdc++.so.6(+0x9650c) [0x7f2a8e2ea50c]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 6: /lib64/libstdc++.so.6(+0x95529) [0x7f2a8e2e9529]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 7: __gxx_personality_v0()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 8: /lib64/libgcc_s.so.1(+0x10b13) [0x7f2a8dccab13]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 9: _Unwind_Resume()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 10: ceph-osd(+0x56712e) [0x5645dbea812e]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5645dc6354e4]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 12: (Thread::_entry_func(void*)+0xd) [0x5645dc62434d]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 13: /lib64/libpthread.so.0(+0x82de) [0x7f2a8ecb72de]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 14: clone()

rados/singleton/{all/radostool mon_election/classic msgr-failures/few msgr/async-v2only objectstore/bluestore-comp-zstd rados supported-random-distro$/{centos_8}}

/a/nojha-2021-02-13_01:15:20-rados-master-distro-basic-smithi/5878639

#2 Updated by Casey Bodley 3 months ago

  • Related to Bug #49387: several crashes from bad_alloc exceptions added

#3 Updated by Casey Bodley 3 months ago

Hi Neha, the rgw suites recently started seeing radosgw crashes from bad_alloc exceptions as well.

#4 Updated by Neha Ojha 3 months ago

  • Priority changed from Normal to Urgent

#5 Updated by Sebastian Wagner 3 months ago

  • Related to Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()) added

#6 Updated by Sebastian Wagner 3 months ago

  • Related to deleted (Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()))

#7 Updated by Josh Durgin 3 months ago

Is this only happening on rpm-based systems? We recently started requiring tcmalloc 2.8 there: https://github.com/ceph/ceph/pull/39379/files

#8 Updated by Neha Ojha 3 months ago

  • Related to Bug #49394: another terminate called after throwing an instance of 'std::bad_alloc' added

#9 Updated by Neha Ojha 3 months ago

  • Description updated (diff)

#10 Updated by Neha Ojha 3 months ago

  • Priority changed from Urgent to Immediate

#11 Updated by Neha Ojha 3 months ago

  • Backport set to pacific

#12 Updated by Neha Ojha 3 months ago

Josh Durgin wrote:

Is this only happening on rpm-based systems? We recently started requiring tcmalloc 2.8 there: https://github.com/ceph/ceph/pull/39379/files

This is appearing in pacific as well, where the tcmalloc change has not been merged. I have seen this once on Ubuntu 18.04.

#13 Updated by Sage Weil 2 months ago

2021-02-23T22:41:50.598 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1  ceph_test_msgr reply_message conn=0x55655cf14c00 reply m=0x55654ff4a000 i=1741
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1  ceph_test_msgr ms_fast_dispatch conn=0x55655cf14c00reply=^@ i = 1742
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1  ceph_test_msgr reply_message conn=0x55655cf14c00 reply m=0x55654ff4a000 i=1742
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stdout:unknown file: Failure
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stdout:C++ exception with description "Bad allocation" thrown in the test body.
2021-02-23T22:41:50.600 INFO:teuthology.orchestra.run.smithi063.stdout:[  FAILED  ] Messenger/MessengerTest.SyntheticStressTest/0, where GetParam() = "async+posix" (1626 ms)

/a/sage-2021-02-23_06:29:23-rados-wip-sage-testing-2021-02-22-2228-distro-basic-smithi/5906299
description: rados/singleton-nomsgr/{all/msgr mon_election/classic rados supported-random-distro$/{rhel_8}}

#14 Updated by Neha Ojha 2 months ago

Using the job from https://tracker.ceph.com/issues/49240#note-1, this fails about 1 in 10 runs:

rados:singleton/{all/radostool mon_election/classic msgr-failures/few msgr/async-v2only objectstore/bluestore-bitmap rados supported-random-distro$/{centos_8}}

https://pulpito.ceph.com/nojha-2021-02-23_23:58:28-rados:singleton-pacific-distro-basic-smithi/
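
With a failure rate of roughly 1 in 10, a handful of clean runs says little about whether a candidate fix works. The sketch below estimates how many runs are needed before at least one failure would be expected at a given confidence. It assumes the runs are independent and that the observed 1/10 rate is representative, neither of which this ticket establishes; runs_needed is a hypothetical helper name.

```python
import math


def runs_needed(failure_rate, confidence):
    """Smallest n for which n independent runs show at least one failure
    with probability >= confidence, given a per-run failure rate."""
    if not 0.0 < failure_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    # P(no failure in n runs) = (1 - failure_rate) ** n <= 1 - confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Assumed inputs: the 1/10 rate from this comment and a 95% confidence target.
print(runs_needed(0.1, 0.95))
```

Under these assumptions, about 29 consecutive passing runs would be needed before treating the problem as no longer reproducible at the 95% level.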

#15 Updated by Brad Hubbard 2 months ago

Per #49387 (and an email from Casey), this could be an issue with the tcmalloc version.

#16 Updated by Neha Ojha 2 months ago

I am not able to reproduce the following failure (the only occurrence of bad_alloc on Ubuntu) on master.

/a/yuriw-2021-02-09_22:48:58-rados-wip-yuri8-testing-2021-02-08-0950-distro-basic-smithi/5872139

rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size 1-install/nautilus-v1only backoff/normal ceph clusters/{openstack three-plus-one} d-balancer/crush-compat distro$/{ubuntu_18.04} mon_election/classic msgr-failures/fastclose rados thrashers/mapgap thrashosds-health workloads/cache-snaps}

https://pulpito.ceph.com/nojha-2021-02-25_23:47:10-rados:thrash-old-clients-master-distro-basic-smithi/

#17 Updated by Sage Weil 2 months ago

/a/sage-2021-02-28_18:35:15-rados-wip-sage-testing-2021-02-28-1217-distro-basic-smithi/5921574

#18 Updated by Ken Dreyer 2 months ago

I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1933792 to track removing gperftools 2.8 from EPEL 8 and going back to the last 2.7 build.

#19 Updated by Josh Durgin about 2 months ago

  • Status changed from New to Resolved

EPEL has tcmalloc 2.7 again, which fixes this.
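
For anyone checking a test node, a small sketch of a version check is shown below. It encodes only what this ticket reports (the 2.8 series implicated, 2.7 working) and is not an upstream compatibility statement. The helper names are hypothetical, and the version string is assumed to come from the package manager, for example from rpm's --queryformat '%{VERSION}' option applied to whichever package provides libtcmalloc on that system.

```python
def parse_version(version):
    """Turn a dotted version such as '2.7' or '2.7.90' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_suspect_tcmalloc(version):
    """True for the 2.8 series, which this ticket implicates.

    Assumption: only the major and minor components matter here.
    """
    return parse_version(version)[:2] == (2, 8)


# Version strings given by hand for illustration, not captured from a node.
for candidate in ("2.7", "2.8"):
    print(candidate, is_suspect_tcmalloc(candidate))
```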
