Bug #49240
closed
terminate called after throwing an instance of 'std::bad_alloc'
Added by Neha Ojha about 3 years ago.
Updated about 3 years ago.
Description
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: terminate called after throwing an instance of 'std::bad_alloc'
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: what(): std::bad_alloc
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: *** Caught signal (Aborted) **
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: in thread 7f67682a8700 thread_name:ms_dispatch
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: ceph version 17.0.0-638-g8bf6cf6e (8bf6cf6ec50e9b5f8323a7750b68c287b546028c) quincy (dev)
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 1: /lib64/libpthread.so.0(+0x12b20) [0x7f67706bab20]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 2: gsignal()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 3: abort()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f676fac109b]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f676fac753c]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 6: /lib64/libstdc++.so.6(+0x95559) [0x7f676fac6559]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 7: __gxx_personality_v0()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 8: /lib64/libgcc_s.so.1(+0x10b13) [0x7f676f4a7b13]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 9: _Unwind_Resume()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 10: /usr/lib64/ceph/libceph-common.so.2(+0x294eb1) [0x7f6771aedeb1]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6771db52b1]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 12: (Thread::_entry_func(void*)+0xd) [0x7f6771bb602d]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 13: /lib64/libpthread.so.0(+0x814a) [0x7f67706b014a]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 14: clone()
/a/yuriw-2021-02-09_22:48:58-rados-wip-yuri8-testing-2021-02-08-0950-distro-basic-smithi/5872139
rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size 1-install/nautilus-v1only backoff/normal ceph clusters/{openstack three-plus-one} d-balancer/crush-compat distro$/{ubuntu_18.04} mon_election/classic msgr-failures/fastclose rados thrashers/mapgap thrashosds-health workloads/cache-snaps}
This one is in the osd.
2021-02-13T05:11:50.695 INFO:tasks.ceph.osd.1.smithi165.stderr:terminate called after throwing an instance of 'std::bad_alloc'
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr: what(): std::bad_alloc
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr:*** Caught signal (Aborted) **
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr: in thread 7f2a69df7700 thread_name:tp_osd_tp
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: ceph version 17.0.0-743-g27a6c46f (27a6c46f8accb618f19d0e3136f48cb72da295f8) quincy (dev)
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7f2a8ecc1dc0]
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: 2: gsignal()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 3: abort()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 4: /lib64/libstdc++.so.6(+0x9006b) [0x7f2a8e2e406b]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 5: /lib64/libstdc++.so.6(+0x9650c) [0x7f2a8e2ea50c]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 6: /lib64/libstdc++.so.6(+0x95529) [0x7f2a8e2e9529]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 7: __gxx_personality_v0()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 8: /lib64/libgcc_s.so.1(+0x10b13) [0x7f2a8dccab13]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 9: _Unwind_Resume()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 10: ceph-osd(+0x56712e) [0x5645dbea812e]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5645dc6354e4]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 12: (Thread::_entry_func(void*)+0xd) [0x5645dc62434d]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 13: /lib64/libpthread.so.0(+0x82de) [0x7f2a8ecb72de]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 14: clone()
rados/singleton/{all/radostool mon_election/classic msgr-failures/few msgr/async-v2only objectstore/bluestore-comp-zstd rados supported-random-distro$/{centos_8}}
/a/nojha-2021-02-13_01:15:20-rados-master-distro-basic-smithi/5878639
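For comparing the two traces above (one in the mgr, one in the OSD), here is a minimal sketch that pulls the exception type, thread name, ceph version, and frames out of pasted log lines. It is not part of teuthology or Ceph. The function name `summarize_crash` and the regular expressions are hypothetical and assume the line shapes seen in this report.

```python
import re

# Hypothetical helper for triage; patterns assume the log shapes pasted in this report.
EXC_RE = re.compile(r"terminate called after throwing an instance of '(?P<exc>[^']+)'")
THREAD_RE = re.compile(r"in thread (?P<tid>[0-9a-f]+) thread_name:(?P<name>\S+)")
VERSION_RE = re.compile(r"ceph version (?P<version>\S+) \((?P<sha1>[0-9a-f]{40})\)")
FRAME_RE = re.compile(r"\s(?P<idx>\d+): (?P<frame>.+)$")


def summarize_crash(lines):
    """Collect the exception, thread name, version, and frames from one trace."""
    summary = {"exception": None, "thread_name": None, "version": None, "frames": []}
    for line in lines:
        line = line.rstrip("\n")
        if (m := EXC_RE.search(line)):
            summary["exception"] = m["exc"]
        elif (m := THREAD_RE.search(line)):
            summary["thread_name"] = m["name"]
        elif (m := VERSION_RE.search(line)):
            summary["version"] = m["version"]
        elif (m := FRAME_RE.search(line)):
            # Frames are kept in the order they appear in the log.
            summary["frames"].append(m["frame"])
    return summary


if __name__ == "__main__":
    sample = [
        "2021-02-13T05:11:50.695 INFO:tasks.ceph.osd.1.smithi165.stderr:terminate called after throwing an instance of 'std::bad_alloc'",
        "2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr: in thread 7f2a69df7700 thread_name:tp_osd_tp",
        "2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: ceph version 17.0.0-743-g27a6c46f (27a6c46f8accb618f19d0e3136f48cb72da295f8) quincy (dev)",
        "2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: 2: gsignal()",
    ]
    print(summarize_crash(sample))
```

Feeding it both traces makes the shared pattern easier to see: the same exception, with different daemons and thread names (ms_dispatch and tp_osd_tp).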
- Related to Bug #49387: several crashes from bad_alloc exceptions added
Hi Neha, the rgw suites recently started seeing radosgw crashes from bad_alloc exceptions as well.
- Priority changed from Normal to Urgent
- Related to Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()) added
- Related to deleted (Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()))
- Related to Bug #49394: another terminate called after throwing an instance of 'std::bad_alloc' added
- Description updated (diff)
- Priority changed from Urgent to Immediate
Josh Durgin wrote:
Is this only happening on rpm-based systems? We recently started requiring tcmalloc 2.8 there: https://github.com/ceph/ceph/pull/39379/files
This is appearing in pacific as well, where this tcmalloc change hasn't merged. I have seen this once on ubuntu 18.04.
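To help answer the rpm-vs-deb question across the jobs listed in this ticket, here is a rough sketch that reads the distro out of a teuthology job description string. The helper names and the list of rpm-based prefixes are assumptions for illustration, not anything teuthology provides. It only reports the distro facet named in the description.

```python
import re

# Matches the "distro$/{...}" facet seen in the job descriptions in this ticket.
DISTRO_RE = re.compile(r"distro\$/\{(?P<distro>[^}]+)\}")

# Assumption: distro names starting with these prefixes are rpm-based.
RPM_PREFIXES = ("centos", "rhel")


def distro_of(description):
    """Return the distro named in a job description, or None if absent."""
    m = DISTRO_RE.search(description)
    return m["distro"] if m else None


def is_rpm_based(distro):
    """Classify a distro name using the assumed prefix list above."""
    return distro is not None and distro.startswith(RPM_PREFIXES)


if __name__ == "__main__":
    jobs = [
        "rados/singleton/{all/radostool mon_election/classic msgr-failures/few "
        "msgr/async-v2only objectstore/bluestore-comp-zstd rados "
        "supported-random-distro$/{centos_8}}",
        "rados/singleton-nomsgr/{all/msgr mon_election/classic rados "
        "supported-random-distro$/{rhel_8}}",
    ]
    for job in jobs:
        d = distro_of(job)
        print(d, is_rpm_based(d))
```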
2021-02-23T22:41:50.598 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1 ceph_test_msgr reply_message conn=0x55655cf14c00 reply m=0x55654ff4a000 i=1741
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1 ceph_test_msgr ms_fast_dispatch conn=0x55655cf14c00reply=^@ i = 1742
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1 ceph_test_msgr reply_message conn=0x55655cf14c00 reply m=0x55654ff4a000 i=1742
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stdout:unknown file: Failure
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stdout:C++ exception with description "Bad allocation" thrown in the test body.
2021-02-23T22:41:50.600 INFO:teuthology.orchestra.run.smithi063.stdout:[ FAILED ] Messenger/MessengerTest.SyntheticStressTest/0, where GetParam() = "async+posix" (1626 ms)
/a/sage-2021-02-23_06:29:23-rados-wip-sage-testing-2021-02-22-2228-distro-basic-smithi/5906299
description: rados/singleton-nomsgr/{all/msgr mon_election/classic rados supported-random-distro$/{rhel_8}}
Per #49387 (and an email from Casey), this could be an issue with the tcmalloc version.
I am not able to reproduce the following failure (the only occurrence of bad_alloc on Ubuntu) on master.
/a/yuriw-2021-02-09_22:48:58-rados-wip-yuri8-testing-2021-02-08-0950-distro-basic-smithi/5872139
rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size 1-install/nautilus-v1only backoff/normal ceph clusters/{openstack three-plus-one} d-balancer/crush-compat distro$/{ubuntu_18.04} mon_election/classic msgr-failures/fastclose rados thrashers/mapgap thrashosds-health workloads/cache-snaps}
https://pulpito.ceph.com/nojha-2021-02-25_23:47:10-rados:thrash-old-clients-master-distro-basic-smithi/
/a/sage-2021-02-28_18:35:15-rados-wip-sage-testing-2021-02-28-1217-distro-basic-smithi/5921574
- Status changed from New to Resolved
EPEL has tcmalloc 2.7 again, which fixes this.