Bug #49240 (closed)
terminate called after throwing an instance of 'std::bad_alloc'
Description
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: terminate called after throwing an instance of 'std::bad_alloc'
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: what(): std::bad_alloc
2021-02-10T00:05:29.349 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: *** Caught signal (Aborted) **
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: in thread 7f67682a8700 thread_name:ms_dispatch
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: ceph version 17.0.0-638-g8bf6cf6e (8bf6cf6ec50e9b5f8323a7750b68c287b546028c) quincy (dev)
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 1: /lib64/libpthread.so.0(+0x12b20) [0x7f67706bab20]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 2: gsignal()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 3: abort()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f676fac109b]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f676fac753c]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 6: /lib64/libstdc++.so.6(+0x95559) [0x7f676fac6559]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 7: __gxx_personality_v0()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 8: /lib64/libgcc_s.so.1(+0x10b13) [0x7f676f4a7b13]
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 9: _Unwind_Resume()
2021-02-10T00:05:29.350 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 10: /usr/lib64/ceph/libceph-common.so.2(+0x294eb1) [0x7f6771aedeb1]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 11: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f6771db52b1]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 12: (Thread::_entry_func(void*)+0xd) [0x7f6771bb602d]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 13: /lib64/libpthread.so.0(+0x814a) [0x7f67706b014a]
2021-02-10T00:05:29.351 INFO:journalctl@ceph.mgr.x.smithi093.stdout:Feb 10 00:05:27 smithi093 bash[13368]: 14: clone()
/a/yuriw-2021-02-09_22:48:58-rados-wip-yuri8-testing-2021-02-08-0950-distro-basic-smithi/5872139
rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size 1-install/nautilus-v1only backoff/normal ceph clusters/{openstack three-plus-one} d-balancer/crush-compat distro$/{ubuntu_18.04} mon_election/classic msgr-failures/fastclose rados thrashers/mapgap thrashosds-health workloads/cache-snaps}
Updated by Neha Ojha about 3 years ago
This one is in the osd.
2021-02-13T05:11:50.695 INFO:tasks.ceph.osd.1.smithi165.stderr:terminate called after throwing an instance of 'std::bad_alloc'
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr: what(): std::bad_alloc
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr:*** Caught signal (Aborted) **
2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr: in thread 7f2a69df7700 thread_name:tp_osd_tp
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: ceph version 17.0.0-743-g27a6c46f (27a6c46f8accb618f19d0e3136f48cb72da295f8) quincy (dev)
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: 1: /lib64/libpthread.so.0(+0x12dc0) [0x7f2a8ecc1dc0]
2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr: 2: gsignal()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 3: abort()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 4: /lib64/libstdc++.so.6(+0x9006b) [0x7f2a8e2e406b]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 5: /lib64/libstdc++.so.6(+0x9650c) [0x7f2a8e2ea50c]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 6: /lib64/libstdc++.so.6(+0x95529) [0x7f2a8e2e9529]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 7: __gxx_personality_v0()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 8: /lib64/libgcc_s.so.1(+0x10b13) [0x7f2a8dccab13]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 9: _Unwind_Resume()
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 10: ceph-osd(+0x56712e) [0x5645dbea812e]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5645dc6354e4]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 12: (Thread::_entry_func(void*)+0xd) [0x5645dc62434d]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 13: /lib64/libpthread.so.0(+0x82de) [0x7f2a8ecb72de]
2021-02-13T05:11:50.706 INFO:tasks.ceph.osd.1.smithi165.stderr: 14: clone()
rados/singleton/{all/radostool mon_election/classic msgr-failures/few msgr/async-v2only objectstore/bluestore-comp-zstd rados supported-random-distro$/{centos_8}}
/a/nojha-2021-02-13_01:15:20-rados-master-distro-basic-smithi/5878639
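The mgr and osd reports come from different daemons and threads (ms_dispatch, tp_osd_tp) but end in the same abort path. For comparing further occurrences, the following is a minimal sketch that pulls the exception type, thread name, and ceph version out of log text shaped like the excerpts in this report. The function name and regular expressions are illustrative assumptions, not part of teuthology or Ceph.

```python
import re

# Hypothetical triage helper; the patterns are written against the log
# excerpts quoted in this report and nothing else.
EXC_RE = re.compile(r"terminate called after throwing an instance of '([^']+)'")
THREAD_RE = re.compile(r"thread_name:(\S+)")
VERSION_RE = re.compile(r"ceph version (\S+) \(([0-9a-f]+)\) (\w+)")


def crash_signature(log_text):
    """Return the exception type, thread name, and ceph version found in log_text."""
    exc = EXC_RE.search(log_text)
    thread = THREAD_RE.search(log_text)
    version = VERSION_RE.search(log_text)
    return {
        "exception": exc.group(1) if exc else None,
        "thread_name": thread.group(1) if thread else None,
        "version": version.group(1) if version else None,
        "release": version.group(3) if version else None,
    }


# Three lines copied from the osd.1 excerpt in this report.
sample = "\n".join([
    "2021-02-13T05:11:50.695 INFO:tasks.ceph.osd.1.smithi165.stderr:"
    "terminate called after throwing an instance of 'std::bad_alloc'",
    "2021-02-13T05:11:50.696 INFO:tasks.ceph.osd.1.smithi165.stderr:"
    " in thread 7f2a69df7700 thread_name:tp_osd_tp",
    "2021-02-13T05:11:50.705 INFO:tasks.ceph.osd.1.smithi165.stderr:"
    " ceph version 17.0.0-743-g27a6c46f"
    " (27a6c46f8accb618f19d0e3136f48cb72da295f8) quincy (dev)",
])

print(crash_signature(sample))
```

Because each pattern is searched independently, the helper should behave the same on the flattened single-line form of the log and on one-entry-per-line text.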
Updated by Casey Bodley about 3 years ago
- Related to Bug #49387: several crashes from bad_alloc exceptions added
Updated by Casey Bodley about 3 years ago
Hi Neha, the rgw suites recently started seeing radosgw crashes from bad_alloc exceptions as well.
Updated by Sebastian Wagner about 3 years ago
- Related to Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()) added
Updated by Sebastian Wagner about 3 years ago
- Related to deleted (Bug #49190: LibRadosMiscConnectFailure_ConnectFailure_Test: FAILED ceph_assert(p != obs_call_gate.end()))
Updated by Josh Durgin about 3 years ago
Is this only happening on rpm-based systems? We recently started requiring tcmalloc 2.8 there: https://github.com/ceph/ceph/pull/39379/files
Updated by Neha Ojha about 3 years ago
- Related to Bug #49394: another terminate called after throwing an instance of 'std::bad_alloc' added
Updated by Neha Ojha about 3 years ago
- Priority changed from Urgent to Immediate
Updated by Neha Ojha about 3 years ago
Josh Durgin wrote:
Is this only happening on rpm-based systems? We recently started requiring tcmalloc 2.8 there: https://github.com/ceph/ceph/pull/39379/files
This is appearing in pacific as well, where this tcmalloc change hasn't merged. I have seen this once on ubuntu 18.04.
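To answer the rpm-versus-deb question across many failed jobs, the distro can be read from the job description string. The sketch below assumes descriptions carry a `distro$/{...}` facet as in the two jobs quoted so far; the helper names and the distro-to-package-family mapping are illustrative assumptions and cover only the distros named in this report.

```python
import re

# Hypothetical triage helper; not part of teuthology or Ceph.
# Matches both "distro$/{...}" and "supported-random-distro$/{...}".
DISTRO_RE = re.compile(r"distro\$/\{([^}]+)\}")

# Assumption: only the distros that appear in this report are mapped.
PACKAGE_FAMILY = {"centos": "rpm", "rhel": "rpm", "ubuntu": "deb"}


def distro_of(description):
    """Return the distro facet of a job description, or None if absent."""
    match = DISTRO_RE.search(description)
    return match.group(1) if match else None


def package_family(distro):
    """Map a distro facet such as 'centos_8' to 'rpm', 'deb', or 'unknown'."""
    if distro is None:
        return "unknown"
    return PACKAGE_FAMILY.get(distro.split("_")[0], "unknown")


# The two job descriptions quoted earlier in this report.
jobs = [
    "rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size "
    "1-install/nautilus-v1only backoff/normal ceph clusters/{openstack "
    "three-plus-one} d-balancer/crush-compat distro$/{ubuntu_18.04} "
    "mon_election/classic msgr-failures/fastclose rados thrashers/mapgap "
    "thrashosds-health workloads/cache-snaps}",
    "rados/singleton/{all/radostool mon_election/classic msgr-failures/few "
    "msgr/async-v2only objectstore/bluestore-comp-zstd rados "
    "supported-random-distro$/{centos_8}}",
]

for job in jobs:
    distro = distro_of(job)
    print(distro, package_family(distro))
```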
Updated by Sage Weil about 3 years ago
2021-02-23T22:41:50.598 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1 ceph_test_msgr reply_message conn=0x55655cf14c00 reply m=0x55654ff4a000 i=1741
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1 ceph_test_msgr ms_fast_dispatch conn=0x55655cf14c00reply=^@ i = 1742
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stderr:2021-02-23T22:41:50.459+0000 7f1491ace700 -1 ceph_test_msgr reply_message conn=0x55655cf14c00 reply m=0x55654ff4a000 i=1742
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stdout:unknown file: Failure
2021-02-23T22:41:50.599 INFO:teuthology.orchestra.run.smithi063.stdout:C++ exception with description "Bad allocation" thrown in the test body.
2021-02-23T22:41:50.600 INFO:teuthology.orchestra.run.smithi063.stdout:[ FAILED ] Messenger/MessengerTest.SyntheticStressTest/0, where GetParam() = "async+posix" (1626 ms)
/a/sage-2021-02-23_06:29:23-rados-wip-sage-testing-2021-02-22-2228-distro-basic-smithi/5906299
description: rados/singleton-nomsgr/{all/msgr mon_election/classic rados supported-random-distro$/{rhel_8}}
Updated by Neha Ojha about 3 years ago
Using https://tracker.ceph.com/issues/49240#note-1 as the reproducer, this fails in 1 out of 10 runs:
rados:singleton/{all/radostool mon_election/classic msgr-failures/few msgr/async-v2only objectstore/bluestore-bitmap rados supported-random-distro$/{centos_8}}
https://pulpito.ceph.com/nojha-2021-02-23_23:58:28-rados:singleton-pacific-distro-basic-smithi/
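A 1-in-10 failure rate also bounds how many clean runs are needed before a candidate fix can be trusted. If the bug were still present and each run failed independently with probability p, then n consecutive passes would occur with probability (1 - p)^n. The sketch below finds the smallest n that pushes this below a chosen threshold. The independence assumption and the 95% confidence level are mine, and a sample of 10 runs only roughly estimates p.

```python
import math


def runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that (1 - failure_rate)**n <= 1 - confidence.

    Assumes every run fails independently with the same probability.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Failure rate observed in the reproducer runs above: 1 out of 10.
print(runs_needed(0.10))  # prints 29
```

Under these assumptions, a single clean batch of 10 runs says little: 0.9^10 is about 0.35, so an unfixed build would pass all 10 roughly a third of the time.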
Updated by Brad Hubbard about 3 years ago
Per #49387 (and an email from Casey), this could be an issue with the tcmalloc version.
Updated by Neha Ojha about 3 years ago
I am not able to reproduce the following failure on master. It is the only occurrence of bad_alloc on Ubuntu.
/a/yuriw-2021-02-09_22:48:58-rados-wip-yuri8-testing-2021-02-08-0950-distro-basic-smithi/5872139
rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size 1-install/nautilus-v1only backoff/normal ceph clusters/{openstack three-plus-one} d-balancer/crush-compat distro$/{ubuntu_18.04} mon_election/classic msgr-failures/fastclose rados thrashers/mapgap thrashosds-health workloads/cache-snaps}
Updated by Sage Weil about 3 years ago
/a/sage-2021-02-28_18:35:15-rados-wip-sage-testing-2021-02-28-1217-distro-basic-smithi/5921574
Updated by Ken Dreyer about 3 years ago
I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1933792 to track removing gperftools 2.8 from EPEL 8 and going back to the last 2.7 build.
Updated by Josh Durgin about 3 years ago
- Status changed from New to Resolved
EPEL has tcmalloc 2.7 again, which fixes this.