Bug #57632

test_envlibrados_for_rocksdb: free(): invalid pointer

Added by Matan Breizman 4 months ago. Updated 14 days ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
test-failure
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883/

2022-08-23T14:33:00.091 INFO:tasks.workunit.client.0.smithi165.stdout:Run EnvLibrados test
2022-08-23T14:33:00.091 INFO:tasks.workunit.client.0.smithi165.stderr:+ '[' -f ../ceph/src/ceph.conf ']'
2022-08-23T14:33:00.092 INFO:tasks.workunit.client.0.smithi165.stderr:+ cp env_librados_test /home/ubuntu/cephtest/archive
2022-08-23T14:33:00.097 INFO:tasks.workunit.client.0.smithi165.stderr:+ ./env_librados_test
2022-08-23T14:33:00.119 INFO:tasks.workunit.client.0.smithi165.stdout:[==========] Running 16 tests from 2 test cases.
2022-08-23T14:33:00.120 INFO:tasks.workunit.client.0.smithi165.stdout:[----------] Global test environment set-up.
2022-08-23T14:33:00.120 INFO:tasks.workunit.client.0.smithi165.stdout:[----------] 12 tests from EnvLibradosTest
2022-08-23T14:33:00.121 INFO:tasks.workunit.client.0.smithi165.stdout:[ RUN      ] EnvLibradosTest.Basics
2022-08-23T14:33:00.121 INFO:tasks.workunit.client.0.smithi165.stderr:free(): invalid pointer
2022-08-23T14:33:00.136 DEBUG:teuthology.orchestra.run:got remote process result: 134
2022-08-23T14:33:00.137 INFO:tasks.workunit.client.0.smithi165.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test_envlibrados_for_rocksdb.sh: line 96: 21751 Aborted                 (core dumped) ./env_librados_test

Similar failure to: https://tracker.ceph.com/issues/57163
However, compiling rocksdb_env_librados_test with c++20 doesn't resolve this issue:
https://pulpito.ceph.com/matan-2022-09-08_11:12:20-rados:singleton-main-distro-default-smithi/

2022-09-08T13:08:17.905 INFO:tasks.workunit.client.0.smithi089.stderr:+ cmake -DCMAKE_CXX_FLAGS=-std=c++20 -DCMAKE_BUILD_TYPE=Debug -DWITH_TESTS=ON -DWITH_LIBRADOS=ON -DWITH_SNAPPY=ON -DWITH_GFLAGS=OFF -DFAIL_ON_WARNINGS=OFF ..
2022-09-08T13:08:18.000 INFO:tasks.workunit.client.0.smithi089.stdout:-- The CXX compiler identification is GNU 11.1.0
2022-09-08T13:08:18.062 INFO:tasks.workunit.client.0.smithi089.stdout:-- The C compiler identification is GNU 11.1.0
..
2022-09-08T13:10:29.221 INFO:tasks.workunit.client.0.smithi089.stdout:[==========] Running 16 tests from 2 test cases.
2022-09-08T13:10:29.222 INFO:tasks.workunit.client.0.smithi089.stdout:[----------] Global test environment set-up.
2022-09-08T13:10:29.222 INFO:tasks.workunit.client.0.smithi089.stdout:[----------] 12 tests from EnvLibradosTest
2022-09-08T13:10:29.222 INFO:tasks.workunit.client.0.smithi089.stdout:[ RUN      ] EnvLibradosTest.Basics
2022-09-08T13:10:29.223 INFO:tasks.workunit.client.0.smithi089.stderr:free(): invalid pointer


Related issues

Related to Infrastructure - Bug #57754: test_envlibrados_for_rocksdb.sh: update-alternatives: error: alternative path /usr/bin/gcc-11 doesn't exist Closed

History

#1 Updated by Matan Breizman 4 months ago

  • Status changed from New to In Progress
  • Pull request ID set to 48207

#2 Updated by Laura Flores 3 months ago

Ran the test locally in an ubuntu 20.04 environment, and the test ran fine.

There is a coredump located under /a/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883/remote/smithi165/coredump, so I will try analyzing it to see what's going on.

#4 Updated by Laura Flores 3 months ago

I followed Brad's ubuntu 20.04 coredump tutorial: https://source.redhat.com/personal_blogs/debugging_a_ceph_osd_coredump_on_ubuntu_2004

Here are some extra steps I had to follow to load the correct rocksdb symbols:

cd ceph-17.0.0-14402-gae0625ac/src/rocksdb

apt-get install g++ libsnappy-dev zlib1g-dev libbz2-dev libradospp-dev cmake

cmake -DCMAKE_BUILD_TYPE=Debug -DWITH_TESTS=ON -DWITH_LIBRADOS=ON -DWITH_SNAPPY=ON -DWITH_GFLAGS=OFF -DFAIL_ON_WARNINGS=OFF .

make rocksdb_env_librados_test -j8

gdb /teuthology/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883/remote/smithi165/env_librados_test -c /teuthology/kchai-2022-08-23_13:19:39-rados-wip-kefu-testing-2022-08-22-2243-distro-default-smithi/6987883/remote/smithi165/coredump/1661265180.21751.core --directory ceph-17.0.0-14402-gae0625ac/src/rocksdb/utilities

(gdb) set solib-search-path ceph-17.0.0-14402-gae0625ac/src/rocksdb
Reading symbols from /root/ceph-17.0.0-14402-gae0625ac/src/rocksdb/librocksdb.so.6.15.5...

Frames:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f5656e26859 in __GI_abort () at abort.c:79
#2  0x00007f5656e9126e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f5656fbb298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f5656e992fc in malloc_printerr (str=str@entry=0x7f5656fb94c1 "free(): invalid pointer") at malloc.c:5347
#4  0x00007f5656e9ab2c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
#5  0x00007f56570cd8ca in std::locale::_Impl::~_Impl() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f56570cdb17 in std::locale::~locale() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f56584f4384 in std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, true, true>::_M_add_collate_element (this=0x7ffe8751b510, Python Exception <class 'OverflowError'> int too big to convert: 
__s=) at /usr/include/c++/9/bits/regex_compiler.h:454
#8  0x00007f56584fcbc8 in std::allocator_traits<std::allocator<std::__cxx11::regex_traits<char>::_RegexMask> >::construct<std::__cxx11::regex_traits<char>::_RegexMask, std::__cxx11::regex_traits<char>::_RegexMask> (__a=..., 
    __p=0x55cdb5584b60) at /usr/include/c++/9/bits/alloc_traits.h:481
#9  0x00007f56584fcc08 in std::operator==<std::__cxx11::regex_traits<char>::_RegexMask*> (__x=..., __y=...) at /usr/include/c++/9/bits/stl_iterator.h:1142
#10 0x00007f56584fcb41 in std::__copy_move_backward<false, true, std::random_access_iterator_tag>::__copy_move_b<long*> (
    __first=0x7f56584fcb41 <std::__copy_move_backward<false, true, std::random_access_iterator_tag>::__copy_move_b<long*>(long* const*, long* const*, long**)+15>, __last=0x7ffe8751b4d0, __result=0x55cdb5584a80)
    at /usr/include/c++/9/bits/stl_algobase.h:577
#11 0x00007f56584fc8dd in std::vector<std::pair<char, char>, std::allocator<std::pair<char, char> > >::max_size (this=0x55cdb54fb2d0) at /usr/include/c++/9/bits/stl_vector.h:921
#12 0x00007f56565ffb4a in ?? () from /usr/lib/ceph/libceph-common.so.2
#13 0x00007f565655afb6 in Option::pre_validate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const ()
   from /usr/lib/ceph/libceph-common.so.2
#14 0x00007f565653cec5 in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) () from /usr/lib/ceph/libceph-common.so.2
#15 0x00007f56564e3bdc in ceph::common::CephContext::CephContext(unsigned int, ceph::common::CephContext::create_options const&) () from /usr/lib/ceph/libceph-common.so.2
#16 0x00007f56564e4c61 in ceph::common::CephContext::CephContext(unsigned int, code_environment_t, int) () from /usr/lib/ceph/libceph-common.so.2
#17 0x00007f5656523675 in common_preinit(CephInitParameters const&, code_environment_t, int) () from /usr/lib/ceph/libceph-common.so.2
#18 0x00007f56589df82e in ?? () from /lib/librados.so.2
#19 0x00007f56589dfba9 in rados_create2 () from /lib/librados.so.2
#20 0x00007f5658513c8b in folly::detail::distributed_mutex::doFutexWait<folly::detail::distributed_mutex::Waiter<std::atomic> > (waiter=0x7ffe87525760, next=<error reading variable>)
    at /root/ceph-17.0.0-14402-gae0625ac/src/rocksdb/third-party/folly/folly/synchronization/DistributedMutex-inl.h:1006
#21 0x00007f56585139c4 in folly::detail::distributed_mutex::tryWake<folly::detail::distributed_mutex::Waiter<std::atomic> >(bool, folly::detail::distributed_mutex::Waiter<std::atomic>*, unsigned long, unsigned long, unsigned long, folly::detail::distributed_mutex::Waiter<std::atomic>*&, unsigned long, folly::detail::InlineFunctionRef<void (), 48ul>) (publishing=false, waiter=0x55cdb54b0590, value=140731168741248, next=94341998249360, waker=140008825633220, 
    sleepers=@0x7ffe875257c0: 0x7ffe87525830, iteration=94341982117914, task=...) at /root/ceph-17.0.0-14402-gae0625ac/src/rocksdb/third-party/folly/folly/synchronization/DistributedMutex-inl.h:1448
#22 0x000055cdb4523369 in rocksdb::EnvLibradosTest::EnvLibradosTest (this=0x55cdb54b0560) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/utilities/env_librados_test.cc:62
#23 0x000055cdb452370a in rocksdb::EnvLibradosTest_Basics_Test::EnvLibradosTest_Basics_Test (this=0x55cdb54b0560) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/utilities/env_librados_test.cc:110
#24 0x000055cdb4529a82 in testing::internal::TestFactoryImpl<rocksdb::EnvLibradosTest_Basics_Test>::CreateTest (this=0x55cdb5519b80) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest.h:8359
#25 0x000055cdb455c62f in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*> (object=0x55cdb5519b80, method=&virtual testing::internal::TestFactoryBase::CreateTest(), 
    location=0x55cdb4572068 "the test fixture's constructor") at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
#26 0x000055cdb4555c3b in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*> (object=0x55cdb5519b80, method=&virtual testing::internal::TestFactoryBase::CreateTest(), 
    location=0x55cdb4572068 "the test fixture's constructor") at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
#27 0x000055cdb4533961 in testing::TestInfo::Run (this=0x55cdb54c89a0) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4140
#28 0x000055cdb453403c in testing::TestCase::Run (this=0x55cdb54df9d0) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4267
#29 0x000055cdb453f414 in testing::internal::UnitTestImpl::RunAllTests (this=0x55cdb54b5ea0) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6633
#30 0x000055cdb455d973 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x55cdb54b5ea0, 
    method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x55cdb453f16e <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x55cdb45728e0 "auxiliary test code (environments or event listeners)") at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
#31 0x000055cdb4556965 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x55cdb54b5ea0, 
    method=(bool (testing::internal::UnitTestImpl::*)(class testing::internal::UnitTestImpl * const)) 0x55cdb453f16e <testing::internal::UnitTestImpl::RunAllTests()>, 
    location=0x55cdb45728e0 "auxiliary test code (environments or event listeners)") at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
#32 0x000055cdb453de14 in testing::UnitTest::Run (this=0x55cdb459b5c0 <testing::UnitTest::GetInstance()::instance>) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6242
#33 0x000055cdb4522e62 in RUN_ALL_TESTS () at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/third-party/gtest-1.8.1/fused-src/gtest/gtest.h:22104
#34 0x000055cdb4520d1c in main (argc=1, argv=0x7ffe87525ce8) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/utilities/env_librados_test.cc:1135

#5 Updated by Laura Flores 3 months ago

Some relevant frames:

(gdb) f 22
#22 0x000055cdb4523369 in rocksdb::EnvLibradosTest::EnvLibradosTest (this=0x55cdb54b0560) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/utilities/env_librados_test.cc:62
62        : env_(new EnvLibrados(db_name, config, db_pool)) {

#23 0x000055cdb452370a in rocksdb::EnvLibradosTest_Basics_Test::EnvLibradosTest_Basics_Test (this=0x55cdb54b0560) at /home/ubuntu/cephtest/mnt.0/client.0/tmp/rocksdb/utilities/env_librados_test.cc:110
110    TEST_F(EnvLibradosTest, Basics) {

#6 Updated by Laura Flores 3 months ago

There is also a coredump located at `/a/matan-2022-09-08_11:12:20-rados:singleton-main-distro-default-smithi/7020422/remote/smithi089/coredump`. Given the initial failure from Kefu's branch, my first guess was that we need to compile with c++20. However, Matan tried compiling with c++20 in his failed job, and it still hit the same failure.

Now that I've looked at the frames from Kefu's job, I'll compare them to Matan's. It's possible that we didn't compile with c++20 as intended. But if we did, then the true issue will likely be more evident in this second core file.
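One way to check whether c++20 actually took effect is to look at the compile line for the test and see which -std= option comes last, since GCC honors the last one it is given. The helper below is only a sketch: the function name last_std_flag and the sample command line are made up for illustration, and whether the rocksdb build files append their own -std= option is an assumption that has not been checked here.

```shell
#!/bin/sh
# Print the last -std=... option found in a compiler command line.
# GCC uses the last -std= option it sees, so this is the dialect in effect.
last_std_flag() {
    # Split on spaces, keep only -std= tokens, and take the final one.
    printf '%s\n' "$1" | tr ' ' '\n' | grep -e '^-std=' | tail -n 1
}

# Hypothetical compile line: the -std=c++20 we passed, followed by a
# -std= option added later by the project's own build files.
sample='g++ -std=c++20 -g -std=gnu++11 -c utilities/env_librados_test.cc'
last_std_flag "$sample"

# Against a real build, the compile line could be captured with something like:
#   make VERBOSE=1 rocksdb_env_librados_test 2>&1 | grep env_librados_test.cc
```

If the last option printed is not -std=c++20, the job did not really build with c++20, and the second core file would not tell us anything new.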

The build is gone, so I'm rebuilding it here: https://shaman.ceph.com/builds/ceph/wip-librados-20-test/48c3af11f46bd793abe5279be20209973ff91090/

I will be following the same steps as above, just on this second core file.

#7 Updated by Laura Flores 3 months ago

I was able to schedule a teuthology run: http://pulpito.front.sepia.ceph.com/lflores-2022-11-16_15:49:13-rados:singleton-main-distro-default-smithi/7084432/

Right now, we're hitting this tracker: https://tracker.ceph.com/issues/57754
But that was easily fixed by installing g++-11, as Matan did in the PR he suggested. I verified that part here: http://pulpito.front.sepia.ceph.com/lflores-2022-11-16_17:03:12-rados:singleton-main-distro-default-smithi/7084435/

So, when we have a solution for the invalid pointer, we can group those two trackers together and knock them both out.

#8 Updated by Laura Flores 3 months ago

  • Related to Bug #57754: test_envlibrados_for_rocksdb.sh: update-alternatives: error: alternative path /usr/bin/gcc-11 doesn't exist added

#9 Updated by Radoslaw Zarzynski 3 months ago

Do we know why switching to g++-11 helps? Is it a known compiler bug?

#10 Updated by Matan Breizman 3 months ago

Radoslaw Zarzynski wrote:

Do we know why switching to g++-11 helps? Is it a known compiler bug?

See Brad's comment on a similar bug we had: https://tracker.ceph.com/issues/57163#note-13
That bug was resolved by installing g++-11 on Ubuntu and compiling with c++20, which avoided the compiler mismatch and let the test pass.

#11 Updated by Radoslaw Zarzynski 2 months ago

Thanks for the link, Matan! I'm a bit worried that the experiment there changed two parameters at the same time: compiler version and language dialect. How about trying to replicate it with the newer compiler BUT the older dialect? If the dialect is the culprit, we should see the problem again.
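A rough sketch of what that experiment could look like, changing one variable at a time. The -D options other than CMAKE_CXX_COMPILER and CMAKE_CXX_FLAGS are copied from the job log in the description. The compiler names (g++-9, g++-11), the choice of c++17 as the "older dialect", and the helper name rocksdb_cmake_cmd are assumptions for illustration; whether rocksdb builds cleanly in every combination has not been verified.

```shell
#!/bin/sh
# Print the cmake invocation for one (compiler, dialect) combination.
rocksdb_cmake_cmd() {
    compiler=$1   # e.g. g++-9 or g++-11 (assumed to be installed)
    dialect=$2    # e.g. c++17 or c++20
    printf 'cmake -DCMAKE_CXX_COMPILER=%s -DCMAKE_CXX_FLAGS=-std=%s ' "$compiler" "$dialect"
    printf '%s\n' '-DCMAKE_BUILD_TYPE=Debug -DWITH_TESTS=ON -DWITH_LIBRADOS=ON -DWITH_SNAPPY=ON -DWITH_GFLAGS=OFF -DFAIL_ON_WARNINGS=OFF ..'
}

# List all four combinations so that each pair of neighbours differs in
# exactly one parameter (compiler version or language dialect).
for compiler in g++-9 g++-11; do
    for dialect in c++17 c++20; do
        rocksdb_cmake_cmd "$compiler" "$dialect"
    done
done
```

The interesting row is g++-11 with c++17: if the test still aborts there but passes with g++-11 and c++20, the dialect is the likelier culprit; if it passes in both, the compiler version is.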

#12 Updated by Laura Flores 2 months ago

@Radek I have been trying to reproduce this locally with no luck. I'll try your suggestion and update if I'm successful.

#13 Updated by Laura Flores 2 months ago

  • Tags set to test-failure
  • Assignee changed from Matan Breizman to Laura Flores

We discussed this tracker in the RADOS meeting. Sam pointed out that this set of tests doesn't have any actual users, and it is experimental. So, we all agreed it would be okay to remove ubuntu coverage here.

I will open the PR for this.

#14 Updated by Laura Flores 2 months ago

  • Pull request ID changed from 48207 to 49181

#15 Updated by Laura Flores 2 months ago

Linked a possible solution for skipping ubuntu with this test. I scheduled a teuthology test for it, which I will use to validate the fix. If all goes well, I will mark it ready for review.
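For illustration only, this is one way a workunit script could detect Ubuntu and skip itself. It is not necessarily what the linked PR does: the function names os_id and is_ubuntu are hypothetical, and the check relies on the ID field of /etc/os-release.

```shell
#!/bin/sh
# Print the ID field of an os-release style file (default: /etc/os-release).
os_id() {
    sed -n 's/^ID=//p' "${1:-/etc/os-release}" | tr -d '"'
}

# Succeed when the given os-release file identifies the distro as Ubuntu.
is_ubuntu() {
    [ "$(os_id "$1")" = "ubuntu" ]
}

# A real workunit would stop here with a zero exit status;
# this sketch only reports the decision.
if [ -r /etc/os-release ] && is_ubuntu /etc/os-release; then
    echo "ubuntu detected: EnvLibrados test would be skipped"
else
    echo "not ubuntu: EnvLibrados test would run"
fi
```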

#16 Updated by Laura Flores about 2 months ago

  • Status changed from In Progress to Fix Under Review

#17 Updated by Laura Flores 14 days ago

  • Status changed from Fix Under Review to Closed

I'm going to "close" this since my PR was more of a workaround than a true solution.
