Bug #57050
Crash on startup of radosgw in librbd::rbd_features_from_string() (closed)
Description
Most tests are crashing in this teuthology run on a branch based on recent main:
https://pulpito.ceph.com/dang-2022-08-05_09:33:20-rgw-wip-dang-zipper-cleanup-distro-default-smithi/
The crash happens early in startup, so there are no logs. There is a core file with the following backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f074b02c3ce in std::locale::operator==(std::locale const&) const () from /usr/lib/ceph/libceph-common.so.2
(gdb) bt
#0 0x00007f074b02c3ce in std::locale::operator==(std::locale const&) const () from /usr/lib/ceph/libceph-common.so.2
#1 0x00007f074aeed778 in boost::detail::lcast_ret_unsigned<std::char_traits<char>, unsigned long, char>::convert() () from /usr/lib/ceph/libceph-common.so.2
#2 0x00007f074aeec9c4 in librbd::rbd_features_from_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*) () from /usr/lib/ceph/libceph-common.so.2
#3 0x00007f074ab767df in ?? () from /usr/lib/ceph/libceph-common.so.2
#4 0x00007f074aad2b06 in Option::pre_validate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const () from /usr/lib/ceph/libceph-common.so.2
#5 0x00007f074aad511b in Option::parse_value(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::variant<std::monostate, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000l> >, Option::size_t, uuid_d>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const () from /usr/lib/ceph/libceph-common.so.2
#6 0x00007f074aaa4912 in md_config_t::_set_val(ConfigValues&, ConfigTracker const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Option const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/lib/ceph/libceph-common.so.2
#7 0x00007f074aaa4d87 in md_config_t::set_val_default(ConfigValues&, ConfigTracker const&, std::basic_string_view<char, std::char_traits<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/ceph/libceph-common.so.2
#8 0x00007f074aab484a in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) () from /usr/lib/ceph/libceph-common.so.2
#9 0x00007f074aa5b4cc in ceph::common::CephContext::CephContext(unsigned int, ceph::common::CephContext::create_options const&) () from /usr/lib/ceph/libceph-common.so.2
#10 0x00007f074aa5c551 in ceph::common::CephContext::CephContext(unsigned int, code_environment_t, int) () from /usr/lib/ceph/libceph-common.so.2
#11 0x00007f074aa9aff5 in common_preinit(CephInitParameters const&, code_environment_t, int) () from /usr/lib/ceph/libceph-common.so.2
#12 0x00007f074cb55228 in global_pre_init (defaults=defaults@entry=0x7fffa2618190, args=std::vector of length 9, capacity 13 = {...}, module_type=module_type@entry=8, code_env=code_env@entry=CODE_ENVIRONMENT_DAEMON, flags=flags@entry=17) at ./src/global/global_init.cc:114
#13 0x00007f074c40cda5 in rgw_global_init (defaults=defaults@entry=0x7fffa2618190, args=std::vector of length 9, capacity 13 = {...}, module_type=module_type@entry=8, code_env=code_env@entry=CODE_ENVIRONMENT_DAEMON, flags=flags@entry=17) at ./src/rgw/rgw_common.cc:3037
#14 0x00007f074c2eccfa in radosgw_Main (argc=14, argv=0x7fffa2618968) at ./src/rgw/rgw_main.cc:250
#15 0x00007f074b71a083 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()
Updated by Casey Bodley over 1 year ago
Very strange; it seems to be crashing on:
features = boost::lexical_cast<uint64_t>(value);
We recently updated both the Boost version and the C++ version.
If rbd isn't seeing behavior like this, we may need to dig into rgw's linkage; maybe it's somehow different from libceph-common.so.
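For context on that line, a features parser of this kind tries a numeric parse first and falls back to a comma-separated list of feature names, so the numeric conversion runs even when the value is a list of names. The sketch below is an illustration under stated assumptions, not librbd's code. The function name and the name-to-bit table are hypothetical (the bit values follow the RBD feature bits as commonly documented, but treat them as illustrative). It also swaps std::from_chars in for boost::lexical_cast so it builds without Boost. That swap matters here: std::from_chars never consults std::locale, whereas frames #1 and #0 show lexical_cast's unsigned conversion calling std::locale::operator==, which is exactly where the fault landed.

```cpp
#include <charconv>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <system_error>

// Illustrative name-to-bit table (assumed, not copied from librbd).
const std::map<std::string, uint64_t> kSketchFeatureBits = {
    {"layering", 1ULL << 0},       {"striping", 1ULL << 1},
    {"exclusive-lock", 1ULL << 2}, {"object-map", 1ULL << 3},
    {"fast-diff", 1ULL << 4},      {"deep-flatten", 1ULL << 5},
    {"journaling", 1ULL << 6},
};

// Hypothetical "number or comma-separated names" parser.
uint64_t sketch_features_from_string(const std::string& value) {
  uint64_t features = 0;
  const char* first = value.data();
  const char* last = first + value.size();
  // Numeric attempt first; std::from_chars is locale-independent.
  auto [ptr, ec] = std::from_chars(first, last, features);
  if (ec == std::errc() && ptr == last) {
    return features;  // the whole string was a number
  }
  // Fall back to a comma-separated list of feature names.
  features = 0;
  std::istringstream in(value);
  std::string name;
  while (std::getline(in, name, ',')) {
    auto it = kSketchFeatureBits.find(name);
    if (it == kSketchFeatureBits.end()) {
      throw std::invalid_argument("unknown feature: " + name);
    }
    features |= it->second;
  }
  return features;
}
```

With this table, "61" parses numerically, while "layering,exclusive-lock" fails the numeric attempt and yields 1 | 4 = 5 from the name list.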
Updated by Casey Bodley over 1 year ago
- Project changed from rbd to rgw
Dan's results in https://pulpito.ceph.com/dang-2022-08-05_09:33:20-rgw-wip-dang-zipper-cleanup-distro-default-smithi/ show startup crashes under both Ubuntu and CentOS, but the CentOS ones look different:
2022-08-05T14:07:40.358 INFO:tasks.rgw.client.0.smithi120.stdout:terminate called after throwing an instance of 'std::bad_variant_access'
2022-08-05T14:07:40.358 INFO:tasks.rgw.client.0.smithi120.stdout: what(): std::get: wrong index for variant
2022-08-05T14:07:40.358 INFO:tasks.rgw.client.0.smithi120.stdout:*** Caught signal (Aborted) **
2022-08-05T14:07:40.359 INFO:tasks.rgw.client.0.smithi120.stdout: in thread 7f4c1583e600 thread_name:radosgw
2022-08-05T14:07:40.359 INFO:tasks.rgw.client.0.smithi120.stdout: ceph version 17.0.0-14041-g1b25ebe8 (1b25ebe881f4e3cb2c720c7d9794e3c6072a600a) quincy (dev)
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f4c1ddd5ce0]
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 2: gsignal()
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 3: abort()
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f4c1ac1f09b]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f4c1ac2553c]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 6: /lib64/libstdc++.so.6(+0x96597) [0x7f4c1ac25597]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 7: /lib64/libstdc++.so.6(+0x967f8) [0x7f4c1ac257f8]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 8: (std::__throw_bad_variant_access(bool)+0) [0x7f4c203a6020]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 9: (void boost::throw_exception<boost::gregorian::bad_day_of_month>(boost::gregorian::bad_day_of_month const&)+0) [0x7f4c203a6044]
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 10: /lib64/libradosgw.so.2(+0x5c56bd) [0x7f4c203f76bd]
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 11: (radosgw_Main(int, char const**)+0xa53) [0x7f4c20623e93]
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 12: __libc_start_main()
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 13: _start()
I ran against main the same day in https://pulpito.ceph.com/cbodley-2022-08-05_16:02:54-rgw-main-distro-default-smithi/, but that run only shows the Ubuntu crashes.
From the shaman build logs:
Ubuntu: The CXX compiler identification is GNU 11.1.0, built with -DWITH_STATIC_LIBSTDCXX=ON
CentOS: The CXX compiler identification is GNU 11.2.1
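The CentOS abort above is the standard behavior of std::get<T> on a std::variant holding a different alternative: it throws std::bad_variant_access, and with nothing catching it, std::terminate produces the "terminate called after throwing" line. Frame #5 of the Ubuntu backtrace shows that config values are stored in a std::variant. The sketch below uses a hypothetical, trimmed-down SketchValue alias (the real variant has more alternatives) and shows the non-throwing std::get_if alternative. It only illustrates the exception type; it does not claim to identify which std::get call in radosgw threw.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical, trimmed-down stand-in for the config value variant in
// frame #5 of the Ubuntu backtrace.
using SketchValue =
    std::variant<std::monostate, std::string, uint64_t, int64_t, double, bool>;

// std::get<uint64_t>(v) would throw std::bad_variant_access if v held, say,
// a std::string. std::get_if returns nullptr instead, so callers can choose
// a fallback rather than terminate.
uint64_t get_u64_or(const SketchValue& v, uint64_t fallback) {
  if (const auto* p = std::get_if<uint64_t>(&v)) {
    return *p;
  }
  return fallback;
}
```

A variant holding std::string("layering") makes std::get<uint64_t> throw, while get_u64_or simply returns the fallback.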
Updated by Casey Bodley over 1 year ago
Spun up a Focal VM and tested with and without WITH_STATIC_LIBSTDCXX. radosgw crashes only when it's ON; radosgw-admin does not crash in either configuration.
Updated by Casey Bodley over 1 year ago
I have a feeling this is related to the funky linkage changes from https://github.com/ceph/ceph/pull/32404. I'll try to build a partial revert for comparison.
Updated by Casey Bodley over 1 year ago
- Status changed from New to Fix Under Review
- Assignee set to Casey Bodley
- Pull request ID set to 47504
Updated by Casey Bodley over 1 year ago
- Status changed from Fix Under Review to Resolved
Updated by Laura Flores 9 months ago
- Related to Bug #57206: ceph_test_libcephfs_reclaim crashes during test added
Updated by Laura Flores 9 months ago
- Related to Bug #62228: "Segmentation fault" (['libcephfs/test.sh']) in smoke on reef added