Bug #57050

closed

Crash on startup of radosgw in librbd::rbd_features_from_string()

Added by Daniel Gryniewicz over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Most tests are crashing in this teuthology run on a branch based on recent main:

https://pulpito.ceph.com/dang-2022-08-05_09:33:20-rgw-wip-dang-zipper-cleanup-distro-default-smithi/

The crash is in early startup, so there are no logs. There is a core file, with the following backtrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f074b02c3ce in std::locale::operator==(std::locale const&) const () from /usr/lib/ceph/libceph-common.so.2
(gdb) bt
#0 0x00007f074b02c3ce in std::locale::operator==(std::locale const&) const () from /usr/lib/ceph/libceph-common.so.2
#1 0x00007f074aeed778 in boost::detail::lcast_ret_unsigned<std::char_traits<char>, unsigned long, char>::convert() () from /usr/lib/ceph/libceph-common.so.2
#2 0x00007f074aeec9c4 in librbd::rbd_features_from_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*) () from /usr/lib/ceph/libceph-common.so.2
#3 0x00007f074ab767df in ?? () from /usr/lib/ceph/libceph-common.so.2
#4 0x00007f074aad2b06 in Option::pre_validate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const () from /usr/lib/ceph/libceph-common.so.2
#5 0x00007f074aad511b in Option::parse_value(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::variant<std::monostate, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000l> >, Option::size_t, uuid_d>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const () from /usr/lib/ceph/libceph-common.so.2
#6 0x00007f074aaa4912 in md_config_t::_set_val(ConfigValues&, ConfigTracker const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Option const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /usr/lib/ceph/libceph-common.so.2
#7 0x00007f074aaa4d87 in md_config_t::set_val_default(ConfigValues&, ConfigTracker const&, std::basic_string_view<char, std::char_traits<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/ceph/libceph-common.so.2
#8 0x00007f074aab484a in md_config_t::md_config_t(ConfigValues&, ConfigTracker const&, bool) () from /usr/lib/ceph/libceph-common.so.2
#9 0x00007f074aa5b4cc in ceph::common::CephContext::CephContext(unsigned int, ceph::common::CephContext::create_options const&) () from /usr/lib/ceph/libceph-common.so.2
#10 0x00007f074aa5c551 in ceph::common::CephContext::CephContext(unsigned int, code_environment_t, int) () from /usr/lib/ceph/libceph-common.so.2
#11 0x00007f074aa9aff5 in common_preinit(CephInitParameters const&, code_environment_t, int) () from /usr/lib/ceph/libceph-common.so.2
#12 0x00007f074cb55228 in global_pre_init (defaults=defaults@entry=0x7fffa2618190, args=std::vector of length 9, capacity 13 = {...}, module_type=module_type@entry=8, code_env=code_env@entry=CODE_ENVIRONMENT_DAEMON, flags=flags@entry=17) at ./src/global/global_init.cc:114
#13 0x00007f074c40cda5 in rgw_global_init (defaults=defaults@entry=0x7fffa2618190, args=std::vector of length 9, capacity 13 = {...}, module_type=module_type@entry=8, code_env=code_env@entry=CODE_ENVIRONMENT_DAEMON, flags=flags@entry=17) at ./src/rgw/rgw_common.cc:3037
#14 0x00007f074c2eccfa in radosgw_Main (argc=14, argv=0x7fffa2618968) at ./src/rgw/rgw_main.cc:250
#15 0x00007f074b71a083 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()


Related issues 2 (0 open, 2 closed)

Related to CephFS - Bug #57206: ceph_test_libcephfs_reclaim crashes during test (Resolved, Venky Shankar)
Related to CephFS - Bug #62228: "Segmentation fault" (['libcephfs/test.sh']) in smoke on reef (Resolved)

Actions #1

Updated by Casey Bodley over 1 year ago

very strange, seems to be crashing on

features = boost::lexical_cast<uint64_t>(value);

we recently updated both the boost version and the c++ version

if rbd isn't seeing behavior like this, we may need to dig into rgw's linkage - maybe it's somehow different than libceph-common.so
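
for anyone trying to isolate the conversion itself: frame #1 of the backtrace shows the lexical_cast path calling std::locale::operator==, so the numeric parse depends on the process's locale state. below is a minimal sketch of the same kind of string-to-uint64_t conversion that does not consult std::locale at all. it swaps in std::from_chars as a stand-in for boost::lexical_cast, which is not what librbd does, and parse_features_numeric is a hypothetical name, not a Ceph function.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not Ceph code): parse a decimal feature bitmask.
// std::from_chars is locale-independent, unlike the lexical_cast path
// seen in the backtrace; it is used here purely for illustration.
std::optional<std::uint64_t> parse_features_numeric(std::string_view value) {
  std::uint64_t features = 0;
  const char* first = value.data();
  const char* last = first + value.size();
  auto [ptr, ec] = std::from_chars(first, last, features);
  // Reject empty input, overflow, and trailing non-digit characters.
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return features;
}
```

a caller would treat std::nullopt as "not a plain number" and fall back to whatever name-based parsing it already has. this only sidesteps the locale lookup for comparison purposes; it says nothing about why the locale state is bad in radosgw.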

Actions #2

Updated by Casey Bodley over 1 year ago

  • Project changed from rbd to rgw

Dan's results in https://pulpito.ceph.com/dang-2022-08-05_09:33:20-rgw-wip-dang-zipper-cleanup-distro-default-smithi/ show startup crashes under both ubuntu and centos, but the centos ones look different:

2022-08-05T14:07:40.358 INFO:tasks.rgw.client.0.smithi120.stdout:terminate called after throwing an instance of 'std::bad_variant_access'
2022-08-05T14:07:40.358 INFO:tasks.rgw.client.0.smithi120.stdout:  what():  std::get: wrong index for variant
2022-08-05T14:07:40.358 INFO:tasks.rgw.client.0.smithi120.stdout:*** Caught signal (Aborted) **
2022-08-05T14:07:40.359 INFO:tasks.rgw.client.0.smithi120.stdout: in thread 7f4c1583e600 thread_name:radosgw
2022-08-05T14:07:40.359 INFO:tasks.rgw.client.0.smithi120.stdout: ceph version 17.0.0-14041-g1b25ebe8 (1b25ebe881f4e3cb2c720c7d9794e3c6072a600a) quincy (dev)
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f4c1ddd5ce0]
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 2: gsignal()
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 3: abort()
2022-08-05T14:07:40.360 INFO:tasks.rgw.client.0.smithi120.stdout: 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f4c1ac1f09b]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f4c1ac2553c]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 6: /lib64/libstdc++.so.6(+0x96597) [0x7f4c1ac25597]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 7: /lib64/libstdc++.so.6(+0x967f8) [0x7f4c1ac257f8]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 8: (std::__throw_bad_variant_access(bool)+0) [0x7f4c203a6020]
2022-08-05T14:07:40.361 INFO:tasks.rgw.client.0.smithi120.stdout: 9: (void boost::throw_exception<boost::gregorian::bad_day_of_month>(boost::gregorian::bad_day_of_month const&)+0) [0x7f4c203a6044]
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 10: /lib64/libradosgw.so.2(+0x5c56bd) [0x7f4c203f76bd]
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 11: (radosgw_Main(int, char const**)+0xa53) [0x7f4c20623e93]
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 12: __libc_start_main()
2022-08-05T14:07:40.362 INFO:tasks.rgw.client.0.smithi120.stdout: 13: _start()
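
for reference, the what() string in the centos log is the message libstdc++ uses when std::get is asked for an alternative the variant does not currently hold. the sketch below reproduces that exception class in isolation and shows the non-throwing std::get_if form. value_t here is a hypothetical, cut-down stand-in, not the real option value type, and nothing in it identifies which access failed in radosgw.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-in for a config value; NOT the real Ceph type.
using value_t = std::variant<std::monostate, std::string, std::uint64_t, bool>;

// Reports whether std::get<std::string> throws for this value.
bool get_string_throws(const value_t& v) {
  try {
    (void)std::get<std::string>(v);
    return false;
  } catch (const std::bad_variant_access&) {
    // Reached when v holds any alternative other than std::string.
    return true;
  }
}

// Non-throwing access: returns nullptr instead of raising
// std::bad_variant_access when the alternative does not match.
const std::string* try_get_string(const value_t& v) {
  return std::get_if<std::string>(&v);
}
```

if the exception is never caught, as in the log above, std::terminate runs and the process aborts, which matches the "terminate called after throwing" line.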

i ran against main the same day in https://pulpito.ceph.com/cbodley-2022-08-05_16:02:54-rgw-main-distro-default-smithi/ but that one only shows the ubuntu crashes

from shaman build logs,
ubuntu: The CXX compiler identification is GNU 11.1.0 and -DWITH_STATIC_LIBSTDCXX=ON
centos: The CXX compiler identification is GNU 11.2.1

Actions #3

Updated by Casey Bodley over 1 year ago

spun up a focal vm and tested with and without WITH_STATIC_LIBSTDCXX. radosgw only crashes when it's ON. radosgw-admin does not crash in either configuration

Actions #4

Updated by Casey Bodley over 1 year ago

i have a feeling this is related to funky linkage changes from https://github.com/ceph/ceph/pull/32404. i'll try to build a partial revert for comparison

Actions #5

Updated by Casey Bodley over 1 year ago

  • Status changed from New to Fix Under Review
  • Assignee set to Casey Bodley
  • Pull request ID set to 47504
Actions #6

Updated by Casey Bodley over 1 year ago

  • Status changed from Fix Under Review to Resolved
Actions #7

Updated by Laura Flores 9 months ago

  • Related to Bug #57206: ceph_test_libcephfs_reclaim crashes during test added
Actions #8

Updated by Laura Flores 9 months ago

  • Related to Bug #62228: "Segmentation fault" (['libcephfs/test.sh']) in smoke on reef added