Bug #57195: terminate called after throwing an instance of 'std::bad_variant_access' - rgw - Ceph

Bug #57195

closed

terminate called after throwing an instance of 'std::bad_variant_access'

Added by Casey Bodley over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Assignee: Casey Bodley
Severity: 3 - minor
Pull request ID: 47907

Description

rgw crashes on startup in a lot of centos8 jobs: http://qa-proxy.ceph.com/teuthology/cbodley-2022-08-18_23:33:15-rgw-wip-cbodley-testing-distro-default-smithi/6979615/teuthology.log

terminate called after throwing an instance of 'std::bad_variant_access'
  what():  std::get: wrong index for variant
*** Caught signal (Aborted) **
 in thread ee24540 thread_name:memcheck-amd64-
 ceph version 17.0.0-14379-ga8b84acb (a8b84acb87be6574934d5b5cc860020487d73e7a) quincy (dev)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x8de2ce0]
 2: gsignal()
 3: abort()
 4: /lib64/libstdc++.so.6(+0x9009b) [0x962209b]
 5: /lib64/libstdc++.so.6(+0x9653c) [0x962853c]
 6: /lib64/libstdc++.so.6(+0x96597) [0x9628597]
 7: /lib64/libstdc++.so.6(+0x967f8) [0x96287f8]
 8: (std::__throw_bad_variant_access(bool)+0) [0x610506]
 9: (void boost::throw_exception<boost::bad_function_call>(boost::bad_function_call const&)+0) [0x61052a]
 10: radosgw(+0x55bd19) [0x663d19]
 11: main()
 12: __libc_start_main()
 13: _start()

first saw on august 5th in https://tracker.ceph.com/issues/57050#note-2

frames 8 and 9 show two different exceptions being thrown. in that other tracker issue, the exceptions were:

 8: (std::__throw_bad_variant_access(bool)+0) [0x7f4c203a6020]
 9: (void boost::throw_exception<boost::gregorian::bad_day_of_month>(boost::gregorian::bad_day_of_month const&)+0) [0x7f4c203a6044]

Updated by Casey Bodley over 1 year ago

Assignee set to Casey Bodley

Updated by Casey Bodley over 1 year ago

tried but was unable to reproduce in a centos stream 8 vm with the following cmake config:

cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DWITH_MGR=OFF -DWITH_CEPHFS=OFF -DWITH_KRBD=OFF -DWITH_RBD=OFF -DWITH_MGR_DASHBOARD_FRONTEND=OFF -DWITH_RDMA=OFF -DWITH_FUSE=OFF ..

Updated by Casey Bodley over 1 year ago

this seems to only crash in our valgrind jobs, ex https://pulpito.ceph.com/amaredia-2022-08-30_18:13:58-rgw:verify-main-distro-default-smithi/

i'll try to reproduce manually under valgrind

Updated by Casey Bodley over 1 year ago

Casey Bodley wrote:

this seems to only crash in our valgrind jobs, ex https://pulpito.ceph.com/amaredia-2022-08-30_18:13:58-rgw:verify-main-distro-default-smithi/

i'll try to reproduce manually under valgrind

scratch that, it's the rgw-datacache jobs that fail consistently, and the no-datacache ones that pass

Updated by Casey Bodley over 1 year ago

Status changed from New to Fix Under Review
Pull request ID set to 47907

reproduced after configuring rgw d3n l1 local datacache enabled = true:


 8: (std::__throw_bad_variant_access(bool)+0) [0x55ccf24b07f2]
 9: (ceph::version_1_0::spin_lock(std::atomic_flag&)+0) [0x55ccf24b0813]
 10: (unsigned long const md_config_t::get_val<unsigned long>(ConfigValues const&, std::basic_string_view<char, std::char_traits<char> >) const+0x97) [0x55ccf2604a17]
 11: (StoreManager::get_config(bool, ceph::common::CephContext*)+0x291) [0x55ccf2bd1cbf]
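
for context on the trace: frame 10 is the typed lookup md_config_t::get_val<unsigned long>(), and the what() string "std::get: wrong index for variant" is what std::get<T> reports when the variant currently holds some alternative other than T. a minimal standalone sketch of that failure mode follows. ConfigValue, SizeValue, get_val and lookup_throws here are hypothetical stand-ins for illustration, not Ceph's actual types or functions:

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical wrapper for a size-typed option (an assumption, not Ceph's type).
struct SizeValue {
  std::uint64_t value;
};

// Hypothetical config value: a single variant covering every option type.
using ConfigValue =
    std::variant<bool, std::int64_t, std::uint64_t, SizeValue, std::string>;

// Typed lookup: std::get<T> throws std::bad_variant_access unless the
// variant currently holds exactly the alternative T.
template <typename T>
T get_val(const ConfigValue& v) {
  return std::get<T>(v);
}

// Reports whether a lookup as T would throw for the given value.
template <typename T>
bool lookup_throws(const ConfigValue& v) {
  try {
    (void)get_val<T>(v);
    return false;
  } catch (const std::bad_variant_access&) {
    return true;
  }
}
```

in this sketch, a value stored as SizeValue can be read back with get_val<SizeValue>() but not with get_val<std::uint64_t>(), even though both carry the same number. if nothing catches the exception, the process ends in std::terminate, which matches the abort() frames in the original report.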

this is a regression from https://github.com/ceph/ceph/pull/47362, which switched from using these legacy config variables:

    bool rgw_d3n_datacache_enabled =
        cct->_conf->rgw_d3n_l1_local_datacache_enabled;
    if (rgw_d3n_datacache_enabled &&
        (cct->_conf->rgw_max_chunk_size != cct->_conf->rgw_obj_stripe_size)) {

to lookups with get_val<T>():

    const auto& d3n = g_conf().get_val<bool>("rgw_d3n_l1_local_datacache_enabled");
    if (!admin && d3n) {
      if (g_conf().get_val<size_t>("rgw_max_chunk_size") !=
          g_conf().get_val<size_t>("rgw_obj_stripe_size")) {
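
the difference between the two snippets is that get_val<T>() checks T against the stored type at runtime, while the legacy members are read as plain fields. one defensive pattern for this kind of lookup is to visit the variant and convert whichever size-like alternative it holds. the sketch below illustrates that idea only; it is not the change made in the pull request, and ConfigValue, SizeValue and as_size are hypothetical names:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical wrapper for a size-typed option (an assumption, not Ceph's type).
struct SizeValue {
  std::uint64_t value;
};

// Hypothetical config value: a single variant covering every option type.
using ConfigValue =
    std::variant<bool, std::int64_t, std::uint64_t, SizeValue, std::string>;

// Returns the stored value as an unsigned 64-bit integer when the variant
// holds a size-like alternative, and std::nullopt for any other type.
std::optional<std::uint64_t> as_size(const ConfigValue& v) {
  return std::visit(
      [](const auto& held) -> std::optional<std::uint64_t> {
        using T = std::decay_t<decltype(held)>;
        if constexpr (std::is_same_v<T, std::uint64_t>) {
          return held;
        } else if constexpr (std::is_same_v<T, SizeValue>) {
          return held.value;
        } else {
          return std::nullopt;
        }
      },
      v);
}
```

a caller can then compare two as_size() results and treat std::nullopt as a configuration error instead of letting an exception escape during startup.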

Updated by Casey Bodley over 1 year ago

Status changed from Fix Under Review to Resolved
