Bug #55696

vstart hangs when creating volume

Added by Luis Henriques 7 months ago. Updated 7 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As in the subject, when I create a vstart cluster it hangs with the following:

MON=2 MDS=3 OSD=2 RGW=0 ../src/vstart.sh -b --without-dashboard -i <IP-ADDR> -n
<...>
/build/bin/ceph -c /build/ceph.conf -k /build/keyring fs volume ls 

While stuck at this stage, here's what I get from ceph -s:

  cluster:
    id:     680fb9ba-b496-47a1-9729-b801648b4ab3
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum a,b (age 2m)
    mgr: no daemons active (since 69s)
    osd: 2 osds: 2 up (since 103s), 2 in (since 2m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 200 GiB / 202 GiB avail
    pgs:     100.000% pgs unknown
             1 unknown

So, no MDS is shown here and the pgs don't look good (and that's why I'm selecting the 'OSD' category for the issue).

I'm attaching the logs collected from the 'out' directory.

vstart-logs.tgz - vstart logs (306 KB) Luis Henriques, 05/18/2022 03:14 PM

vstart.log View - vstart stdout (8.1 KB) Luis Henriques, 05/18/2022 04:13 PM


Related issues

Duplicates cephsqlite - Bug #55304: libcephsqlite: crash when compiled with gcc12 cause of regex treating '-' as a range operator Resolved

History

#1 Updated by Luis Henriques 7 months ago

Here's the full output when adding MGR=1 to the command line.

#2 Updated by Neha Ojha 7 months ago

  • Project changed from Ceph to cephsqlite
  • Category deleted (OSD)
    -1> 2022-05-18T16:11:54.639+0100 7f3929297640  5 cephsqlite: FullPathname: (client.4126) 1: /.mgr:devicehealth/main.db
     0> 2022-05-18T16:11:54.647+0100 7f3929297640 -1 *** Caught signal (Aborted) **
 in thread 7f3929297640 thread_name:devicehealth

 ceph version 17.0.0-12281-g948dd1bb13ae (948dd1bb13ae9be413a540a014ead04883536604) quincy (dev)
 1: /home/miguel/dev/ceph/ceph/build/bin/ceph-mgr(+0x581ced) [0x5593a58baced]
 2: /lib64/libc.so.6(+0x567c0) [0x7f394bab67c0]
 3: /lib64/libc.so.6(+0xa955c) [0x7f394bb0955c]
 4: raise()
 5: abort()
 6: /lib64/libstdc++.so.6(+0xa9ab5) [0x7f394be42ab5]
 7: /lib64/libstdc++.so.6(+0xb4fac) [0x7f394be4dfac]
 8: /lib64/libstdc++.so.6(+0xb5017) [0x7f394be4e017]
 9: /lib64/libstdc++.so.6(+0xb5278) [0x7f394be4e278]
 10: (std::__throw_regex_error(std::regex_constants::error_type, char const*)+0x48) [0x5593a568f944]
 11: (bool std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_expression_term<false, false>(std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_BracketState&, std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>&)+0x1df) [0x5593a56a75f5]
 12: (void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool)+0x9c) [0x5593a56a9030]
 13: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression()+0x2f) [0x5593a56aa551]
 14: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x315) [0x5593a56aa8bf]
 15: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()+0x25) [0x5593a56aa8f1]
 16: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x20) [0x5593a56aa926]
 17: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x22) [0x5593a56aa9e8]
 18: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x2a7) [0x5593a56aa851]
 19: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()+0x25) [0x5593a56aa8f1]
 20: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x20) [0x5593a56aa926]
 21: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 22: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 23: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 24: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 25: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x22) [0x5593a56aa9e8]
 26: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type)+0x106) [0x5593a56aabbe]
 27: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x3364e) [0x7f394d8dc64e]
 28: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x3371c) [0x7f394d8dc71c]
 29: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x18052) [0x7f394d8c1052]
 30: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x18ce7) [0x7f394d8c1ce7]
 31: /lib64/libsqlite3.so.0(+0x2d49b) [0x7f394d76549b]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#3 Updated by Luis Henriques 7 months ago

FYI the sqlite3 version I've installed in this system is 3.38.5-1.1 (from openSUSE Tumbleweed).

#4 Updated by Neha Ojha 7 months ago

MDS=0 MGR=1 OSD=3 MON=1 ../src/vstart.sh -n -x --without-dashboard on 8b396522d8f88c9cd58b82a0d35b943dc6744dce works fine

  cluster:
    id:     04c66153-65f4-4d02-970f-8db9661183f0
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 34s)
    mgr: x(active, since 28s)
    osd: 3 osds: 3 up (since 8s), 3 in (since 22s)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean

MON=2 MDS=3 OSD=2 RGW=0 MGR=1 ../src/vstart.sh -b --without-dashboard works fine too

  cluster:
    id:     04c66153-65f4-4d02-970f-8db9661183f0
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 12s)
    mgr: x(active, since 7s)
    osd: 3 osds: 3 up (since 4s), 3 in (since 2m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean

#5 Updated by Neha Ojha 7 months ago

#6 Updated by Luis Henriques 7 months ago

Thanks a lot for your help.

I've added some debug code and it looks like the crash is happening in libcephsqlite.cc, in the function parsepath(). Basically, the following line is responsible for the crash:

  static const std::regex re1{"^/*(\\*[[:digit:]]+):([[:alnum:]-_.]*)/([[:alnum:]-._]+)$"};

I think the latest upgrade I did on this box also updated the compiler to gcc-12, so maybe there's some bug triggered by it (or maybe even in libstdc++).

#7 Updated by Luis Henriques 7 months ago

OK, looks like it's a gcc12 issue. Here's a small program I wrote:

#include <regex>
#include <iostream>

int main()
{
        try {
                std::regex re{"^/*(\\*[[:digit:]]+):([[:alnum:]-_.]*)/([[:alnum:]-._]+)$"};
        }
        catch (const std::regex_error& e) {
                std::cout << "regex_error caught: " << e.code() << " => " << e.what() << '\n';
        }
}

It works fine if I compile it with gcc11, but it triggers an exception with gcc12:

regex_error caught: 8 => Invalid start of '[x-x]' range in regular expression

I'll go back and try to compile ceph with gcc-11; hopefully that'll allow me to run vstart clusters again.

#8 Updated by Luis Henriques 7 months ago

FWIW, I can't reproduce the issue when compiling with gcc-11. So, something changed (libstdc++?) that broke this regex.
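
For anyone hitting the same thing: the error message suggests that the '-' right after [:alnum:] is being read as the start of a range. A common way to sidestep that is to put the '-' last in each bracket expression, where it can only be a literal. Below is a minimal sketch of that idea, assuming C++17. The names ParsedPath and parse_pool_id_path are made up for illustration; this is not the libcephsqlite code and not necessarily the change that was applied for Bug #55304.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type, for illustration only.
struct ParsedPath {
    std::string pool;     // "*<digits>" part before the ':'
    std::string radosns;  // part between ':' and '/', may be empty
    std::string name;     // part after the '/'
};

// Hypothetical helper: same pattern as in the report, except that '-'
// has been moved to the end of both bracket expressions so it cannot
// be parsed as a range operator following a character class.
std::optional<ParsedPath> parse_pool_id_path(const std::string& path)
{
    static const std::regex re{
        "^/*(\\*[[:digit:]]+):([[:alnum:]_.-]*)/([[:alnum:]._-]+)$"};

    std::smatch m;
    if (!std::regex_match(path, m, re)) {
        return std::nullopt;  // path is not in "*<id>:<ns>/<name>" form
    }
    return ParsedPath{m[1].str(), m[2].str(), m[3].str()};
}
```

With this pattern, a path such as "/*12:devicehealth/main.db" should split into "*12", "devicehealth" and "main.db", while "/.mgr:devicehealth/main.db" (the pool-name form seen in the log above) is simply not matched by this particular expression.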

#9 Updated by Patrick Donnelly 7 months ago

  • Duplicates Bug #55304: libcephsqlite: crash when compiled with gcc12 cause of regex treating '-' as a range operator added

#10 Updated by Patrick Donnelly 7 months ago

  • Status changed from New to Duplicate
