Bug #55696

closed

vstart hangs when creating volume

Added by Luis Henriques almost 2 years ago. Updated almost 2 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

As in the subject, when I create a vstart cluster it hangs with the following:

MON=2 MDS=3 OSD=2 RGW=0 ../src/vstart.sh -b --without-dashboard -i <IP-ADDR> -n
<...>
/build/bin/ceph -c /build/ceph.conf -k /build/keyring fs volume ls 

While it is stuck at this stage, here's what I get from ceph -s:

  cluster:
    id:     680fb9ba-b496-47a1-9729-b801648b4ab3
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum a,b (age 2m)
    mgr: no daemons active (since 69s)
    osd: 2 osds: 2 up (since 103s), 2 in (since 2m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 200 GiB / 202 GiB avail
    pgs:     100.000% pgs unknown
             1 unknown

So, no MDS is shown here and the pgs don't look good (and that's why I'm selecting the 'OSD' category for the issue).

I'm attaching the logs collected from the 'out' directory.


Files

vstart-logs.tgz (306 KB), vstart logs, Luis Henriques, 05/18/2022 03:14 PM
vstart.log (8.1 KB), vstart stdout, Luis Henriques, 05/18/2022 04:13 PM

Related issues 1 (0 open, 1 closed)

Is duplicate of cephsqlite - Bug #55304: libcephsqlite: crash when compiled with gcc12 cause of regex treating '-' as a range operator (Resolved)

Actions #1

Updated by Luis Henriques almost 2 years ago

Here's the full output when adding MGR=1 to the command line.

Actions #2

Updated by Neha Ojha almost 2 years ago

  • Project changed from Ceph to cephsqlite
  • Category deleted (OSD)

    -1> 2022-05-18T16:11:54.639+0100 7f3929297640  5 cephsqlite: FullPathname: (client.4126) 1: /.mgr:devicehealth/main.db
     0> 2022-05-18T16:11:54.647+0100 7f3929297640 -1 *** Caught signal (Aborted) **
 in thread 7f3929297640 thread_name:devicehealth

 ceph version 17.0.0-12281-g948dd1bb13ae (948dd1bb13ae9be413a540a014ead04883536604) quincy (dev)
 1: /home/miguel/dev/ceph/ceph/build/bin/ceph-mgr(+0x581ced) [0x5593a58baced]
 2: /lib64/libc.so.6(+0x567c0) [0x7f394bab67c0]
 3: /lib64/libc.so.6(+0xa955c) [0x7f394bb0955c]
 4: raise()
 5: abort()
 6: /lib64/libstdc++.so.6(+0xa9ab5) [0x7f394be42ab5]
 7: /lib64/libstdc++.so.6(+0xb4fac) [0x7f394be4dfac]
 8: /lib64/libstdc++.so.6(+0xb5017) [0x7f394be4e017]
 9: /lib64/libstdc++.so.6(+0xb5278) [0x7f394be4e278]
 10: (std::__throw_regex_error(std::regex_constants::error_type, char const*)+0x48) [0x5593a568f944]
 11: (bool std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_expression_term<false, false>(std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_BracketState&, std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>&)+0x1df) [0x5593a56a75f5]
 12: (void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool)+0x9c) [0x5593a56a9030]
 13: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression()+0x2f) [0x5593a56aa551]
 14: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x315) [0x5593a56aa8bf]
 15: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()+0x25) [0x5593a56aa8f1]
 16: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x20) [0x5593a56aa926]
 17: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x22) [0x5593a56aa9e8]
 18: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x2a7) [0x5593a56aa851]
 19: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()+0x25) [0x5593a56aa8f1]
 20: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x20) [0x5593a56aa926]
 21: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 22: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 23: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 24: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 25: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x22) [0x5593a56aa9e8]
 26: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type)+0x106) [0x5593a56aabbe]
 27: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x3364e) [0x7f394d8dc64e]
 28: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x3371c) [0x7f394d8dc71c]
 29: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x18052) [0x7f394d8c1052]
 30: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x18ce7) [0x7f394d8c1ce7]
 31: /lib64/libsqlite3.so.0(+0x2d49b) [0x7f394d76549b]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #3

Updated by Luis Henriques almost 2 years ago

FYI the sqlite3 version I've installed in this system is 3.38.5-1.1 (from openSUSE Tumbleweed).

Actions #4

Updated by Neha Ojha almost 2 years ago

MDS=0 MGR=1 OSD=3 MON=1 ../src/vstart.sh -n -x --without-dashboard on 8b396522d8f88c9cd58b82a0d35b943dc6744dce works fine

  cluster:
    id:     04c66153-65f4-4d02-970f-8db9661183f0
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 34s)
    mgr: x(active, since 28s)
    osd: 3 osds: 3 up (since 8s), 3 in (since 22s)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean

MON=2 MDS=3 OSD=2 RGW=0 MGR=1 ../src/vstart.sh -b --without-dashboard works fine too

  cluster:
    id:     04c66153-65f4-4d02-970f-8db9661183f0
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 12s)
    mgr: x(active, since 7s)
    osd: 3 osds: 3 up (since 4s), 3 in (since 2m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean

Actions #5

Updated by Neha Ojha almost 2 years ago

Actions #6

Updated by Luis Henriques almost 2 years ago

Thanks a lot for your help.

I've added some debug code and it looks like the crash is happening in file libcephsqlite.cc, function parsepath(). Basically, the following line is responsible for the crash:

  static const std::regex re1{"^/*(\\*[[:digit:]]+):([[:alnum:]-_.]*)/([[:alnum:]-._]+)$"};

I think the latest upgrade I did on this box also updated the compiler to gcc-12, so maybe this is a bug triggered by it (or maybe even a bug in the libstdc...).

Actions #7

Updated by Luis Henriques almost 2 years ago

OK, it looks like it's a gcc12 issue. Here's a small program I wrote:

#include <regex>
#include <iostream>

int main()
{
        try {
                std::regex re{"^/*(\\*[[:digit:]]+):([[:alnum:]-_.]*)/([[:alnum:]-._]+)$"};
        }
        catch (const std::regex_error& e) {
                std::cout << "regex_error caught: " << e.code() << " => " << e.what() << '\n';
        }
}

It works fine if I compile it with gcc11, but it triggers an exception with gcc12:

regex_error caught: 8 => Invalid start of '[x-x]' range in regular expression

I'll go back and try to compile ceph with gcc-11; hopefully that'll allow me to run vstart clusters again.

Actions #8

Updated by Luis Henriques almost 2 years ago

FWIW, I can't reproduce the issue when compiling with gcc-11. So, something changed (libstd?) that broke this regex.

Actions #9

Updated by Patrick Donnelly almost 2 years ago

  • Is duplicate of Bug #55304: libcephsqlite: crash when compiled with gcc12 cause of regex treating '-' as a range operator added
Actions #10

Updated by Patrick Donnelly almost 2 years ago

  • Status changed from New to Duplicate