Bug #55696
vstart hangs on when creating volume
Description
As in the subject, when I create a vstart cluster it hangs with the following:
MON=2 MDS=3 OSD=2 RGW=0 ../src/vstart.sh -b --without-dashboard -i <IP-ADDR> -n
<...>
/build/bin/ceph -c /build/ceph.conf -k /build/keyring fs volume ls
While stuck at this stage, here's what I get with ceph -s:
  cluster:
    id:     680fb9ba-b496-47a1-9729-b801648b4ab3
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum a,b (age 2m)
    mgr: no daemons active (since 69s)
    osd: 2 osds: 2 up (since 103s), 2 in (since 2m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   2.0 GiB used, 200 GiB / 202 GiB avail
    pgs:     100.000% pgs unknown
             1 unknown
So, no MDS is shown here and the pgs don't look good (and that's why I'm selecting the 'OSD' category for the issue).
I'm attaching the logs collected from the 'out' directory.
Files
Updated by Luis Henriques almost 2 years ago
- File vstart.log added
Here's the full output when adding MGR=1 to the command line.
Updated by Neha Ojha almost 2 years ago
- Project changed from Ceph to cephsqlite
- Category deleted (OSD)
    -1> 2022-05-18T16:11:54.639+0100 7f3929297640  5 cephsqlite: FullPathname: (client.4126) 1: /.mgr:devicehealth/main.db
     0> 2022-05-18T16:11:54.647+0100 7f3929297640 -1 *** Caught signal (Aborted) **
 in thread 7f3929297640 thread_name:devicehealth
 ceph version 17.0.0-12281-g948dd1bb13ae (948dd1bb13ae9be413a540a014ead04883536604) quincy (dev)
 1: /home/miguel/dev/ceph/ceph/build/bin/ceph-mgr(+0x581ced) [0x5593a58baced]
 2: /lib64/libc.so.6(+0x567c0) [0x7f394bab67c0]
 3: /lib64/libc.so.6(+0xa955c) [0x7f394bb0955c]
 4: raise()
 5: abort()
 6: /lib64/libstdc++.so.6(+0xa9ab5) [0x7f394be42ab5]
 7: /lib64/libstdc++.so.6(+0xb4fac) [0x7f394be4dfac]
 8: /lib64/libstdc++.so.6(+0xb5017) [0x7f394be4e017]
 9: /lib64/libstdc++.so.6(+0xb5278) [0x7f394be4e278]
 10: (std::__throw_regex_error(std::regex_constants::error_type, char const*)+0x48) [0x5593a568f944]
 11: (bool std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_expression_term<false, false>(std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_BracketState&, std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>&)+0x1df) [0x5593a56a75f5]
 12: (void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool)+0x9c) [0x5593a56a9030]
 13: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression()+0x2f) [0x5593a56aa551]
 14: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x315) [0x5593a56aa8bf]
 15: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()+0x25) [0x5593a56aa8f1]
 16: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x20) [0x5593a56aa926]
 17: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x22) [0x5593a56aa9e8]
 18: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x2a7) [0x5593a56aa851]
 19: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()+0x25) [0x5593a56aa8f1]
 20: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x20) [0x5593a56aa926]
 21: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 22: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 23: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 24: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x3a) [0x5593a56aa940]
 25: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x22) [0x5593a56aa9e8]
 26: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type)+0x106) [0x5593a56aabbe]
 27: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x3364e) [0x7f394d8dc64e]
 28: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x3371c) [0x7f394d8dc71c]
 29: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x18052) [0x7f394d8c1052]
 30: /home/miguel/dev/ceph/ceph/build/lib/libcephsqlite.so(+0x18ce7) [0x7f394d8c1ce7]
 31: /lib64/libsqlite3.so.0(+0x2d49b) [0x7f394d76549b]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Luis Henriques almost 2 years ago
FYI the sqlite3 version I've installed in this system is 3.38.5-1.1 (from openSUSE Tumbleweed).
Updated by Neha Ojha almost 2 years ago
MDS=0 MGR=1 OSD=3 MON=1 ../src/vstart.sh -n -x --without-dashboard on 8b396522d8f88c9cd58b82a0d35b943dc6744dce works fine
  cluster:
    id:     04c66153-65f4-4d02-970f-8db9661183f0
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 34s)
    mgr: x(active, since 28s)
    osd: 3 osds: 3 up (since 8s), 3 in (since 22s)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean
MON=2 MDS=3 OSD=2 RGW=0 MGR=1 ../src/vstart.sh -b --without-dashboard works fine too
  cluster:
    id:     04c66153-65f4-4d02-970f-8db9661183f0
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 12s)
    mgr: x(active, since 7s)
    osd: 3 osds: 3 up (since 4s), 3 in (since 2m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean
Updated by Neha Ojha almost 2 years ago
Radek was unable to reproduce it either: https://gist.github.com/rzarzynski/c1197842aed9f0bfae63fa77bc1afd11
Updated by Luis Henriques almost 2 years ago
Thanks a lot for your help.
I've added some debug code, and it looks like the crash is happening in libcephsqlite.cc, in function parsepath(). Basically, the following line is responsible for the crash:
static const std::regex re1{"^/*(\\*[[:digit:]]+):([[:alnum:]-_.]*)/([[:alnum:]-._]+)$"};
I think the latest upgrade I've done in this box also updated the compiler to gcc-12, so maybe there's some bug triggered by it (or maybe even in the libstdc...).
Updated by Luis Henriques almost 2 years ago
OK, it looks like it's a gcc12 issue. Here's a small program I wrote:
#include <regex>
#include <iostream>

int main()
{
    try {
        // Same pattern as the static regex in libcephsqlite.cc
        std::regex re{"^/*(\\*[[:digit:]]+):([[:alnum:]-_.]*)/([[:alnum:]-._]+)$"};
    }
    catch (const std::regex_error& e) {
        // Print the numeric error code and the library's message
        std::cout << "regex_error caught: " << e.code() << " => " << e.what() << '\n';
    }
}
It works fine if I compile it with gcc11, but it triggers an exception with gcc12:
regex_error caught: 8 => Invalid start of '[x-x]' range in regular expression
I'll go back and try to compile ceph with gcc-11; hopefully that'll allow me to run vstart clusters again.
Updated by Luis Henriques almost 2 years ago
FWIW, I can't reproduce the issue when compiling with gcc-11. So, something changed (libstd?) that broke this regex.
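The error message points at the '-' that directly follows [:alnum:] inside the bracket expressions: it is being read as the start of a range. As a minimal sketch of one possible workaround (not necessarily the change made upstream), the pattern can be reordered so that the '-' is the last character before the closing ']', where it is taken as a literal dash. The names regex_compiles, kReorderedPattern, ParsedPath and parse_path_sketch below are hypothetical and only for illustration; the field names are a guess at what the three capture groups mean.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reports whether a pattern compiles as a std::regex
// with the standard library this program is built against.
bool regex_compiles(const std::string& pattern) {
  try {
    std::regex re{pattern};
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}

// Reordered variant of the pattern from this report (an assumption, not the
// upstream fix): '-' is moved to the end of each bracket expression so it
// cannot be parsed as a range operator.
const char* const kReorderedPattern =
    "^/*(\\*[[:digit:]]+):([[:alnum:]_.-]*)/([[:alnum:]._-]+)$";

// Hypothetical result type; field names are guesses for illustration.
struct ParsedPath {
  std::string pool;  // first capture group, e.g. "*2"
  std::string ns;    // second capture group, may be empty
  std::string name;  // third capture group
};

// Hypothetical stand-in for the parsing step: fills `out` and returns true
// when `path` matches the reordered pattern, returns false otherwise.
bool parse_path_sketch(const std::string& path, ParsedPath& out) {
  static const std::regex re{kReorderedPattern};
  std::smatch m;
  if (!std::regex_match(path, m, re)) {
    return false;
  }
  out.pool = m[1].str();
  out.ns = m[2].str();
  out.name = m[3].str();
  return true;
}
```

With this ordering, a path such as /*2:my-ns/main-1.db should still match, with the dashes accepted as ordinary characters in the second and third groups.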
Updated by Patrick Donnelly almost 2 years ago
- Is duplicate of Bug #55304: libcephsqlite: crash when compiled with gcc12 cause of regex treating '-' as a range operator added
Updated by Patrick Donnelly almost 2 years ago
- Status changed from New to Duplicate