Bug #53359
bluestore: missing block.db symlink leads to confusing crash
Description
A regression in ceph-volume (master branch) led to the block.db symlink not getting created. This leads to OSDs that crash like so:
"backtrace": [
    "/lib64/libpthread.so.0(+0x12c20) [0x7f3573347c20]",
    "gsignal()",
    "abort()",
    "/lib64/libstdc++.so.6(+0x9009b) [0x7f357295e09b]",
    "/lib64/libstdc++.so.6(+0x9653c) [0x7f357296453c]",
    "/lib64/libstdc++.so.6(+0x96597) [0x7f3572964597]",
    "/lib64/libstdc++.so.6(+0x967f8) [0x7f35729647f8]",
    "/usr/bin/ceph-osd(+0x5c7203) [0x55cc53713203]",
    "(BlueFS::_open_super()+0x18f) [0x55cc53e66cff]",
    "(BlueFS::mount()+0xeb) [0x55cc53e88ddb]",
    "(BlueStore::_open_bluefs(bool, bool)+0x94) [0x55cc53d4bad4]",
    "(BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55cc53d4cc29]",
    "(BlueStore::_open_db(bool, bool, bool)+0x15c) [0x55cc53d4df4c]",
    "(BlueStore::_open_db_and_around(bool, bool)+0x2b4) [0x55cc53dc68d4]",
    "(BlueStore::_mount()+0x1ae) [0x55cc53dc971e]",
    "(OSD::init()+0x3ba) [0x55cc5385711a]",
    "main()",
    "__libc_start_main()",
    "_start()"
],
"ceph_version": "17.0.0-9073-g6e528ed7",
The on-disk block that we are trying to decode is all zeros.
I thought we had a flag somewhere indicating whether a db and/or wal was expected so that we could provide a meaningful/informative error message, but maybe not?
(ceph-volume fix is here: https://github.com/ceph/ceph/pull/44030)
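One way to turn the all-zeros case into a readable error is to inspect the raw superblock bytes before handing them to the decoder. The following is only a minimal sketch of that idea: `is_all_zeros` and `precheck_superblock` are hypothetical names, not existing BlueFS functions, and the buffer is modeled as a plain byte vector rather than a Ceph bufferlist.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of Ceph: returns true when every byte of the
// buffer is zero (an empty buffer also counts as all zeros).
bool is_all_zeros(const std::vector<std::uint8_t>& block) {
  return std::all_of(block.begin(), block.end(),
                     [](std::uint8_t b) { return b == 0; });
}

// Hypothetical pre-decode check: returns an error message when the raw
// superblock cannot possibly be valid, or std::nullopt when decoding may
// proceed as usual.
std::optional<std::string> precheck_superblock(
    const std::vector<std::uint8_t>& block) {
  if (block.empty()) {
    return std::string("superblock read returned no data");
  }
  if (is_all_zeros(block)) {
    return std::string(
        "superblock is all zeros; is the block.db symlink missing?");
  }
  return std::nullopt;
}
```

With a check like this, the caller could log the message and return an error code instead of letting the decoder throw and abort the OSD.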
History
#1 Updated by Igor Fedotov over 2 years ago
I thought we had a flag somewhere indicating whether a db and/or wal was expected so that we could provide a meaningful/informative error message, but maybe not?
In recent releases the BlueFS superblock records the disk layout; see the memorized_layout member below:
struct bluefs_super_t {
  uuid_d uuid;      ///< unique to this bluefs instance
  uuid_d osd_uuid;  ///< matches the osd that owns us
  uint64_t version;
  uint32_t block_size;
  bluefs_fnode_t log_fnode;
  std::optional<bluefs_layout_t> memorized_layout;
  // ... (rest of the struct omitted)
};
But this superblock is stored on the DB volume, which is the one that is missing in your case...
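For the cases where the superblock can still be read, the memorized layout could be compared against the devices actually found at startup. The sketch below only illustrates that comparison: `memorized_layout_t`, `devices_present_t`, and `check_layout` are simplified, hypothetical stand-ins and do not reflect Ceph's actual `bluefs_layout_t` or any existing BlueStore check. It does not help in the situation reported here, where the superblock itself lives on the missing volume.

```cpp
#include <optional>
#include <string>

// Simplified stand-in for the layout recorded in the superblock. The field
// names are illustrative assumptions, not Ceph's bluefs_layout_t.
struct memorized_layout_t {
  bool dedicated_db = false;
  bool dedicated_wal = false;
};

// Hypothetical summary of the devices actually found when the OSD starts.
struct devices_present_t {
  bool db = false;
  bool wal = false;
};

// Returns an empty string when the devices match the memorized layout (or
// when no layout was recorded), otherwise a human-readable error message.
std::string check_layout(const std::optional<memorized_layout_t>& memorized,
                         const devices_present_t& present) {
  if (!memorized) {
    return "";  // no layout recorded: nothing to compare against
  }
  if (memorized->dedicated_db && !present.db) {
    return "layout expects a dedicated DB device, but block.db was not found";
  }
  if (memorized->dedicated_wal && !present.wal) {
    return "layout expects a dedicated WAL device, but block.wal was not found";
  }
  return "";
}
```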
#2 Updated by Sage Weil over 2 years ago
fyi here is a better stack trace
(gdb) bt
#0 0x00007ffff4be937f in raise () from /lib64/libc.so.6
#1 0x00007ffff4bd3db5 in abort () from /lib64/libc.so.6
#2 0x00007ffff55a109b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
#3 0x00007ffff55a753c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#4 0x00007ffff55a7597 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007ffff55a77f8 in __cxa_throw () from /lib64/libstdc++.so.6
#6 0x0000555555b1b203 in bluefs_fnode_t::_denc_finish (struct_v=<synthetic pointer>, struct_compat=<synthetic pointer>, start_pos=<synthetic pointer>, struct_len=<synthetic pointer>, p=...) at /usr/include/c++/8/bits/char_traits.h:287
#7 _denc_friend<bluefs_fnode_t, ceph::buffer::v15_2_0::ptr::iterator_impl<true> > (p=..., v=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:87
#8 bluefs_fnode_t::decode (p=..., this=0x5555589221b3) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:75
#9 denc_traits<bluefs_fnode_t, void>::decode (f=0, p=..., v=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:131
#10 ceph::decode<bluefs_fnode_t, denc_traits<bluefs_fnode_t, void> > (p=..., o=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/include/denc.h:1737
#11 bluefs_super_t::decode (this=this@entry=0x555557b859c0, p=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.cc:87
#12 0x000055555626ed0f in decode (p=..., c=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:179
#13 BlueFS::_open_super (this=0x555557b85880) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueFS.cc:954
#14 0x0000555556290deb in BlueFS::mount (this=0x555557b85880) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueFS.cc:764
#15 0x0000555556153ad4 in BlueStore::_open_bluefs(bool, bool) () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6094
#16 0x0000555556154c29 in BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6335
#17 0x0000555556155f4c in BlueStore::_open_db(bool, bool, bool) () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6455
#18 0x00005555561ce8e4 in BlueStore::_open_db_and_around(bool, bool) () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6188
#19 0x00005555561d172e in BlueStore::_mount() () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:7397
#20 0x0000555555c5f11a in OSD::init() () at /usr/include/c++/8/bits/unique_ptr.h:345
#21 0x0000555555b9dd79 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/ceph_osd.cc:710
#3 Updated by Neha Ojha over 2 years ago
- Assignee set to Adam Kupczyk
- Priority changed from High to Normal
Adam will take a look.