Bug #53359

bluestore: missing block.db symlink leads to confusing crash

Added by Sage Weil over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A regression in ceph-volume (master branch) led to the block.db symlink not getting created. This leads to OSDs that crash like so:

    "backtrace": [
        "/lib64/libpthread.so.0(+0x12c20) [0x7f3573347c20]",
        "gsignal()",
        "abort()",
        "/lib64/libstdc++.so.6(+0x9009b) [0x7f357295e09b]",
        "/lib64/libstdc++.so.6(+0x9653c) [0x7f357296453c]",
        "/lib64/libstdc++.so.6(+0x96597) [0x7f3572964597]",
        "/lib64/libstdc++.so.6(+0x967f8) [0x7f35729647f8]",
        "/usr/bin/ceph-osd(+0x5c7203) [0x55cc53713203]",
        "(BlueFS::_open_super()+0x18f) [0x55cc53e66cff]",
        "(BlueFS::mount()+0xeb) [0x55cc53e88ddb]",
        "(BlueStore::_open_bluefs(bool, bool)+0x94) [0x55cc53d4bad4]",
        "(BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55cc53d4cc29]",
        "(BlueStore::_open_db(bool, bool, bool)+0x15c) [0x55cc53d4df4c]",
        "(BlueStore::_open_db_and_around(bool, bool)+0x2b4) [0x55cc53dc68d4]",
        "(BlueStore::_mount()+0x1ae) [0x55cc53dc971e]",
        "(OSD::init()+0x3ba) [0x55cc5385711a]",
        "main()",
        "__libc_start_main()",
        "_start()" 
    ],
    "ceph_version": "17.0.0-9073-g6e528ed7",

The on-disk block that we are trying to decode is all zeros.

I thought we had a flag somewhere indicating whether a db and/or wal was expected so that we could provide a meaningful/informative error message, but maybe not?
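
One way to make this failure less confusing, sketched under assumptions: test whether the raw superblock bytes are all zeros before handing them to the decoder, and report that explicitly instead of letting the decode exception terminate the process. The names `is_all_zeros` and `check_raw_super` are hypothetical and are not Ceph APIs; the real code path is BlueFS::_open_super (per the backtrace), which would need an equivalent check on its own buffer type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not part of Ceph): true when every byte is zero.
inline bool is_all_zeros(const std::vector<std::uint8_t>& buf) {
  return std::all_of(buf.begin(), buf.end(),
                     [](std::uint8_t b) { return b == 0; });
}

// Hypothetical pre-decode check. Returns an error message when the raw
// superblock cannot possibly decode, or std::nullopt when decoding may proceed.
inline std::optional<std::string> check_raw_super(
    const std::vector<std::uint8_t>& raw) {
  if (raw.empty()) {
    return std::string("superblock read returned no data");
  }
  if (is_all_zeros(raw)) {
    return std::string(
        "superblock is all zeros: BlueFS may never have been written to "
        "this device (is a block.db or block.wal symlink missing?)");
  }
  return std::nullopt;
}
```

A caller would run `check_raw_super` on the bytes it just read and log the returned message before attempting to decode; a non-zero buffer still goes through the normal decode path and can fail there for other reasons.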

(ceph-volume fix is here: https://github.com/ceph/ceph/pull/44030)

History

#1 Updated by Igor Fedotov over 2 years ago

> I thought we had a flag somewhere indicating whether a db and/or wal was expected so that we could provide a meaningful/informative error message, but maybe not?

The BlueFS superblock records the disk layout in recent releases; see the memorized_layout member in this excerpt:

    struct bluefs_super_t {
      uuid_d uuid;      ///< unique to this bluefs instance
      uuid_d osd_uuid;  ///< matches the osd that owns us
      uint64_t version;
      uint32_t block_size;

      bluefs_fnode_t log_fnode;
      std::optional<bluefs_layout_t> memorized_layout;

But this superblock is kept on the DB volume, which is lost in your case...
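
To illustrate the limitation, here is a minimal sketch of the comparison a memorized layout would allow when the superblock is readable. `Layout` is a simplified stand-in for bluefs_layout_t and `layout_mismatch` is a hypothetical function; neither is taken from the Ceph source. In the case reported here the check could never run, because the superblock holding the memorized layout lives on the very device that was not found.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified stand-in for bluefs_layout_t (assumption, for illustration only).
struct Layout {
  bool dedicated_db = false;
  bool dedicated_wal = false;
};

// Hypothetical check: compare the layout remembered in the superblock with
// the devices actually found at mount time. Returns a message on mismatch.
inline std::optional<std::string> layout_mismatch(
    const std::optional<Layout>& memorized, const Layout& found) {
  if (!memorized) {
    // No memorized layout (older superblock): nothing to compare against.
    return std::nullopt;
  }
  if (memorized->dedicated_db && !found.dedicated_db) {
    return std::string("superblock expects a dedicated DB device, none found");
  }
  if (memorized->dedicated_wal && !found.dedicated_wal) {
    return std::string("superblock expects a dedicated WAL device, none found");
  }
  return std::nullopt;
}
```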

#2 Updated by Sage Weil over 2 years ago

FYI, here is a better stack trace:

(gdb) bt
#0  0x00007ffff4be937f in raise () from /lib64/libc.so.6
#1  0x00007ffff4bd3db5 in abort () from /lib64/libc.so.6
#2  0x00007ffff55a109b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
#3  0x00007ffff55a753c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#4  0x00007ffff55a7597 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007ffff55a77f8 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x0000555555b1b203 in bluefs_fnode_t::_denc_finish (struct_v=<synthetic pointer>, struct_compat=<synthetic pointer>, start_pos=<synthetic pointer>, struct_len=<synthetic pointer>, p=...) at /usr/include/c++/8/bits/char_traits.h:287
#7  _denc_friend<bluefs_fnode_t, ceph::buffer::v15_2_0::ptr::iterator_impl<true> > (p=..., v=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:87
#8  bluefs_fnode_t::decode (p=..., this=0x5555589221b3) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:75
#9  denc_traits<bluefs_fnode_t, void>::decode (f=0, p=..., v=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:131
#10 ceph::decode<bluefs_fnode_t, denc_traits<bluefs_fnode_t, void> > (p=..., o=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/include/denc.h:1737
#11 bluefs_super_t::decode (this=this@entry=0x555557b859c0, p=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.cc:87
#12 0x000055555626ed0f in decode (p=..., c=...) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/bluefs_types.h:179
#13 BlueFS::_open_super (this=0x555557b85880) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueFS.cc:954
#14 0x0000555556290deb in BlueFS::mount (this=0x555557b85880) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueFS.cc:764
#15 0x0000555556153ad4 in BlueStore::_open_bluefs(bool, bool) () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6094
#16 0x0000555556154c29 in BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)
    () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6335
#17 0x0000555556155f4c in BlueStore::_open_db(bool, bool, bool) () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6455
#18 0x00005555561ce8e4 in BlueStore::_open_db_and_around(bool, bool) () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:6188
#19 0x00005555561d172e in BlueStore::_mount() () at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/os/bluestore/BlueStore.cc:7397
#20 0x0000555555c5f11a in OSD::init() () at /usr/include/c++/8/bits/unique_ptr.h:345
#21 0x0000555555b9dd79 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/ceph-17.0.0-9082.g8cc3f605.el8.x86_64/src/ceph_osd.cc:710

#3 Updated by Neha Ojha over 2 years ago

  • Assignee set to Adam Kupczyk
  • Priority changed from High to Normal

Adam will take a look.
