Bug #40434
ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD
Status: Closed
Description
This happens when a migration from the DB device to the main device is initiated while some BlueFS data already resides on the main device.
After that, RocksDB/BlueFS are unable to locate some SST files during subsequent OSD startup. This looks like a missing db.slow directory rename.
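For context, BlueFS keeps RocksDB files under per-device directories (db, db.slow, db.wal). One hedged way to inspect where the SST files ended up after a failed migration is to export the BlueFS namespace of the stopped OSD; the OSD path and dump directory below are illustrative, not taken from this report:

ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-0 --out-dir /tmp/bluefs-dump
ls /tmp/bluefs-dump/db /tmp/bluefs-dump/db.slow   # look for .sst files stranded under db.slow

If the suspected rename is indeed missing, one would expect SST files sitting under db.slow that RocksDB can no longer resolve under db.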
Updated by Igor Fedotov over 4 years ago
- Status changed from New to 12
- Assignee set to Igor Fedotov
Updated by 玮文 胡 over 3 years ago
This is an old ticket, but I'm experiencing this today on the latest release, 15.2.6. I hope this can be fixed.
I'm trying to remove the DB device, reformat my SSD with a different layout, then add the DB device back.
Here is the command:
ceph-bluestore-tool bluefs-bdev-migrate --dev-target /var/lib/ceph/osd/ceph-5/block --devs-source /var/lib/ceph/osd/ceph-5/block.db --path /var/lib/ceph/osd/ceph-5/
This finished with the output "device removed:1 /var/lib/ceph/osd/ceph-5/block.db", but the OSD crashes if I start it again, and 'ceph-bluestore-tool fsck' crashes too.
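For reference, the fsck invocation was presumably along these lines, with the path taken from the migrate command above:

ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-5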
Here is the output of "ceph crash info":
{ "backtrace": [ "(()+0x12dd0) [0x7fdbc7a6edd0]", "(gsignal()+0x10f) [0x7fdbc66d770f]", "(abort()+0x127) [0x7fdbc66c1b25]", "(()+0x9006b) [0x7fdbc708c06b]", "(()+0x9650c) [0x7fdbc709250c]", "(()+0x95529) [0x7fdbc7091529]", "(__gxx_personality_v0()+0x2a8) [0x7fdbc7091ea8]", "(()+0x10ad3) [0x7fdbc6a72ad3]", "(_Unwind_RaiseException()+0x2b1) [0x7fdbc6a73041]", "(__cxa_throw()+0x3b) [0x7fdbc70927bb]", "(()+0x91f7d) [0x7fdbc708df7d]", "(()+0x10c636f) [0x55dda6a2a36f]", "(rocksdb::Version::~Version()+0x104) [0x55dda6a39524]", "(rocksdb::Version::Unref()+0x21) [0x55dda6a395d1]", "(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) [0x55dda6b0c07a]", "(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55dda6b0c918]", "(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55dda6a2a3ce]", "(rocksdb::VersionSet::~VersionSet()+0x11) [0x55dda6a2a611]", "(rocksdb::DBImpl::CloseHelper()+0x616) [0x55dda6972276]", "(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55dda69786fb]", "(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55dda69bff61]", "(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool)+0x1089) [0x55dda69c1b89]", "(RocksDBStore::do_open(std::ostream&, bool, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0x14ca) [0x55dda694496a]", "(BlueStore::_open_db(bool, bool, bool)+0x1b34) [0x55dda63e5974]", "(BlueStore::_open_db_and_around(bool)+0x4c) [0x55dda63fbfac]", "(BlueStore::_mount(bool, bool)+0x847) [0x55dda644e787]", "(OSD::init()+0x380) [0x55dda5f77dd0]", "(main()+0x47e4) [0x55dda5ecb1b4]", "(__libc_start_main()+0xf3) [0x7fdbc66c36a3]", "(_start()+0x2e) [0x55dda5ef939e]" ], "ceph_version": "15.2.6", "crash_id": "2020-11-30T12:49:55.701180Z_5aa6d09f-b7a5-4894-ac7a-361e7cda91f3", "entity_name": "osd.5", "os_id": "centos", "os_name": "CentOS Linux", "os_version": "8 (Core)", "os_version_id": "8", "process_name": "ceph-osd", "stack_sig": "59778be7b0a1fd13dcd305d95e143e5d68f89ad6cf2158f67720d92ca25e6a8c", "timestamp": "2020-11-30T12:49:55.701180Z", "utsname_hostname": "********", "utsname_machine": "x86_64", "utsname_release": "5.4.0-54-generic", "utsname_sysname": "Linux", "utsname_version": "#60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020" }
"ceph-volume lvm activate" bring the old DB device back, and allow the OSD to boot.
From the above description, I think you can already reproduce this, but I can provide more details if needed.
Given that this does not work, how can I change the layout of the SSD used as the DB device without another spare disk to migrate to?
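A sketch of one possible answer, assuming the bluefs-bdev-migrate and bluefs-bdev-new-db subcommands behave as documented (device paths and the LV name are illustrative assumptions, and since the DB-to-main migration is exactly the step this bug breaks, it should only be attempted on a release containing the fix):

systemctl stop ceph-osd@5
# 1. Move BlueFS data off the DB device onto the main device:
ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-5 --devs-source /var/lib/ceph/osd/ceph-5/block.db --dev-target /var/lib/ceph/osd/ceph-5/block
# 2. Re-partition the SSD as desired (outside of Ceph).
# 3. Attach the new partition/LV as the DB device (hypothetical LV name):
ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-5 --dev-target /dev/vg-ssd/osd-5-db
# 4. Move the DB data from the main device onto the new DB device:
ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-5 --devs-source /var/lib/ceph/osd/ceph-5/block --dev-target /var/lib/ceph/osd/ceph-5/block.db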
Updated by Igor Fedotov over 2 years ago
- Status changed from New to Fix Under Review
- Backport set to pacific, octopus
- Pull request ID set to 42922
Updated by Kefu Chai over 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 2 years ago
- Copied to Backport #52517: octopus: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD added
Updated by Backport Bot over 2 years ago
- Copied to Backport #52518: pacific: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD added
Updated by Igor Fedotov over 2 years ago
- Pull request ID changed from 42922 to 42992
Updated by Igor Fedotov over 2 years ago
- Status changed from Pending Backport to Resolved
Updated by Igor Fedotov over 2 years ago
- Has duplicate Bug #52816: Block.db has been migrated with ceph-volume lvm migrate and osd never started back added