
Bug #40434

ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD

Added by Igor Fedotov about 2 years ago. Updated 10 days ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific, octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This happens when a migration from the DB to the main device is initiated while some BlueFS data already resides on the main device.
Afterwards, RocksDB/BlueFS is unable to locate some SST files during subsequent OSD startup. This looks like a missing db.slow directory rename.


Related issues

Copied to bluestore - Backport #52517: octopus: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD In Progress
Copied to bluestore - Backport #52518: pacific: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD Resolved

History

#1 Updated by Igor Fedotov almost 2 years ago

  • Status changed from New to 12
  • Assignee set to Igor Fedotov

#2 Updated by Patrick Donnelly almost 2 years ago

  • Status changed from 12 to New

#3 Updated by 玮文 胡 10 months ago

This issue is old, but I'm experiencing it today on the latest release, 15.2.6. I hope this can be fixed.

I'm trying to remove the DB device, reformat my SSD with a different layout, then add the DB device back.

Here is the command:

ceph-bluestore-tool bluefs-bdev-migrate --dev-target /var/lib/ceph/osd/ceph-5/block --devs-source /var/lib/ceph/osd/ceph-5/block.db --path /var/lib/ceph/osd/ceph-5/

This finished with the output "device removed:1 /var/lib/ceph/osd/ceph-5/block.db", but the OSD crashes if I start it again, and 'ceph-bluestore-tool fsck' crashes, too.

Here is the output of "ceph crash info":

{
    "backtrace": [
        "(()+0x12dd0) [0x7fdbc7a6edd0]",
        "(gsignal()+0x10f) [0x7fdbc66d770f]",
        "(abort()+0x127) [0x7fdbc66c1b25]",
        "(()+0x9006b) [0x7fdbc708c06b]",
        "(()+0x9650c) [0x7fdbc709250c]",
        "(()+0x95529) [0x7fdbc7091529]",
        "(__gxx_personality_v0()+0x2a8) [0x7fdbc7091ea8]",
        "(()+0x10ad3) [0x7fdbc6a72ad3]",
        "(_Unwind_RaiseException()+0x2b1) [0x7fdbc6a73041]",
        "(__cxa_throw()+0x3b) [0x7fdbc70927bb]",
        "(()+0x91f7d) [0x7fdbc708df7d]",
        "(()+0x10c636f) [0x55dda6a2a36f]",
        "(rocksdb::Version::~Version()+0x104) [0x55dda6a39524]",
        "(rocksdb::Version::Unref()+0x21) [0x55dda6a395d1]",
        "(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) [0x55dda6b0c07a]",
        "(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55dda6b0c918]",
        "(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55dda6a2a3ce]",
        "(rocksdb::VersionSet::~VersionSet()+0x11) [0x55dda6a2a611]",
        "(rocksdb::DBImpl::CloseHelper()+0x616) [0x55dda6972276]",
        "(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55dda69786fb]",
        "(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55dda69bff61]",
        "(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool)+0x1089) [0x55dda69c1b89]",
        "(RocksDBStore::do_open(std::ostream&, bool, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0x14ca) [0x55dda694496a]",
        "(BlueStore::_open_db(bool, bool, bool)+0x1b34) [0x55dda63e5974]",
        "(BlueStore::_open_db_and_around(bool)+0x4c) [0x55dda63fbfac]",
        "(BlueStore::_mount(bool, bool)+0x847) [0x55dda644e787]",
        "(OSD::init()+0x380) [0x55dda5f77dd0]",
        "(main()+0x47e4) [0x55dda5ecb1b4]",
        "(__libc_start_main()+0xf3) [0x7fdbc66c36a3]",
        "(_start()+0x2e) [0x55dda5ef939e]" 
    ],
    "ceph_version": "15.2.6",
    "crash_id": "2020-11-30T12:49:55.701180Z_5aa6d09f-b7a5-4894-ac7a-361e7cda91f3",
    "entity_name": "osd.5",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8 (Core)",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "59778be7b0a1fd13dcd305d95e143e5d68f89ad6cf2158f67720d92ca25e6a8c",
    "timestamp": "2020-11-30T12:49:55.701180Z",
    "utsname_hostname": "********",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-54-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020" 
}

"ceph-volume lvm activate" bring the old DB device back, and allow the OSD to boot.

From the above description, I think you can already reproduce this, but I can provide more details if needed.

Given that this does not work, how can I change the layout of the SSD used as the DB device without another spare disk to migrate to?

#4 Updated by Igor Fedotov 18 days ago

  • Status changed from New to Fix Under Review
  • Backport set to pacific, octopus
  • Pull request ID set to 42922

#5 Updated by Kefu Chai 12 days ago

  • Status changed from Fix Under Review to Pending Backport

#6 Updated by Backport Bot 12 days ago

  • Copied to Backport #52517: octopus: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD added

#7 Updated by Backport Bot 12 days ago

  • Copied to Backport #52518: pacific: ceph-bluestore-tool:bluefs-bdev-migrate might result in broken OSD added

#8 Updated by Igor Fedotov 10 days ago

  • Pull request ID changed from 42922 to 42992
