
Bug #47883

bluefs _allocate failed to allocate bdev 1 and 2, cause ceph_assert(r == 0)

Added by chunsong feng 3 months ago. Updated about 2 months ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
BlueStore
Backport:
octopus, nautilus
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

2020-10-17T18:02:17.658+0800 ffff7eaac980 1 bluefs _allocate failed to allocate 0x4d0000 on bdev 1, free 0x19c2108000; fallback to bdev 2
2020-10-17T18:02:17.668+0800 ffff7eaac980 1 bluefs _allocate unable to allocate 0x4d0000 on bdev 2, free 0xffffffffffffffff
2020-10-17T18:02:17.678+0800 ffff7eaac980 -1 /home/rpmbuild/BUILD/ceph-16.0.0-4421-gcfeea548ad/src/os/bluestore/BlueFS.cc: In function 'void BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)' thread ffff7eaac980 time 2020-10-17T18:02:17.676262+0800
/home/rpmbuild/BUILD/ceph-16.0.0-4421-gcfeea548ad/src/os/bluestore/BlueFS.cc: 2198: FAILED ceph_assert(r == 0)

ceph version 16.0.0-4421-gcfeea548ad (971ac7aa5739e2dbc0df1b7c8b10b17653df1037) pacific (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0xaaaae34238e8]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xaaaae3423ad0]
3: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x16d0) [0xaaaae3a42518]
4: (BlueRocksWritableFile::Sync()+0x80) [0xaaaae3a538f0]
5: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x2c) [0xaaaae3e8e734]
6: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x3a0) [0xaaaae3f7c498]
7: (rocksdb::WritableFileWriter::Sync(bool)+0x90) [0xaaaae3f7d8b8]
8: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x274) [0xaaaae3e9f444]
9: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1ccc) [0xaaaae3ea611c]
10: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x30) [0xaaaae3ea63c0]
11: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x94) [0xaaaae3e5081c]
12: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x98) [0xaaaae3e510c8]
13: (BlueStore::_kv_sync_thread()+0x22dc) [0xaaaae39a1b04]
14: (BlueStore::KVSyncThread::entry()+0x10) [0xaaaae39c69e0]
15: /lib64/libpthread.so.0(+0x77f8) [0xffffa32e77f8]
16: /lib64/libc.so.6(+0xd0edc) [0xffffa2e40edc]

2020-10-17T18:02:17.678+0800 ffff7eaac980 -1 *** Caught signal (Aborted) **
in thread ffff7eaac980 thread_name:bstore_kv_sync

ceph version 16.0.0-4421-gcfeea548ad (971ac7aa5739e2dbc0df1b7c8b10b17653df1037) pacific (dev)
1: _kernel_rt_sigreturn()
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a4) [0xaaaae3423938]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xaaaae3423ad0]
6: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x16d0) [0xaaaae3a42518]
7: (BlueRocksWritableFile::Sync()+0x80) [0xaaaae3a538f0]
8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x2c) [0xaaaae3e8e734]
9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x3a0) [0xaaaae3f7c498]
10: (rocksdb::WritableFileWriter::Sync(bool)+0x90) [0xaaaae3f7d8b8]

ceph-osd.34.rar (345 KB) chunsong feng, 10/19/2020 01:38 AM


Related issues

Copied to bluestore - Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0) In Progress
Copied to bluestore - Backport #48479: nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0) Resolved

History

#1 Updated by chunsong feng 3 months ago

The problem reproduces within 10 minutes of running a test with highly concurrent random writes larger than 32 KB.

#2 Updated by chunsong feng 3 months ago

Tested with Allocator=stupid and hybrid respectively.
When creating the OSDs, only a data device is used; block.wal and block.db are not configured. The total data space is 200 GB. When the allocation fails, there is still about 100 GB of free space, and allocation still fails after the OSD is restarted. Is this a problem caused by fragmentation?
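Fragmentation can indeed produce exactly this symptom. The following toy model (an illustration only, not the actual BlueFS/allocator code; the 64 KiB allocation unit is an assumption) shows how a contiguous request can fail even though total free space is ample:

```python
# Toy first-fit allocator: a request fails when no single free extent is
# large enough, even though the summed free space is far larger.
# (Illustration only -- not the actual BlueFS/StupidAllocator code.)

ALLOC_UNIT = 0x10000  # 64 KiB; a typical bluefs allocation unit (assumption)

def allocate(free_extents, want):
    """Return the offset of the first free extent that can hold `want`
    bytes, or None if no single extent is large enough."""
    for off, length in sorted(free_extents.items()):
        if length >= want:
            return off
    return None

# ~10 GiB free in total, but fragmented into isolated 64 KiB extents
free = {i * 2 * ALLOC_UNIT: ALLOC_UNIT for i in range(160_000)}
total_free = sum(free.values())

print(hex(total_free))           # plenty of free space overall
print(allocate(free, 0x4d0000))  # the 0x4d0000 request from the log -> None
```

Here a 0x4d0000 (~4.9 MiB) contiguous request fails despite ~10 GiB of total free space, mirroring the "failed to allocate 0x4d0000 ... free 0x19c2108000" line in the log.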

#3 Updated by Igor Fedotov 3 months ago

Hi Feng,

you're using the master Ceph branch, right?
Am I understanding correctly that the OSD is unable to restart after this failure, with the same "failed to allocate" log output? If so, could you please set "debug bluestore" to 10 and collect that failing startup log.

And one more experiment if possible please:
1) Start your benchmark against clean OSDs
2) Monitor fragmentation for a couple of OSDs using "watch -n 5 ceph tell osd.N bluestore allocator fragmentation block"

chunsong feng wrote:

Test Allocator=stupid and hybrid respectively.

Do you mean you tried both stupid and hybrid with the same results?

When creating the OSDs, only a data device is used; block.wal and block.db are not configured. The total data space is 200 GB. When the allocation fails, there is still about 100 GB of free space, and allocation still fails after the OSD is restarted. Is this a problem caused by fragmentation?

#4 Updated by Igor Fedotov 3 months ago

  • Project changed from Ceph to bluestore
  • Category deleted (Objecter)

#5 Updated by Igor Fedotov 3 months ago

Also, prior to redeploying the cluster/OSDs, you might want to check the current fragmentation rating using:
ceph-bluestore-tool --path <path-to-osd> --allocator block free-score
ceph-bluestore-tool --path <path-to-osd> --allocator block free-dump
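If free-dump output is available, a short script can summarize it. This sketch assumes the dump is JSON containing an "extents" list of hex offset/length strings; check the actual output format of your Ceph version before relying on it:

```python
import json

def summarize_free_dump(dump_text):
    """Return (total free bytes, largest single free extent) from a
    free-dump-style JSON document (format assumed, see note above)."""
    lengths = [int(e["length"], 16) for e in json.loads(dump_text)["extents"]]
    return sum(lengths), max(lengths)

# Hypothetical sample in the assumed format
sample = json.dumps({"extents": [
    {"offset": "0x100000", "length": "0x10000"},
    {"offset": "0x400000", "length": "0x4d0000"},
]})

total, largest = summarize_free_dump(sample)
print(hex(total), hex(largest))  # 0x4e0000 0x4d0000
```

Comparing the largest single extent against the failing request size (0x4d0000 in the log above) shows at a glance whether fragmentation alone explains an allocation failure.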

#6 Updated by chunsong feng 3 months ago

Igor Fedotov wrote:

Hi Feng,

you're using the master Ceph branch, right?

right

Am I understanding correctly that the OSD is unable to restart after this failure, with the same "failed to allocate" log output? If so, could you please set "debug bluestore" to 10 and collect that failing startup log.

Right, the OSD is unable to restart after this failure, with the same "failed to allocate" log output.

And one more experiment if possible please:
1) Start your benchmark against clean OSDs
2) Monitor fragmentation for a couple of OSDs using "watch -n 5 ceph tell osd.N bluestore allocator fragmentation block"

Test Allocator=stupid and hybrid respectively.

Do you mean you tried both stupid and hybrid with the same results?

stupid and hybrid give the same results. This issue does not occur in versions 15.2.5 and 14.2.11.

When creating the OSDs, only a data device is used; block.wal and block.db are not configured. The total data space is 200 GB. When the allocation fails, there is still about 100 GB of free space, and allocation still fails after the OSD is restarted. Is this a problem caused by fragmentation?

#7 Updated by Neha Ojha 3 months ago

  • Status changed from New to Need More Info

#8 Updated by Martin Verges about 2 months ago

We triggered the bug on a production cluster whose workload involves a lot of small files and objects.
The affected host was updated from 14.2.9 to 14.2.12 and then 14.2.15; both newer versions hit this bug, but only on 1 out of 13 OSDs (1 out of 3 NVMe OSDs).

The workaround that resolved the issue was destroying and recreating the OSD.

Please see the output of "free-score" and "free-dump", as well as the crash log.
Unfortunately the output is over 700 MB (over 40 MB even zipped), so I can't upload it to the tracker. The logs are available at https://nextcloud.croit.io/s/WZt9QPrEJTkEZMk

#9 Updated by Igor Fedotov about 2 months ago

  • Status changed from Need More Info to Fix Under Review
  • Assignee set to Igor Fedotov
  • Backport set to octopus, nautilus
  • Pull request ID set to 38438

#10 Updated by Igor Fedotov about 2 months ago

The issue is caused by a bug in the avl/hybrid allocators. The workaround is to switch back to the bitmap allocator.
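For reference, the switch back to bitmap can be made in ceph.conf (a sketch; apply globally or per-OSD and restart the affected OSDs for it to take effect):

```ini
[osd]
# Revert to the bitmap allocator (the pre-14.2.11 default)
bluestore_allocator = bitmap
```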

#11 Updated by Igor Fedotov about 2 months ago

Well, my previous comment is valid for Martin's issue. It's still unclear to me whether it applies to the original report, due to the lack of info.

#12 Updated by Martin Verges about 2 months ago

Just as additional information: the allocator is already bitmap.

#13 Updated by Igor Fedotov about 2 months ago

Martin Verges wrote:

Just as additional information: the allocator is already bitmap.

Do you mean it WAS bitmap when the issue occurred or it has been switched to bitmap since then?

#14 Updated by Kefu Chai about 2 months ago

  • Status changed from Fix Under Review to Pending Backport

#15 Updated by Martin Verges about 2 months ago

Igor Fedotov wrote:

Do you mean it WAS bitmap when the issue occurred or it has been switched to bitmap since then?

Yes, it was, and we/the customer never changed it to anything else.

#16 Updated by Igor Fedotov about 2 months ago

The default allocator was changed from bitmap to hybrid starting with v14.2.11.
Hence, unless you had made custom settings, the allocator type was changed for you automatically during the upgrade to v14.2.11+.

Does that make sense, or am I missing something?

#17 Updated by Martin Verges about 2 months ago

OK, sorry, I didn't know that. Yes, then it changed while installing the .12 release.

#18 Updated by Igor Fedotov about 2 months ago

  • Copied to Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0) added

#19 Updated by Igor Fedotov about 2 months ago

  • Copied to Backport #48479: nautilus: bluefs _allocate failed to allocate bdev 1 and 2,cause ceph_assert(r == 0) added
