Bug #47883

closed

bluefs _allocate failed to allocate bdev 1 and 2, causing ceph_assert(r == 0)

Added by chunsong feng over 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Igor Fedotov
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
BlueStore
Backport:
octopus, nautilus
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
38438
Crash signature (v1):
Crash signature (v2):

Description

2020-10-17T18:02:17.658+0800 ffff7eaac980 1 bluefs _allocate failed to allocate 0x4d0000 on bdev 1, free 0x19c2108000; fallback to bdev 2
2020-10-17T18:02:17.668+0800 ffff7eaac980 1 bluefs _allocate unable to allocate 0x4d0000 on bdev 2, free 0xffffffffffffffff
2020-10-17T18:02:17.678+0800 ffff7eaac980 -1 /home/rpmbuild/BUILD/ceph-16.0.0-4421-gcfeea548ad/src/os/bluestore/BlueFS.cc: In function 'void BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)' thread ffff7eaac980 time 2020-10-17T18:02:17.676262+0800
/home/rpmbuild/BUILD/ceph-16.0.0-4421-gcfeea548ad/src/os/bluestore/BlueFS.cc: 2198: FAILED ceph_assert(r == 0)

ceph version 16.0.0-4421-gcfeea548ad (971ac7aa5739e2dbc0df1b7c8b10b17653df1037) pacific (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0xaaaae34238e8]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xaaaae3423ad0]
3: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x16d0) [0xaaaae3a42518]
4: (BlueRocksWritableFile::Sync()+0x80) [0xaaaae3a538f0]
5: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x2c) [0xaaaae3e8e734]
6: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x3a0) [0xaaaae3f7c498]
7: (rocksdb::WritableFileWriter::Sync(bool)+0x90) [0xaaaae3f7d8b8]
8: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x274) [0xaaaae3e9f444]
9: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1ccc) [0xaaaae3ea611c]
10: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x30) [0xaaaae3ea63c0]
11: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x94) [0xaaaae3e5081c]
12: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x98) [0xaaaae3e510c8]
13: (BlueStore::_kv_sync_thread()+0x22dc) [0xaaaae39a1b04]
14: (BlueStore::KVSyncThread::entry()+0x10) [0xaaaae39c69e0]
15: /lib64/libpthread.so.0(+0x77f8) [0xffffa32e77f8]
16: /lib64/libc.so.6(+0xd0edc) [0xffffa2e40edc]

2020-10-17T18:02:17.678+0800 ffff7eaac980 -1 *** Caught signal (Aborted) **
in thread ffff7eaac980 thread_name:bstore_kv_sync

ceph version 16.0.0-4421-gcfeea548ad (971ac7aa5739e2dbc0df1b7c8b10b17653df1037) pacific (dev)
1: _kernel_rt_sigreturn()
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a4) [0xaaaae3423938]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xaaaae3423ad0]
6: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x16d0) [0xaaaae3a42518]
7: (BlueRocksWritableFile::Sync()+0x80) [0xaaaae3a538f0]
8: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x2c) [0xaaaae3e8e734]
9: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x3a0) [0xaaaae3f7c498]
10: (rocksdb::WritableFileWriter::Sync(bool)+0x90) [0xaaaae3f7d8b8]

Files

ceph-osd.34.rar (345 KB) ceph-osd.34.rar chunsong feng, 10/19/2020 01:38 AM

Related issues 3 (0 open, 3 closed)

Has duplicate bluestore - Bug #47243: bluefs _allocate failed then assert (Duplicate)

Copied to bluestore - Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2, causing ceph_assert(r == 0) (Resolved)
Copied to bluestore - Backport #48479: nautilus: bluefs _allocate failed to allocate bdev 1 and 2, causing ceph_assert(r == 0) (Resolved)
Actions #1

Updated by chunsong feng over 3 years ago

The problem recurs within 10 minutes when a test with concurrent random writes larger than 32 KB is run.

Actions #2

Updated by chunsong feng over 3 years ago

Tested with allocator=stupid and hybrid respectively.
When creating the OSD, only a data device is used; block.wal and block.db are not configured. The total data space is 200 GB. When the allocation fails, there is still about 100 GB of free space. The allocation still fails after the OSD is restarted. Is this a problem caused by fragmentation?

Actions #3

Updated by Igor Fedotov over 3 years ago

Hi Feng,

you're using the master Ceph branch, right?
Am I getting it right that the OSD is unable to restart after this failure, with the same "failed to allocate" log output? If so, could you please set "debug bluestore" to 10 and collect the log of that failing startup.

And one more experiment, if possible:
1) Start your benchmark against clean OSDs
2) Monitor fragmentation for a couple of OSDs using "watch -n 5 ceph tell osd.N bluestore allocator fragmentation block"

chunsong feng wrote:

Tested with allocator=stupid and hybrid respectively.

Do you mean you tried both stupid and hybrid with the same results?

chunsong feng wrote:

When creating the OSD, only a data device is used; block.wal and block.db are not configured. The total data space is 200 GB. When the allocation fails, there is still about 100 GB of free space. The allocation still fails after the OSD is restarted. Is this a problem caused by fragmentation?

Actions #4

Updated by Igor Fedotov over 3 years ago

  • Project changed from Ceph to bluestore
  • Category deleted (Objecter)
Actions #5

Updated by Igor Fedotov over 3 years ago

Also, prior to redeploying the cluster/OSDs you might want to check the current fragmentation rating using:
ceph-bluestore-tool --path <path-to-osd> --allocator block free-score
ceph-bluestore-tool --path <path-to-osd> --allocator block free-dump
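For reference, a minimal sketch of running these against the attached osd.34 (assumptions: systemd-managed OSDs, the default /var/lib/ceph/osd/ceph-34 mount point, and that the OSD is stopped first so ceph-bluestore-tool gets exclusive access):

systemctl stop ceph-osd@34
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-34 --allocator block free-score
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-34 --allocator block free-dump > osd.34-free-dump.txt
systemctl start ceph-osd@34

The free-dump output can be very large (see comment #8 below), so redirecting it to a file is advisable.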

Actions #6

Updated by chunsong feng over 3 years ago

Igor Fedotov wrote:

Hi Feng,

you're using the master Ceph branch, right?

right

Am I getting it right that the OSD is unable to restart after this failure, with the same "failed to allocate" log output? If so, could you please set "debug bluestore" to 10 and collect the log of that failing startup.

The OSD is unable to restart after this failure with the same "failed to allocate" log output, right.

And one more experiment, if possible:
1) Start your benchmark against clean OSDs
2) Monitor fragmentation for a couple of OSDs using "watch -n 5 ceph tell osd.N bluestore allocator fragmentation block"

Tested with allocator=stupid and hybrid respectively.

Do you mean you tried both stupid and hybrid with the same results?

stupid and hybrid give the same results. This issue does not occur in versions 15.2.5 and 14.2.11.

When creating the OSD, only a data device is used; block.wal and block.db are not configured. The total data space is 200 GB. When the allocation fails, there is still about 100 GB of free space. The allocation still fails after the OSD is restarted. Is this a problem caused by fragmentation?

Actions #7

Updated by Neha Ojha over 3 years ago

  • Status changed from New to Need More Info
Actions #8

Updated by Martin Verges over 3 years ago

We triggered the bug on a production cluster whose workload involves a lot of small files and objects.
The affected host was updated from 14.2.9 to 14.2.12 and then 14.2.15; both newer versions hit this bug, but only on 1 out of 13 OSDs (1 out of 3 NVMe OSDs).

The workaround that solved the issue was destroying and recreating the OSD.

Please see the output of "free-score" and "free-dump", as well as the crash log.
Unfortunately the output is around >700 MB and even zipped >40 MB, so I can't upload it to the tracker. The logs are available at https://nextcloud.croit.io/s/WZt9QPrEJTkEZMk

Actions #9

Updated by Igor Fedotov over 3 years ago

  • Status changed from Need More Info to Fix Under Review
  • Assignee set to Igor Fedotov
  • Backport set to octopus, nautilus
  • Pull request ID set to 38438
Actions #10

Updated by Igor Fedotov over 3 years ago

The issue is caused by a bug in the avl/hybrid allocators. The workaround would be to switch back to the bitmap allocator.
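A minimal sketch of that workaround (assuming a Nautilus+ cluster with the centralized config database; setting bluestore_allocator/bluefs_allocator in ceph.conf on the OSD hosts works as well):

ceph config set osd bluestore_allocator bitmap
ceph config set osd bluefs_allocator bitmap

The allocator is selected at OSD startup, so the affected OSDs need to be restarted (e.g. systemctl restart ceph-osd@<id> on systemd-managed OSDs) for the change to take effect.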

Actions #11

Updated by Igor Fedotov over 3 years ago

Well, my previous comment is valid for Martin's issue. It's still unclear to me whether it applies to the original report, due to the lack of info.

Actions #12

Updated by Martin Verges over 3 years ago

Just as additional information: the allocator is already bitmap.

Actions #13

Updated by Igor Fedotov over 3 years ago

Martin Verges wrote:

Just as additional information: the allocator is already bitmap.

Do you mean it WAS bitmap when the issue occurred or it has been switched to bitmap since then?

Actions #14

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #15

Updated by Martin Verges over 3 years ago

Igor Fedotov wrote:

Do you mean it WAS bitmap when the issue occurred or it has been switched to bitmap since then?

Yes, it was, and we/the customer never changed it to anything else.

Actions #16

Updated by Igor Fedotov over 3 years ago

The default allocator was changed from bitmap to hybrid starting with v14.2.11.
Hence, unless you had made any custom settings, the allocator type was changed for you automatically during the upgrade to v14.2.11+.

Does that make sense, or am I missing something?
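As a quick way to check (a sketch; osd.N is a placeholder), compare the configured value with what a running OSD actually uses:

ceph config get osd bluestore_allocator            # value in the monitor config database (or the compiled-in default)
ceph daemon osd.N config get bluestore_allocator   # value the running OSD is using; run on the OSD's host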

Actions #17

Updated by Martin Verges over 3 years ago

OK, sorry, I didn't know that. Yes, then it changed while installing the .12 release.

Actions #18

Updated by Igor Fedotov over 3 years ago

  • Copied to Backport #48478: octopus: bluefs _allocate failed to allocate bdev 1 and 2, causing ceph_assert(r == 0) added
Actions #19

Updated by Igor Fedotov over 3 years ago

  • Copied to Backport #48479: nautilus: bluefs _allocate failed to allocate bdev 1 and 2, causing ceph_assert(r == 0) added
Actions #20

Updated by Igor Fedotov about 3 years ago

  • Status changed from Pending Backport to Resolved
Actions #21

Updated by Jérôme Poulin almost 3 years ago

A similar problem can happen when upgrading from 14.2.2 to 16.2.4.

An OSD will start, then fail about an hour later with this stack trace.

{
    "archived": "2021-07-05 19:04:48.552795",
    "assert_condition": "r == 0",
    "assert_file": "/build/ceph-16.2.4/src/os/bluestore/BlueFS.cc",
    "assert_func": "int BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, uint64_t, uint64_t)",
    "assert_line": 2542,
    "assert_msg": "/build/ceph-16.2.4/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, uint64_t, uint64_t)' thread 7f6d0133df00 time 2021-07-05T13:43:06.442236-0400\n/build/ceph-16.2.4/src/os/bluestore/BlueFS.cc: 2542: FAILED ceph_assert(r == 0)\n",
    "assert_thread_name": "ceph-osd",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f6cff4a3890]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19c) [0x56209918eb6e]",
        "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x56209918ecf8]",
        "(BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x1bac) [0x56209980d19c]",
        "(BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x9e) [0x56209980d89e]",
        "(BlueRocksWritableFile::Sync()+0x6c) [0x56209983826c]",
        "(rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x4e) [0x562099cefcd8]",
        "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x212) [0x562099ed9a6c]",
        "(rocksdb::WritableFileWriter::Sync(bool)+0x177) [0x562099ed947d]",
        "(rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0xe9d) [0x56209a035893]",
        "(rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0x613) [0x562099d908d5]",
        "(rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1b1e) [0x562099d8f476]",
        "(rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0x15f7) [0x562099d8c879]",
        "(rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x709) [0x562099d91ee9]",
        "(rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x61) [0x562099d911c1]",
        "(RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xd81) [0x562099c97fd1]",
        "(BlueStore::_open_db(bool, bool, bool)+0x406) [0x56209970bd96]",
        "(BlueStore::_open_db_and_around(bool, bool)+0x56b) [0x56209977536b]",
        "(BlueStore::_mount()+0x161) [0x562099777cc1]",
        "(OSD::init()+0x4d1) [0x562099242191]",
        "main()",
        "__libc_start_main()",
        "_start()" 
    ],
    "ceph_version": "16.2.4",
    "crash_id": "2021-07-05T17:43:06.512522Z_8b45c17d-4f3d-4159-a87d-f2786b04362d",
    "entity_name": "osd.7",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "18.04.2 LTS (Bionic Beaver)",
    "os_version_id": "18.04",
    "process_name": "ceph-osd",
    "stack_sig": "e6ac0271b8378dd5e2af8dab069f26e900882416800d9938a03f3b1e791643f5",
    "timestamp": "2021-07-05T17:43:06.512522Z",
    "utsname_hostname": "sg1vosante",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-60-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#67~18.04.1-Ubuntu SMP Tue Jan 5 22:01:05 UTC 2021" 
}

To restart the OSD, we had to start it back with:
ceph-osd --setuser ceph --setgroup ceph -i 7 -d --bluefs_allocator=bitmap --bluestore_allocator=bitmap
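If that gets the OSD back up, a hedged follow-up sketch for making the same override persistent, so a normal service start also uses bitmap (assuming the centralized config database and systemd-managed OSDs; osd.7 is taken from the crash report above):

ceph config set osd.7 bluestore_allocator bitmap
ceph config set osd.7 bluefs_allocator bitmap
systemctl restart ceph-osd@7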

Should I open another bug since it isn't the same stack trace and is still present in the latest LTS?

Actions #22

Updated by Igor Fedotov over 2 years ago

  • Has duplicate Bug #47243: bluefs _allocate failed then assert added
