Bug #50656 (closed): bluefs _allocate unable to allocate, though enough free

Added by Jan-Philipp Litza almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Target version: -
% Done: 0%

Backport: pacific, octopus, nautilus
Regression: No
Severity: 3 - minor

Description

Yesterday evening, 4 OSDs on SSDs on 2 hosts crashed almost simultaneously and didn't come back up (they crashed again right on startup). Later that night, another one followed. The error was identical in every case:

2021-05-04T19:34:41.692+0200 7f9f1b761700  1 bluefs _allocate unable to allocate 0x400000 on bdev 1, allocator name block, allocator type hybrid, capacity 0x3b9e400000, block size 0x1000, free 0x21132a7000, fragmentation 0.0240807, allocated 0x0
2021-05-04T19:34:41.692+0200 7f9f1b761700 -1 bluefs _allocate allocation failed, needed 0x400000
2021-05-04T19:34:41.708+0200 7f9f1b761700 -1 /build/ceph-16.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, uint64_t, uint64_t)' thread 7f9f1b761700 time 2021-05-04T19:34:41.699837+0200
/build/ceph-16.2.1/src/os/bluestore/BlueFS.cc: 2542: FAILED ceph_assert(r == 0)

 ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x561d9f2ec277]
 2: /usr/bin/ceph-osd(+0xb33461) [0x561d9f2ec461]
 3: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x209d) [0x561d9fa5f6cd]
 4: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x9e) [0x561d9fa5f7fe]
 5: (BlueRocksWritableFile::Sync()+0x63) [0x561d9fa837c3]
 6: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x4e) [0x561d9ff6a530]
 7: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x212) [0x561da015e2f4]
 8: (rocksdb::WritableFileWriter::Sync(bool)+0x177) [0x561da015dd05]
 9: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x396) [0x561d9ffc2b26]
 10: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135a) [0x561d9ffbe988]
 11: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x59) [0x561d9ffbd593]
 12: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x81) [0x561d9ff0e7e1]
 13: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x97) [0x561d9ff0f1c7]
 14: (BlueStore::_kv_sync_thread()+0x2384) [0x561d9f9b83e4]
 15: (BlueStore::KVSyncThread::entry()+0xd) [0x561d9f9e22dd]
 16: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f9f3043b6db]
 17: clone()

2021-05-04T19:34:41.724+0200 7f9f1b761700 -1 *** Caught signal (Aborted) **
 in thread 7f9f1b761700 thread_name:bstore_kv_sync

 ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980) [0x7f9f30446980]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x561d9f2ec2d2]
 5: /usr/bin/ceph-osd(+0xb33461) [0x561d9f2ec461]
 6: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x209d) [0x561d9fa5f6cd]
 7: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x9e) [0x561d9fa5f7fe]
 8: (BlueRocksWritableFile::Sync()+0x63) [0x561d9fa837c3]
 9: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x4e) [0x561d9ff6a530]
 10: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x212) [0x561da015e2f4]
 11: (rocksdb::WritableFileWriter::Sync(bool)+0x177) [0x561da015dd05]
 12: (rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x396) [0x561d9ffc2b26]
 13: (rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x135a) [0x561d9ffbe988]
 14: (rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x59) [0x561d9ffbd593]
 15: (RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x81) [0x561d9ff0e7e1]
 16: (RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x97) [0x561d9ff0f1c7]
 17: (BlueStore::_kv_sync_thread()+0x2384) [0x561d9f9b83e4]
 18: (BlueStore::KVSyncThread::entry()+0xd) [0x561d9f9e22dd]
 19: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f9f3043b6db]
 20: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Note that free 0x21132a7000 >> needed 0x400000. The fragmentation score of each OSD in question was roughly 0.85.
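For scale, converting those two sizes from the log line to decimal (plain shell arithmetic, nothing Ceph-specific):

 $ echo "free: $(( 0x21132a7000 >> 30 )) GiB, needed: $(( 0x400000 >> 20 )) MiB"
 free: 132 GiB, needed: 4 MiB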

The crashes happened while the cluster was still upgrading from 14.2.20 to 16.2.1, although the OSDs in question had finished the upgrade hours earlier, including the reformatting of omap accounting data. I'm not sure when hybrid became the default allocator, so it might also be worth mentioning that we updated from 14.2.16 to 14.2.20 earlier the same day.
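Incidentally, the allocator a running OSD actually uses can be checked through its admin socket; osd.0 below is just a placeholder, and the output should look something like this:

 $ ceph daemon osd.0 config get bluestore_allocator
 {
     "bluestore_allocator": "hybrid"
 }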

After 3 hours of research, we were able to start the OSDs again by setting bluestore_allocator = bitmap (it would have taken only 1 hour if I hadn't confused bluefs_allocator and bluestore_allocator - I wish there were a complete list of Ceph options in the docs!).
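For anyone else hitting this, the workaround boils down to this ceph.conf snippet (the allocator is chosen when BlueStore mounts, so the OSDs must be restarted for it to take effect):

 [osd]
 bluestore_allocator = bitmap

The same can be done through the config database with "ceph config set osd bluestore_allocator bitmap" before restarting.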

Here's the full log. It contains three parts: the original crash, one startup attempt with debug bluefs = 20 (which also shows there are plenty of suitable free extents), and finally the working startup with the bitmap allocator. Let me know if you need anything else; I guess I can reproduce the crash as soon as I switch back to hybrid...
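In case it helps with reproducing: the debug startup in the middle part of the log was done roughly like this, passing the debug option on the command line (osd id 5 is a placeholder for one of the affected OSDs):

 $ ceph-osd -f -i 5 --debug-bluefs 20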


Files

crash.log (159 KB) - Neha Ojha, 05/14/2021 08:14 PM

Related issues (5: 0 open, 5 closed)

Related to bluestore - Bug #46804: nautilus: upgrade:nautilus-p2p/nautilus-p2p-stress-split: BlueFS::_flush_and_sync_log ceph_assert(r == 0) (Duplicate)

Related to bluestore - Bug #51684: OSD crashes after update to 16.2.4 (Duplicate)

Copied to bluestore - Backport #51040: pacific: bluefs _allocate unable to allocate, though enough free (Resolved, Neha Ojha)
Copied to bluestore - Backport #51041: octopus: bluefs _allocate unable to allocate, though enough free (Resolved, Neha Ojha)
Copied to bluestore - Backport #51042: nautilus: bluefs _allocate unable to allocate, though enough free (Resolved, Igor Fedotov)
