Bug #63618
Allocator configured with 64K alloc unit might get 4K requests
Description
Legacy OSDs (deployed before Pacific) set up their main device allocator with a 64K allocation unit, as configured in the OSD's label.
Since 4K alloc unit support was introduced to BlueFS, this allocator might receive 4K-aligned allocate/release requests when the DB is co-located with the main device or when BlueFS spills over to it. This is more or less fine for the AVL, BTree, and Stupid allocators but inappropriate for the bitmap allocator (including its incarnation inside the hybrid allocator).
As a result, assertions or free space leaks might occur.
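To illustrate the free space leak mentioned above, here is a minimal toy model of a bitmap that tracks one bit per 64K unit. It is not the Ceph BitmapAllocator; the ToyBitmap type and its rounding policy are assumptions made purely for illustration. A 4K allocation consumes a whole 64K unit, and a 4K release cannot be represented, so the unit is never returned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model, NOT Ceph code: one bit per allocation unit (AU).
struct ToyBitmap {
  uint64_t au;                  // allocation unit in bytes
  std::vector<bool> free_bits;  // true = unit is free

  ToyBitmap(uint64_t capacity, uint64_t alloc_unit)
    : au(alloc_unit), free_bits(capacity / alloc_unit, true) {}

  // A partial range still consumes every unit it touches.
  void mark_allocated(uint64_t offset, uint64_t length) {
    assert(length > 0);
    for (uint64_t i = offset / au; i <= (offset + length - 1) / au; ++i) {
      free_bits[i] = false;
    }
  }

  // Assumed policy: only units fully covered by the range are released.
  void release(uint64_t offset, uint64_t length) {
    uint64_t first = (offset + au - 1) / au;  // first fully covered unit
    uint64_t last = (offset + length) / au;   // one past the last covered unit
    for (uint64_t i = first; i < last; ++i) {
      free_bits[i] = true;
    }
  }

  uint64_t free_bytes() const {
    uint64_t n = 0;
    for (bool f : free_bits) {
      if (f) ++n;
    }
    return n * au;
  }
};
```

With a 1 MiB toy device and a 64K unit, calling mark_allocated(0, 4096) followed by release(0, 4096) leaves one 64K unit marked as allocated. A different rounding policy (releasing every touched unit) would instead free space that other 4K users still hold, which is the kind of inconsistency an assertion would catch.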
Files
Updated by Igor Fedotov 5 months ago
-1> 2023-11-23T23:03:27.872+0300 7f6b9aa4f0c0 -1 /home/if/ceph.3/src/os/bluestore/fastbmap_allocator_impl.h: In function 'void AllocatorLevel02<T>::_mark_allocated(uint64_t, uint64_t) [with L1 = AllocatorLevel01Loose; uint64_t = long unsigned int]' thread 7f6b9aa4f0c0 time 2023-11-23T23:03:27.876018+0300
/home/if/ceph.3/src/os/bluestore/fastbmap_allocator_impl.h: 829: FAILED ceph_assert(available >= allocated)
ceph version Development (no_version) reef (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f6b9baafc2b]
2: /home/if/ceph.3/build/lib/libceph-common.so.2(+0x2afe38) [0x7f6b9baafe38]
3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x14ec) [0x5603da1b88ec]
4: (AllocTest_test_alloc_bad_unit_Test::TestBody()+0xd1) [0x5603da152351]
5: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x4d) [0x5603da19bb2d]
6: (testing::Test::Run()+0xce) [0x5603da18ef7e]
7: (testing::TestInfo::Run()+0x135) [0x5603da18f0d5]
8: (testing::TestSuite::Run()+0xc9) [0x5603da18f1c9]
9: (testing::internal::UnitTestImpl::RunAllTests()+0x49c) [0x5603da18f74c]
10: (testing::UnitTest::Run()+0x82) [0x5603da18fa02]
11: main()
12: /lib64/libc.so.6(+0x281b0) [0x7f6b9ac281b0]
13: __libc_start_main()
14: _start()
The assertion is reproduced by the unit test from the attached patch. A similar one has been seen in the field.
Updated by Igor Fedotov 5 months ago
- Related to Bug #62282: BlueFS and BlueStore use the same space (init_rm_free assert) added
Updated by Adam Kupczyk 5 months ago
PR https://github.com/ceph/ceph/pull/48854 "os/bluestore: enable 4K allocation unit for BlueFS"
was created for the case where the main block device already used a 4K AU.
I do not understand how BlueFS requests an allocation from the main device, fails to get a 64K-aligned extent, retries with 4K, and succeeds.
That is strange when the AU is 64K.
A possible change would be to prevent requesting a size smaller than the AU.
However, this problem should be diligently addressed in each version we backported PR 48854 to.
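The idea of never requesting less than the AU could be sketched roughly as follows. This is not the code from any of the PRs referenced in this ticket; effective_alloc_unit and round_up_to are hypothetical helper names, and the sketch assumes power-of-two unit sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: never use a unit smaller than the allocator's own AU.
inline uint64_t effective_alloc_unit(uint64_t requested_unit,
                                     uint64_t allocator_au) {
  return std::max(requested_unit, allocator_au);
}

// Round a length up to a multiple of unit (unit must be a power of two).
inline uint64_t round_up_to(uint64_t length, uint64_t unit) {
  assert(unit != 0 && (unit & (unit - 1)) == 0);
  return (length + unit - 1) & ~(unit - 1);
}
```

For example, a 4K request against a 64K allocator would be widened to a 64K unit, and the requested length rounded up to 64K before it reaches the allocator.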
Updated by Neha Ojha 5 months ago
Notes from today's discussion: https://pad.ceph.com/p/RCA_62282
Updated by Igor Fedotov 5 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 53483
So the issue occurs when a custom bluefs_shared_alloc_size is in use and it is below the min_alloc_size persisted for BlueStore at deployment.
Hence regular users should be unaffected.
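The triggering condition described above can be written as a simple predicate. This is only an illustrative sketch: the function name is hypothetical, the parameters merely mirror the two settings named in this comment, and how the real values are read from the config and the OSD label is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check, not Ceph code: true when the BlueFS shared allocation
// size is smaller than the min_alloc_size recorded at deployment.
inline bool shared_alloc_size_below_min(uint64_t bluefs_shared_alloc_size,
                                        uint64_t min_alloc_size) {
  assert(bluefs_shared_alloc_size > 0 && min_alloc_size > 0);
  return bluefs_shared_alloc_size < min_alloc_size;
}
```

For instance, a custom 4K bluefs_shared_alloc_size on an OSD deployed with a 64K min_alloc_size satisfies the condition, while equal values do not.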
Updated by Igor Fedotov 5 months ago
Whether #62282 is related or not is still an open question.
Updated by Igor Fedotov 5 months ago
Reef backport is: https://github.com/ceph/ceph/pull/54772
Updated by Igor Fedotov 5 months ago
Pacific backport: https://github.com/ceph/ceph/pull/54434
Updated by Igor Fedotov 5 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 5 months ago
- Copied to Backport #63757: reef: Allocator configured with 64K alloc unit might get 4K requests added
Updated by Backport Bot 5 months ago
- Copied to Backport #63758: quincy: Allocator configured with 64K alloc unit might get 4K requests added
Updated by Backport Bot 5 months ago
- Copied to Backport #63759: pacific: Allocator configured with 64K alloc unit might get 4K requests added
Updated by Yuri Weinstein about 2 months ago
Updated by Igor Fedotov about 2 months ago
- Status changed from Pending Backport to Resolved
Updated by Konstantin Shalygin about 2 months ago
- Assignee set to Igor Fedotov
- % Done changed from 0 to 100
- Source set to Community (dev)