Bug #62282
BlueFS and BlueStore use the same space (init_rm_free assert)
Description
The problem is triggered when BlueFS mounts and tries to reserve its allocations on the shared device.
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11e) [0x557821136c6b]
 2: /usr/bin/ceph-osd(+0x3dbe27) [0x557821136e27]
 3: /usr/bin/ceph-osd(+0xa432d1) [0x55782179e2d1]
 4: (AvlAllocator::_try_remove_from_tree(unsigned long, unsigned long, std::function<void (unsigned long, unsigned long, bool)>)+0x24c) [0x5578217960ec]
 5: (HybridAllocator::init_rm_free(unsigned long, unsigned long)+0xc0) [0x55782179dfd0]
 6: (BlueFS::mount()+0x1f6) [0x5578217666e6]
 7: (BlueStore::_open_bluefs(bool, bool)+0x82) [0x557821690b42]
 8: (BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x5c0) [0x557821691800]
 9: (BlueStore::_open_db(bool, bool, bool)+0x179) [0x557821693439]
10: (BlueStore::_open_db_and_around(bool, bool)+0x429) [0x557821694169]
11: (BlueStore::_mount()+0x2ec) [0x55782169a57c]
12: (OSD::init()+0x4fc) [0x55782127359c]
13: main()
The offending range, taken from the BlueFS log, is 0x6dda040000~400000:
2023-07-28T18:53:56.313+0000 7fc5b803c2c0 30 bluefs mount noting alloc for file(ino 1 size 0x89c000 mtime 2023-07-27T18:29:37.033690+0000 allocated c20000 alloc_commit 10000 extents [1:0x5d800000~10000,1:0x5d6f0000~10000,1:0x6dda040000~400000,1:0x1a3d0000~400000,1:0x4fd47c0000~400000])
2023-07-28T18:53:56.313+0000 7fc5b803c2c0 10 HybridAllocator init_rm_free offset 0x5d800000 length 0x10000
2023-07-28T18:53:56.313+0000 7fc5b803c2c0 10 HybridAllocator init_rm_free offset 0x5d6f0000 length 0x10000
2023-07-28T18:53:56.313+0000 7fc5b803c2c0 10 HybridAllocator init_rm_free offset 0x6dda040000 length 0x400000
2023-07-28T18:53:56.313+0000 7fc5b803c2c0 -1 HybridAllocator init_rm_free lambda Uexpected extent: 0x6dda040000~400000
2023-07-28T18:53:56.317+0000 7fc5b803c2c0 -1 /builddir/build/BUILD/ceph-17.2.6/src/os/bluestore/HybridAllocator.cc: In function 'HybridAllocator::init_rm_free(uint64_t, uint64_t)::<lambda(uint64_t, uint64_t, bool)>' thread 7fc5b803c2c0 time 2023-07-28T18:53:56.315192+0000
/builddir/build/BUILD/ceph-17.2.6/src/os/bluestore/HybridAllocator.cc: 175: FAILED ceph_assert(false)
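The assert fires because init_rm_free demands that the entire range being claimed by BlueFS is still present in the allocator's free-space tree; here part of it has already been given to BlueStore. A minimal sketch of that invariant, using a plain std::map of free extents rather than Ceph's AVL tree (names and structure here are illustrative, not the real implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of a free-extent store keyed by offset (not Ceph's AvlAllocator).
struct ToyFreeStore {
  std::map<uint64_t, uint64_t> free_;  // offset -> length

  void init_add_free(uint64_t off, uint64_t len) { free_[off] = len; }

  // Mirrors the invariant behind init_rm_free: the range being claimed
  // must be entirely free; otherwise another consumer already owns it.
  bool init_rm_free(uint64_t off, uint64_t len) {
    auto it = free_.upper_bound(off);
    if (it == free_.begin()) return false;  // no extent starts at or before off
    --it;
    if (off + len > it->first + it->second) return false;  // not fully covered
    // Carve [off, off+len) out of the covering extent.
    uint64_t ext_off = it->first, ext_len = it->second;
    free_.erase(it);
    if (off > ext_off) free_[ext_off] = off - ext_off;
    if (ext_off + ext_len > off + len)
      free_[off + len] = ext_off + ext_len - (off + len);
    return true;
  }
};
```

In the log above, the first two extents pass this check and the third does not, which is exactly where the lambda reports the unexpected extent and asserts.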
As a run of fsck shows:
ceph-bluestore-tool --path /rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46/ --bluestore-allocator=bitmap fsck
2023-08-01T15:33:44.344+0000 7fef99bc4600 -1 bluestore(/rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46) operator()::fsck error: oid #-1:3105a3cf:::disk_bw_test_41:0#, extent 0x6dda2c0000~10000 or a subset is already allocated (misreferenced)
2023-08-01T15:33:44.344+0000 7fef99bc4600 -1 bluestore(/rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46) operator()::fsck error: oid #-1:3105a3cf:::disk_bw_test_41:0#, extent 0x6dda280000~10000 or a subset is already allocated (misreferenced)
2023-08-01T15:33:44.344+0000 7fef99bc4600 -1 bluestore(/rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46) operator()::fsck error: oid #-1:3105a3cf:::disk_bw_test_41:0#, extent 0x6dda290000~10000 or a subset is already allocated (misreferenced)
2023-08-01T15:33:44.344+0000 7fef99bc4600 -1 bluestore(/rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46) operator()::fsck error: oid #-1:3105a3cf:::disk_bw_test_41:0#, extent 0x6dda2a0000~10000 or a subset is already allocated (misreferenced)
2023-08-01T15:33:44.344+0000 7fef99bc4600 -1 bluestore(/rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46) operator()::fsck error: oid #-1:3105a3cf:::disk_bw_test_41:0#, extent 0x6dda2b0000~10000 or a subset is already allocated (misreferenced)
2023-08-01T15:33:44.344+0000 7fef99bc4600 -1 bluestore(/rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46) operator()::fsck error: oid #-1:3105a3cf:::disk_bw_test_41:0#, extent 0x6dda2d0000~10000 or a subset is already allocated (misreferenced)
.....
2023-08-01T15:33:44.361+0000 7fef99bc4600 -1 bluestore(/rootfs/var/lib/ceph/08416350-2b97-11ee-a9a2-ac1f6b40d3fc/osd.46) operator()::fsck error: oid #-1:3fc5a3cf:::disk_bw_test_40:0#, extent 0x4fd4890000~10000 or a subset is already allocated (misreferenced)
fsck status: remaining 128 error(s) and warning(s)
The space on disk is currently occupied by both BlueFS and BlueStore.
The device is rotational and bitmap_freelist_manager is in use.
Updated by Adam Kupczyk 9 months ago
- Subject changed from BlueFS and BlueStore use the same space to BlueFS and BlueStore use the same space (init_rm_free assert)
Updated by Adam Kupczyk 9 months ago
The AVL allocator (and, by extension, the hybrid allocator) can accept the same region twice.
// Reproducer against Ceph's Allocator interface (src/os/bluestore);
// needs <iostream>, <memory> and the Ceph headers for Allocator/PExtentVector.
std::unique_ptr<Allocator> alloc;
alloc.reset(Allocator::create(g_ceph_context, "hybrid",
                              0x746e1c00000, 4096, 0, 0, "dupa"));
// Two disjoint free regions of 0x10000 bytes each.
alloc->init_add_free(0x100000, 0x10000);
alloc->init_add_free(0x120000, 0x10000);
// Release a range that overlaps the first region -- nothing rejects it.
PExtentVector release_set;
release_set.emplace_back(0x108000, 0x10000);
alloc->release(release_set);
// Drain the allocator in 0x4000 chunks and print each allocation.
int64_t r = 0;
int64_t allocated = 0;
do {
  PExtentVector tmp;
  r = alloc->allocate(0x4000, 0x1000, 0, 0, &tmp);
  if (r > 0) allocated += r;
  std::cout << "allocated=" << tmp << std::hex
            << " total 0x" << allocated << std::dec << std::endl;
} while (r > 0);
allocated=[0x100000~4000] total 0x4000
allocated=[0x104000~4000] total 0x8000
allocated=[0x108000~4000] total 0xc000
allocated=[0x10c000~4000] total 0x10000
allocated=[0x108000~4000] total 0x14000
allocated=[0x10c000~4000] total 0x18000
allocated=[0x110000~4000] total 0x1c000
allocated=[0x114000~4000] total 0x20000
allocated=[0x120000~4000] total 0x24000
allocated=[0x124000~4000] total 0x28000
allocated=[0x128000~4000] total 0x2c000
allocated=[0x12c000~4000] total 0x30000
allocated=[] total 0x30000
Now it has given out region 0x108000~8000 TWICE!
This convinces me that the bug is caused by releasing some region that was already free.
Then BlueFS and BlueStore independently got the same region.
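The double handout is easy to see once the release path is modeled: an allocator that merges a released range into its neighbours without checking for overlap counts the overlapping bytes as free twice. A toy sketch of that bug pattern (not Ceph's AvlAllocator; names are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Toy interval set that blindly merges a released range with its
// predecessor, the way an allocator without overlap checks would.
struct ToyFreeSet {
  std::map<uint64_t, uint64_t> free_;  // offset -> length
  uint64_t total = 0;                  // free bytes as the allocator sees them

  // No overlap check: the released length is counted in full even if part
  // of it is already free -- this is the bug pattern in the reproducer.
  void release(uint64_t off, uint64_t len) {
    total += len;
    auto it = free_.lower_bound(off);
    if (it != free_.begin()) {
      auto prev = std::prev(it);
      if (prev->first + prev->second >= off) {  // touches/overlaps predecessor
        uint64_t end = std::max(prev->first + prev->second, off + len);
        prev->second = end - prev->first;       // merge, swallowing the overlap
        return;
      }
    }
    free_[off] = len;
  }
};
```

With the reproducer's numbers, releasing 0x108000~10000 on top of an already-free 0x100000~10000 leaves only 0x18000 real bytes but a free-space accounting of 0x20000; the 0x8000-byte difference is exactly the region that can be handed out twice.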
The action plan is to add meaningful catch code for these events and start testing.
Possibly we could even make it part of the releases, so we could use telemetry to zoom in on whatever is going on.
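One shape such catch code could take, sketched against a plain std::map of free extents (illustrative only, not the actual Ceph patch): verify a released range against the free set before inserting it, and report the overlap instead of silently merging:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <iterator>
#include <map>

// Sketch of an overlap guard for release(); offsets/lengths in bytes.
// Returns false (and logs) if any part of [off, off+len) is already free,
// instead of asserting much later deep inside init_rm_free.
bool check_release(const std::map<uint64_t, uint64_t>& free_set,
                   uint64_t off, uint64_t len) {
  auto it = free_set.upper_bound(off);
  if (it != free_set.begin()) {
    auto prev = std::prev(it);
    if (prev->first + prev->second > off) {  // predecessor extends past off
      std::fprintf(stderr, "release of already-free range 0x%llx~0x%llx\n",
                   (unsigned long long)off, (unsigned long long)len);
      return false;
    }
  }
  if (it != free_set.end() && it->first < off + len) {  // successor overlaps
    std::fprintf(stderr, "release overlaps free range at 0x%llx\n",
                 (unsigned long long)it->first);
    return false;
  }
  return true;
}
```

Run against the reproducer's layout, this rejects the overlapping 0x108000~10000 release while still accepting a release into the untouched gap between the two free regions.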
Updated by Igor Fedotov 9 months ago
Adam Kupczyk wrote:
The AVL allocator (and, by extension, the hybrid allocator) can accept the same region twice.
[...]
[...]
Now it has given out region 0x108000~8000 TWICE!
This convinces me that the bug is caused by releasing some region that was already free.
Then BlueFS and BlueStore independently got the same region.
The action plan is to add meaningful catch code for these events and start testing.
Possibly we could even make it part of the releases, so we could use telemetry to zoom in on whatever is going on.
Would https://github.com/ceph/ceph/pull/47730 do the trick?
Updated by Adam Kupczyk 8 months ago
Related to https://tracker.ceph.com/issues/62341
Updated by Igor Fedotov 5 months ago
I think this could be related to https://tracker.ceph.com/issues/63618 if the DB shares the main device and the legacy 64K alloc unit is in use for this device.
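For context, the condition tracked in #63618 boils down to an alignment invariant: every extent handed to an allocator should be a multiple of its configured alloc unit, which a 4K request against a legacy 64K unit violates. A tiny illustrative check (not Ceph code; the function name is made up):

```cpp
#include <cassert>
#include <cstdint>

// An extent handed to an allocator should be aligned to its alloc unit;
// a 4K request against a 64K-unit allocator fails this invariant.
constexpr bool aligned_to_unit(uint64_t off, uint64_t len, uint64_t unit) {
  return off % unit == 0 && len % unit == 0 && len >= unit;
}
```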
Updated by Igor Fedotov 5 months ago
- Related to Bug #63618: Allocator configured with 64K alloc unit might get 4K requests added