Bug #53678

closed

NCB's reconstruct allocations improperly handles shared blobs

Added by Igor Fedotov over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

An OSD might hit the following assertion on fsck/startup when it holds a large number of shared blobs and previously crashed due to a no-space (ENOSPC) error.

../src/os/bluestore/fastbmap_allocator_impl.h: 809: FAILED ceph_assert(available >= allocated)

ceph version Development (no_version) quincy (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14c) [0x7f484ea021fb]
2: /home/if/ceph.2/build/lib/libceph-common.so.2(+0x25c40c) [0x7f484ea0240c]
3: (BitmapAllocator::init_rm_free(unsigned long, unsigned long)+0x709) [0x55c0f2ae1f39]
4: (BlueStore::read_allocation_from_single_onode(Allocator*, boost::intrusive_ptr<BlueStore::Onode>&, BlueStore::read_alloc_stats_t&)+0x103) [0x55c0f28f5d33]
5: (BlueStore::read_allocation_from_onodes(Allocator*, BlueStore::read_alloc_stats_t&)+0xd31) [0x55c0f29309a1]
6: (BlueStore::reconstruct_allocations(Allocator*, BlueStore::read_alloc_stats_t&)+0x520) [0x55c0f2931a00]
7: (BlueStore::read_allocation_from_drive_on_startup()+0x96) [0x55c0f29485c6]
8: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0xac3) [0x55c0f29496b3]
9: (BlueStore::_open_db_and_around(bool, bool)+0x367) [0x55c0f2992727]
10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x486) [0x55c0f29950f6]
11: main()

Adam's script from https://gist.github.com/aclamk/974db54b7613e074c6bbc4877a4d722e is a good reproducer. Note, however, that the OSD has to end up with an ENOSPC error before this failure shows up.

IMO the root cause is a design flaw in the NCB code, which feeds the same extent from a shared blob to the NCB restore allocator multiple times. The bitmap allocator silently accepts the duplicate extents but improperly adjusts the available space each time, which eventually triggers the assertion.
Generally, allocators are not supposed to receive the same extent in init_rm_free calls more than once. This is an abuse of the bitmap allocator, which performs no checking for performance reasons and hence silently accepts duplicates up to some point. The AVL allocator, for example, would assert immediately.
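To illustrate the accounting drift described above, here is a minimal toy model. It is not Ceph's BitmapAllocator: the ToyBitmapAllocator type, its fields, and replay_shared_extent are hypothetical names invented for this sketch, and the real allocator tracks space per slot rather than with a single counter. The sketch only assumes what the report states, namely that the extent is marked without checking whether it was already marked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only (hypothetical, NOT Ceph's BitmapAllocator).
// One bit per allocation unit plus a separately tracked free counter.
struct ToyBitmapAllocator {
  std::vector<bool> used;  // true = unit is allocated
  int64_t available;       // free units, maintained independently of the bitmap

  explicit ToyBitmapAllocator(std::size_t units)
    : used(units, false), available(static_cast<int64_t>(units)) {}

  // Mimics the unchecked behaviour from the report: bits are set and the
  // length is subtracted without testing whether the bits were already set.
  void init_rm_free_unchecked(std::size_t offset, std::size_t length) {
    assert(offset + length <= used.size());
    for (std::size_t i = offset; i < offset + length; ++i) {
      used[i] = true;
    }
    available -= static_cast<int64_t>(length);
  }

  // Units that are really marked as allocated in the bitmap.
  int64_t allocated() const {
    int64_t n = 0;
    for (bool b : used) {
      n += b ? 1 : 0;
    }
    return n;
  }
};

// Feeds the same shared-blob extent once per referencing onode, which is
// the pattern the report identifies as the root cause.
inline ToyBitmapAllocator replay_shared_extent(std::size_t units,
                                               std::size_t offset,
                                               std::size_t length,
                                               int referencing_onodes) {
  ToyBitmapAllocator a(units);
  for (int i = 0; i < referencing_onodes; ++i) {
    a.init_rm_free_unchecked(offset, length);
  }
  return a;
}
```

With 16 units and an 8-unit extent referenced by three onodes, the bitmap still shows 8 allocated units, but the counter has been reduced by 24. The two views of free space no longer agree, which is the kind of inconsistency that a check such as ceph_assert(available >= allocated) would eventually catch.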
I think proper shared blob handling should be similar to the fsck implementation: it should involve a standalone pass over the shared blob KV records to update the allocator, instead of handling shared blobs on a per-onode basis.
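The two-pass idea suggested above could be sketched roughly as follows. This is not the actual fix and not BlueStore code: Extent, ToyBlob, ToyOnode, SharedBlobRecords, and collect_allocated_extents are hypothetical stand-ins, and the map merely models "one KV record per shared blob". The point is only that each shared extent reaches the allocator once, no matter how many onodes reference it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration; not BlueStore types.
struct Extent {
  uint64_t offset;
  uint64_t length;
};

struct ToyBlob {
  bool shared = false;          // true if the blob is referenced via a shared blob record
  uint64_t sbid = 0;            // shared blob id, meaningful only when shared
  std::vector<Extent> extents;  // physical extents owned by a non-shared blob
};

struct ToyOnode {
  std::vector<ToyBlob> blobs;
};

// Models the shared blob KV records: one entry per shared blob id.
using SharedBlobRecords = std::map<uint64_t, std::vector<Extent>>;

// Returns every allocated extent exactly once.
inline std::vector<Extent> collect_allocated_extents(
    const std::vector<ToyOnode>& onodes,
    const SharedBlobRecords& shared_records) {
  std::vector<Extent> out;

  // Pass 1: walk the onodes, but skip shared blobs entirely.
  for (const auto& onode : onodes) {
    for (const auto& blob : onode.blobs) {
      if (blob.shared) {
        // Assumption of this sketch: every shared blob has a record.
        assert(shared_records.count(blob.sbid) == 1);
        continue;
      }
      out.insert(out.end(), blob.extents.begin(), blob.extents.end());
    }
  }

  // Pass 2: walk the shared blob records, each visited once.
  for (const auto& rec : shared_records) {
    out.insert(out.end(), rec.second.begin(), rec.second.end());
  }
  return out;
}
```

The resulting list can then be fed to the allocator with one init_rm_free call per extent, so an allocator that performs no duplicate checking never sees the same range twice.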


Related issues 1 (0 open, 1 closed)

Copied to bluestore - Backport #54175: quincy: NCB's reconstruct allocations improperly handles shared blobs (Resolved, Gabriel BenHanokh)