Bug #62815
closed
hybrid/avl allocators might be very ineffective when serving bluefs allocations
100%
Description
When operating in best-fit mode, these allocators perform chunk lookups through the size-sorted "range_size_tree" container. This is done in two stages:
1) Use the container's lower_bound() to search for the first available chunk that is long enough.
2) Iterate sequentially to locate a properly aligned chunk, starting from the position obtained at step 1).
Step 2) might require significant effort for bluefs allocations (which use a longer allocation unit than bluestore's min_alloc_size, e.g. 64K vs. 4K) if space is highly fragmented. Plenty of potentially available chunks might be unaligned to a 64K boundary, and once aligned they aren't long enough any more.
This spreadsheet shows how dramatic that can be for the hybrid allocator (a real free map from a production cluster was used to build it):
https://docs.google.com/spreadsheets/d/1bh22JNqwTPSFUQPStYuuiKDEui6YSxZsusIln3tZ9t0/edit?usp=sharing
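For illustration, the two-stage lookup described above can be sketched roughly as follows. This is not the actual Ceph code: a std::set ordered by length stands in for "range_size_tree", and the names Chunk, BySize, Lookup, align_up and find_aligned are hypothetical. The probes counter only exists to show how many chunks the sequential stage has to visit.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical free chunk covering [start, start + length).
struct Chunk {
  uint64_t start;
  uint64_t length;
};

// Size-sorted ordering: by length first, then by start offset.
struct BySize {
  bool operator()(const Chunk& a, const Chunk& b) const {
    if (a.length != b.length) return a.length < b.length;
    return a.start < b.start;
  }
};

// Assumption: std::set is only a stand-in for the real size-sorted tree.
using SizeTree = std::set<Chunk, BySize>;

struct Lookup {
  bool found;       // true if some chunk fits after alignment
  uint64_t offset;  // aligned offset of the allocation (valid if found)
  uint64_t probes;  // chunks examined during the sequential stage
};

// Round v up to a multiple of align (align must be a power of two).
inline uint64_t align_up(uint64_t v, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (v + align - 1) & ~(align - 1);
}

Lookup find_aligned(const SizeTree& tree, uint64_t want, uint64_t align) {
  Lookup r{false, 0, 0};
  // Stage 1: first chunk whose raw length is >= want.
  auto it = tree.lower_bound(Chunk{0, want});
  // Stage 2: walk forward until a chunk is still long enough once aligned.
  for (; it != tree.end(); ++it) {
    ++r.probes;
    const uint64_t s = align_up(it->start, align);
    const uint64_t end = it->start + it->length;
    if (s <= end && end - s >= want) {
      r.found = true;
      r.offset = s;
      return r;
    }
  }
  return r;
}
```

With three free chunks {4096, 65536}, {139264, 69632} and {655360, 131072} (start, length), a 64K request with 64K alignment has to examine all three chunks in this sketch, while the same request with 4K alignment is satisfied by the first one. On a heavily fragmented free map the sequential stage can grow accordingly.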
Updated by Igor Fedotov 8 months ago
Actually, the above description is valid for both best-fit and fast-fit modes. Sequential lookup for a 64K-aligned chunk can be pretty costly. I ran an experiment where the alignment requirement was changed from 64K to 4K, and allocation times dropped dramatically.
This looks like a good fix to me, as the large chunk alignment requirement is apparently redundant.
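A minimal sketch of why the smaller alignment helps, under the assumption that every free extent starts and ends on a 4K boundary (as with a 4K min_alloc_size). The names Extent, usable_length and count_misfits are hypothetical and do not come from the Ceph sources. The code counts extents that pass the size check but are too short once their start is rounded up to the requested alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical free extent covering [start, start + length).
struct Extent {
  uint64_t start;
  uint64_t length;
};

// Bytes left in the extent after rounding its start up to align
// (align must be a power of two).
uint64_t usable_length(const Extent& e, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  const uint64_t aligned = (e.start + align - 1) & ~(align - 1);
  const uint64_t end = e.start + e.length;
  return aligned < end ? end - aligned : 0;
}

// Extents that are long enough by raw length but fail once aligned.
// Each of these is a wasted probe for a sequential search.
std::size_t count_misfits(const std::vector<Extent>& free_map,
                          uint64_t want, uint64_t align) {
  std::size_t n = 0;
  for (const auto& e : free_map) {
    if (e.length >= want && usable_length(e, align) < want) ++n;
  }
  return n;
}
```

For example, an extent starting at 4096 with length 65536 keeps only 4096 usable bytes when aligned to 64K, but all 65536 bytes when aligned to 4K. Under the 4K-boundary assumption, no extent that passes the size check can fail a 4K alignment check, so the first candidate found is always usable.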
Updated by Igor Fedotov 8 months ago
Originally the issue was revealed a while ago when the following PR was made: https://github.com/ceph/ceph/pull/48640
That implementation is rather overkill, though, since we might just get rid of the alignment requirement instead.
Updated by Igor Fedotov 8 months ago
- Severity changed from 3 - minor to 2 - major
Updated by Igor Fedotov 7 months ago
- Status changed from New to Fix Under Review
Updated by Igor Fedotov 7 months ago
- Is duplicate of Bug #62509: os/bluestore: suspect performance bottleneck on allocator added
Updated by Igor Fedotov 7 months ago
- Status changed from Fix Under Review to Duplicate
Updated by Igor Fedotov 5 months ago
- Status changed from Duplicate to Fix Under Review
Updated by Igor Fedotov 5 months ago
Pacific backport is https://github.com/ceph/ceph/pull/54434
Updated by Igor Fedotov 5 months ago
Reef backport is: https://github.com/ceph/ceph/pull/54772
Updated by Igor Fedotov 5 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot 5 months ago
- Copied to Backport #63760: reef: hybrid/avl allocators might be very ineffective when serving bluefs allocations added
Updated by Backport Bot 5 months ago
- Copied to Backport #63761: quincy: hybrid/avl allocators might be very ineffective when serving bluefs allocations added
Updated by Backport Bot 5 months ago
- Copied to Backport #63762: pacific: hybrid/avl allocators might be very ineffective when serving bluefs allocations added
Updated by Yuri Weinstein about 2 months ago
Updated by Igor Fedotov about 2 months ago
- Status changed from Pending Backport to Resolved
Updated by Konstantin Shalygin about 2 months ago
- Assignee set to Igor Fedotov
- % Done changed from 0 to 100
- Source set to Community (dev)