Feature #55879
closed
mgr/cephadm: balanced static placement with rendezvous or consistent hashing
Description
What
Right now Cephadm performs a naive static placement. For example, all services with a count=1 placement strategy are deployed on the same node, creating a hot spot. That forces the user to manually define host subsets or labels.
If a hash-based static allocation were used instead, services would be spread almost evenly across hosts. However, whenever nodes are added or removed, most services would be reallocated (this is a known issue of plain hashing).
To reduce the movement of services on host addition or removal, either rendezvous or consistent hashing should be used instead.
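To make the rendezvous option concrete, here is a minimal sketch in Python. It is not cephadm code: the function names score and rendezvous_place are hypothetical, and SHA-256 is only an assumed choice of hash. Each service ranks every host by a hash of the (service, host) pair and takes the top count hosts, so adding a host can only move a service onto that new host.

```python
import hashlib
from typing import List


def score(service: str, host: str) -> int:
    """Hypothetical helper: deterministic weight for a (service, host) pair."""
    # A NUL separator avoids ambiguity between e.g. ("ab", "c") and ("a", "bc").
    digest = hashlib.sha256(f"{service}\0{host}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def rendezvous_place(service: str, hosts: List[str], count: int) -> List[str]:
    """Return the `count` hosts with the highest score for this service.

    The result does not depend on the order of `hosts`; the host name is
    used as a tie-breaker so the ranking is fully deterministic.
    """
    ranked = sorted(hosts, key=lambda h: (score(service, h), h), reverse=True)
    return ranked[:count]


# Usage: place single-instance services, then add a host.
hosts = ["host1", "host2", "host3"]
for svc in ["grafana", "prometheus", "alertmanager"]:
    before = rendezvous_place(svc, hosts, 1)
    after = rendezvous_place(svc, hosts + ["host4"], 1)
    # A service either stays where it was or moves to the new host.
    assert after == before or after == ["host4"]
```

With a plain hash-modulo scheme, the same host addition would typically remap most services, because the modulus changes for every key.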
Why
The Dashboard is moving to 1-click pre-creation of services to simplify user workflows (a "Create first, modify later" approach). For that purpose, services should be created without (or with minimal) user interaction. However, given the naive service placement, this would mean that all the services end up on the first host, creating a hot spot.
This is not a request to implement (yet) resource-aware placement/scheduling (this is still static placement), but it should definitely result in a more balanced resource allocation.
How to:
As a proposal, what about a new mode parameter in the placement spec? The current static placement could be named static-naive and the new one static-balanced?
placement:
  mode: static-balanced
  count: 3
According to the current algorithm, the balanced placement should only happen when number_of_candidate_hosts > count. Otherwise, it should fall back to a naive placement.
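The mode switch and its fallback could look roughly like the sketch below. This is an illustration under assumptions, not the cephadm scheduler: the place function and its signature are hypothetical, "naive" is assumed to mean taking the first count candidates in the order given, and the balanced branch uses rendezvous hashing with SHA-256 as one possible choice.

```python
import hashlib
from typing import List


def _score(service: str, host: str) -> int:
    """Hypothetical helper: deterministic weight for a (service, host) pair."""
    digest = hashlib.sha256(f"{service}\0{host}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(service: str, candidates: List[str], count: int,
          mode: str = "static-naive") -> List[str]:
    """Pick hosts for `service` according to the proposed placement mode."""
    if mode not in ("static-naive", "static-balanced"):
        raise ValueError(f"unknown placement mode: {mode}")
    # Balancing only makes sense when there are more candidates than daemons.
    if mode == "static-balanced" and len(candidates) > count:
        ranked = sorted(candidates,
                        key=lambda h: (_score(service, h), h),
                        reverse=True)
        return ranked[:count]
    # Fallback (assumed naive behaviour): first `count` candidates, in order.
    return list(candidates[:count])
```

When count is greater than or equal to the number of candidate hosts, every candidate is used anyway, so both modes return the same set of hosts.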
Updated by Redouane Kachach Elhichou almost 2 years ago
- Related to Bug #56415: cephadm uses static placement when creating daemons causing a hotspot on the 'root' node added
Updated by Redouane Kachach Elhichou almost 2 years ago
Cephadm used to distribute the load randomly across the cluster, but this was broken by the change https://github.com/ceph/ceph/commit/adceaa9b28278601c56a7db1c3f42eaa592ec4d1 (introduced as part of the PR https://github.com/ceph/ceph/pull/41007). The tracker https://tracker.ceph.com/issues/56415 tries to restore this functionality to cephadm.
Updated by Redouane Kachach Elhichou about 1 year ago
- Status changed from New to Resolved