Feature #53919

Updated by Ernesto Puerta over 2 years ago

AKA: "moving a daemon to another host to help balance resources across the cluster" 

 As described by PaulC, this task is about "moving a daemon to another host to help balance resources across the cluster". 

 However, IMHO this kind of manual allocation/pinning should be avoided. Ideally, the cephadm scheduler should be smart enough to allocate services according to resources: 
 * Either statically at service creation time 
 * Or dynamically if the resource utilization changes over time. 

 Since neither of the features described above is implemented yet, we should define a way in the dashboard to manually allocate daemons (service instances) to less loaded nodes. 

 In the absence of dynamic service scheduling, we would expect the following user workflow: 
 1. If a host crosses some resource utilization threshold (e.g. 85% of RSS memory, CPU or network, measured as a 1-minute average rather than an instantaneous value), an alert should be triggered. 
 2. Optionally, we could highlight the affected hosts in the hosts panel (however, so far we are not showing real-time stats there, only the gather-facts ones, which are static). 
 3. The user could then reallocate some service (ideally the most resource-consuming one) to other nodes, either by direct placement/pinning (e.g. host7, host8, host9) or by labels. 
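
 A minimal sketch of the alert condition in step 1, assuming utilization samples arrive as fractions between 0 and 1 with a timestamp in seconds. The class name, the sampling interval and the way samples are collected are hypothetical; this is not existing dashboard or cephadm code, only an illustration of "1-minute average, not instantaneous":

```python
from collections import deque


class RollingAverageAlert:
    """Fires when the average utilization over the last `window_s` seconds
    reaches `threshold`. Hypothetical helper, for illustration only."""

    def __init__(self, threshold=0.85, window_s=60.0):
        self.threshold = threshold
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, utilization in 0..1)
        self.first_ts = None    # timestamp of the first sample ever seen

    def add(self, timestamp_s, utilization):
        """Record one sample; return True if the alert should fire."""
        if self.first_ts is None:
            self.first_ts = timestamp_s
        self.samples.append((timestamp_s, utilization))
        # Drop samples that fell out of the averaging window.
        while self.samples[0][0] < timestamp_s - self.window_s:
            self.samples.popleft()
        # Do not fire before a full window has been observed, so that a
        # single early spike cannot trigger the alert on its own.
        if timestamp_s - self.first_ts < self.window_s:
            return False
        average = sum(u for _, u in self.samples) / len(self.samples)
        return average >= self.threshold
```

 With one sample every 10 seconds, a host that stays at 95% only raises the alert once a full minute has been observed, and a single 99% spike on an otherwise half-loaded host does not raise it at all.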

 The more I read the above, the more convinced I am that this is the wrong approach: 

 * It's not trivial to implement (since the dashboard will still need to go through the orchestrator for gathering the real-time resource stats), 
 * To provide a decent user experience (e.g. hinting to the user which host is affected, which service they should move, and to which host), it requires almost as much complex logic as implementing a proper cephadm dynamic scheduler. 
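
 To make the second point concrete, here is a deliberately naive sketch of the three hints (affected host, service to move, target host). The function name and the input dictionaries are hypothetical, and the sketch assumes per-host and per-daemon utilization are already available as fractions of host capacity, which is exactly the data the dashboard does not have today:

```python
def suggest_move(host_load, daemon_usage, threshold=0.85):
    """Return (source_host, daemon, target_host), or None if there is no hint.

    host_load:    {host: utilization in 0..1}
    daemon_usage: {host: {daemon_name: share of that host's capacity}}
    Both inputs are assumed for illustration.
    """
    overloaded = [h for h, load in host_load.items() if load >= threshold]
    if not overloaded:
        return None
    # Hint 1: the most loaded host above the threshold.
    source = max(overloaded, key=host_load.get)
    daemons = daemon_usage.get(source, {})
    if not daemons:
        return None
    # Hint 2: the most resource-consuming daemon on that host.
    daemon = max(daemons, key=daemons.get)
    # Hint 3: the least loaded host that stays under the threshold
    # after receiving the daemon.
    candidates = {
        h: load for h, load in host_load.items()
        if h != source and load + daemons[daemon] < threshold
    }
    if not candidates:
        return None
    target = min(candidates, key=candidates.get)
    return source, daemon, target
```

 Even this toy version ignores multiple resource types, daemons that cannot be moved, placement constraints and load that changes after the move. Handling those is where it starts to resemble a scheduler.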

 In any case, if we are ok with just dumb service allocation, we don't really need to implement much code: just editing a service and changing the service placement spec is enough.
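
 As a rough illustration of how little is involved, the sketch below rewrites only the placement section of a service spec, either pinning to explicit hosts or selecting by label. The helper name, the service id and the host names are made up; the `service_type`, `service_id` and `placement` (`hosts` / `label`) fields follow the cephadm service spec format, and the result would still have to be applied through the orchestrator (for example with `ceph orch apply -i <spec-file>`):

```python
import json


def with_placement(spec, hosts=None, label=None):
    """Return a copy of `spec` with its placement replaced.

    Exactly one of `hosts` (explicit pinning) or `label` must be given.
    Hypothetical helper, for illustration only.
    """
    if (hosts is None) == (label is None):
        raise ValueError("pass exactly one of hosts or label")
    placement = {"hosts": list(hosts)} if hosts is not None else {"label": label}
    return {**spec, "placement": placement}


# Example spec with made-up names.
spec = {
    "service_type": "rgw",
    "service_id": "example",
    "placement": {"hosts": ["host1", "host2", "host3"]},
}

# Move the service to less loaded hosts by direct pinning.
moved = with_placement(spec, hosts=["host7", "host8", "host9"])
print(json.dumps(moved, indent=2))
```

 The original spec is left untouched, so the previous placement can be restored if the move does not help.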
