Feature #51004: cephadm agent 2.0 - Orchestrator - Ceph

Feature #51004

closed

cephadm agent 2.0

Added by Sebastian Wagner almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: Adam King
% Done: 100%
Pull request ID: 42384

Description

The idea is to re-build the cephadm agent and make it push data to the mgr

Things to push to mgr/cephadm

ls - running containers
list-networks
gather-facts
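As a rough illustration of what one push could carry, the sketch below bundles the three data sets into a single JSON document. The function name `build_payload` and the field names are hypothetical; the ticket does not define a wire format.

```python
import json
from typing import Any, Dict, List


def build_payload(host: str,
                  ls: List[Dict[str, Any]],
                  networks: Dict[str, Any],
                  facts: Dict[str, Any]) -> str:
    """Bundle the output of ls, list-networks and gather-facts.

    The layout is an assumption made for illustration only.
    """
    return json.dumps({
        'host': host,          # which host this data describes
        'ls': ls,              # running containers/daemons
        'networks': networks,  # output of list-networks
        'facts': facts,        # output of gather-facts
    }, sort_keys=True)
```

Sending all three in one document would let the mgr update a host's state in a single step, although separate pushes per data set would work as well.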

Architecture

systemd unit
non-containerized
cephadm agent to push to mgr/cephadm
cherrypy
client authentication
HTTPS certificates
make it mandatory? yes!
just members for now. No client hosts.
have to verify that we have the same hash of the binary, so that we avoid any compatibility requirements. Only do this if it actually helps reduce the complexity
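The binary hash check mentioned above could look roughly like this. It is a minimal sketch: SHA-256 and the helper names `binary_digest` and `same_binary` are assumptions, since the ticket only says "hash".

```python
import hashlib
import hmac


def binary_digest(data: bytes) -> str:
    """Return a hex digest of the cephadm binary contents (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def same_binary(agent_digest: str, mgr_digest: str) -> bool:
    """Compare the digest reported by the agent with the one the mgr expects."""
    # compare_digest runs in constant time; not strictly required here,
    # but harmless for values that arrive over the network.
    return hmac.compare_digest(agent_digest, mgr_digest)
```

If the digests differ, the mgr could refuse the push and redeploy the agent rather than try to interpret data from a different version.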

lots of new failure modes

We need to handle all of these properly:

what to do when agent never pushed?
load spikes!
mgr moves around. needs latest MGR endpoint
problem: flopping daemons
how do we detect offline hosts? -> timeout
problem: firewall blocking http connections to the MGR
problem: thrashing. If the MGR is overloaded for 5 minutes, the agent can no longer push information to it.
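Two of the failure modes above lend themselves to a small sketch: detecting offline hosts via a timeout, and spreading out retries so that agents do not cause load spikes. Everything here is hypothetical (`HostTracker`, `next_delay`, and the choice to treat a host that never pushed as offline).

```python
import random
from typing import Dict, List, Optional


class HostTracker:
    """mgr-side bookkeeping of the last push time per host (hypothetical)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_push: Dict[str, Optional[float]] = {}

    def add_host(self, host: str) -> None:
        # None means "agent deployed, but it has never pushed"
        self.last_push.setdefault(host, None)

    def record_push(self, host: str, now: float) -> None:
        self.last_push[host] = now

    def offline_hosts(self, now: float) -> List[str]:
        # Assumption: a host that never pushed counts as offline.
        return sorted(h for h, t in self.last_push.items()
                      if t is None or now - t > self.timeout_s)


def next_delay(base_s: float, failures: int, cap_s: float,
               rng: random.Random) -> float:
    """Agent-side exponential backoff with full jitter after failed pushes."""
    return rng.uniform(0, min(cap_s, base_s * 2 ** failures))
```

With randomized delays, agents that all failed during the same MGR overload would not retry at the same instant once the MGR recovers.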

race conditions

agent caches running daemons
mgr/cephadm deploys a new mgr
agent pushes outdated information to the mgr
mgr/cephadm deploys a second new mgr

solution: lamport clock: https://en.wikipedia.org/wiki/Lamport_timestamp
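One way to picture how a logical counter resolves the race above: the mgr keeps a counter per host, bumps it whenever it changes something on that host, and tells the agent the new value. The agent echoes the last value it saw along with its data, and the mgr ignores pushes that carry an older value. This is a simplified single-counter variant of the Lamport timestamp idea, not a full Lamport clock, and the names are hypothetical.

```python
class HostCounter:
    """Per-host logical counter kept by the mgr (hypothetical sketch)."""

    def __init__(self) -> None:
        self.counter = 0

    def on_change(self) -> int:
        """Call when the mgr deploys or removes a daemon on the host.

        The returned value is what the mgr would send to the agent.
        """
        self.counter += 1
        return self.counter

    def is_current(self, echoed: int) -> bool:
        """True if the agent's data was gathered after the latest change."""
        return echoed >= self.counter


# Replaying the race described above:
clock = HostCounter()
cached_at = clock.counter           # agent caches running daemons
clock.on_change()                   # mgr/cephadm deploys a new mgr
stale = not clock.is_current(cached_at)   # outdated push is detected
latest = clock.on_change()          # mgr/cephadm deploys a second new mgr
fresh = clock.is_current(latest)    # push gathered after that is accepted
```

Because only counter values are compared, this does not rely on synchronized wall clocks between the hosts and the mgr.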

open questions

should we access the config-key store directly from the MGR endpoint?

if not: race conditions?
if yes: slow REST API endpoint?

Action items

add mgr/cephadm endpoint
add cephadm command to push results to endpoint
add cephadm command that pushes results periodically
replace exporter with cephadm agent daemon mode
teach mgr/cephadm to deploy agent (incl. generating a keyring for each host/agent)
add lamport clock
simplify mgr/cephadm serve() loop

Files

agent 2.0 (43.9 KB)

Sebastian Wagner, 05/27/2021 02:23 PM

Subtasks 7 (0 open — 7 closed)

Related issues 1 (0 open — 1 closed)

Updated by Paul Cuzner almost 3 years ago

If the intent is to move to an agent architecture, why not offload the work entirely to the agent? That is, don't just have a passive endpoint; enable it to provide OSD creation, daemon deployments, etc. Wouldn't this help with scale too?

This is what I would expect from an agent. Otherwise, isn't agent v2 just another exporter that provides state?
