Feature #45410: cephadm: Support upgrading alertmanager, grafana, prometheus and node_exporter - Orchestrator - Ceph

Feature #45410

closed

cephadm: Support upgrading alertmanager, grafana, prometheus and node_exporter

Added by Sebastian Wagner almost 4 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Category: cephadm/monitoring
% Done: 100%

Description

Right now, we simply download the :latest image for each of these components, so the version that actually runs can differ between daemons on different hosts.

Subtasks 5 (0 open — 5 closed)

Related issues 2 (0 open — 2 closed)

Updated by Sebastian Wagner almost 4 years ago

Category changed from cephadm to cephadm/monitoring

Updated by Juan Miguel Olmo Martínez almost 4 years ago

It would be nice to have these two things:
1. Use fixed-version images by default for the different components of the monitoring stack.
2. Provide an easy way to use other images.
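
The two points above can be sketched as a table of pinned defaults plus a per-component override. This is only an illustration, not cephadm's implementation: `DEFAULT_IMAGES`, `resolve_image` and `is_pinned` are hypothetical names, and the image references are placeholders that combine common upstream repository names with versions mentioned elsewhere in this ticket.

```python
from typing import Dict, Optional

# Hypothetical pinned defaults. Repository names and tags are illustrative
# placeholders, not the images cephadm actually ships with.
DEFAULT_IMAGES: Dict[str, str] = {
    "prometheus": "prom/prometheus:v2.7.2",
    "alertmanager": "prom/alertmanager:v0.16.2",
    "node-exporter": "prom/node-exporter:v0.17.0",
    "grafana": "grafana/grafana:5.4.3",
}


def is_pinned(image: str) -> bool:
    """Return True if the reference names a digest or a tag other than 'latest'."""
    if "@" in image:
        # Digest references (repo@sha256:...) always identify one exact image.
        return True
    # Only inspect the last path component, so a registry port such as
    # "localhost:5000/grafana" is not mistaken for a tag.
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False  # no tag at all means an implicit ':latest'
    tag = last_component.rsplit(":", 1)[1]
    return tag != "latest"


def resolve_image(daemon_type: str,
                  overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick the user-supplied image if there is one, else the pinned default."""
    return (overrides or {}).get(daemon_type, DEFAULT_IMAGES[daemon_type])
```

With this shape, every host resolves the same image for a given component unless an operator overrides it explicitly, and `is_pinned` can be used to warn when an override falls back to a floating tag.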

Updated by Patrick Seidensal almost 4 years ago

This might not be an issue for minor version upgrades of Grafana and Prometheus, although that is hard to guarantee if minor versions can be upgraded without us being able to verify that they work as expected. Still, I think we need to be able to upgrade minor versions for security reasons. I'm not even sure whether some kind of tag or label can be used to achieve that. We should prevent upgrades to new major versions that we have not tested. If there's no mechanism to achieve that, we might need to stick to fixed versions. But then it is our responsibility to upgrade promptly once security fixes have been released.

I'm a little bit concerned about the Node exporter, though. Minor version upgrades have broken metric names in the past. On the other hand, there's a pre-release of v1.0.0 available, which might indicate that things such as metric names will stay stable across minor version upgrades in the future. Of course, this won't be an issue if we decide to use fixed versions and test all upgrades (including minor version upgrades) beforehand.
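
The policy discussed above (allow minor upgrades, block untested major ones) could be expressed roughly as follows. This is a sketch under stated assumptions, not an existing cephadm check: `parse_version` and `upgrade_allowed` are hypothetical helpers, only plain X.Y.Z versions with an optional leading "v" are handled, and treating minor bumps of 0.x releases as breaking is an assumption motivated by the Node exporter concern.

```python
import re
from typing import Tuple

_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' or 'vX.Y.Z'; pre-release suffixes are rejected."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def upgrade_allowed(current: str, candidate: str) -> bool:
    """Allow only upgrades that stay within the tested major version."""
    cur = parse_version(current)
    new = parse_version(candidate)
    if new <= cur:
        return False  # not an upgrade at all
    if new[0] != cur[0]:
        return False  # new, untested major version
    if cur[0] == 0:
        # Assumption: before 1.0, a minor bump may be breaking (for example
        # renamed metrics), so only patch-level upgrades are allowed.
        return new[1] == cur[1]
    return True
```

For example, `upgrade_allowed("5.4.3", "6.0.0")` is rejected, while `upgrade_allowed("v2.7.2", "2.11.1")` is accepted because both versions share major version 2.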

Updated by Sebastian Wagner almost 4 years ago

Related to Documentation #45411: cephadm: add section about container images added

Updated by Alfonso Martínez almost 4 years ago

These are the monitoring stack versions that we use in our nautilus-based releases:
grafana: 5.4.3
prometheus: v2.7.2
alertmanager: 0.16.2
node_exporter: 0.17.0

grafana plugins:
https://docs.ceph.com/docs/master/mgr/dashboard/#enabling-the-embedding-of-grafana-dashboards
Not sure, but we can assume that today's latest versions of the plugins mentioned there are a good start:
grafana-piechart-panel: 1.4.0
vonage-status-panel: 1.0.9

Updated by Patrick Seidensal almost 4 years ago

These are our current versions:

Grafana 5.3.3
Alertmanager 0.16.2
Prometheus 2.11.1
Node exporter 0.17.0

grafana-piechart-panel 1.3.6
grafana-status-panel 1.0.9

Updated by Patrick Seidensal almost 4 years ago

It currently seems that using fixed versions for the monitoring stack containers is the only way to ensure that major versions of those applications aren't automatically updated. Being responsible for publishing updates that won't break functionality includes a responsibility to upgrade those applications when security issues arise.

To at least be notified about newly discovered security vulnerabilities, Clair can be used to check those images, even automatically.

Clair is a tool that checks for security vulnerabilities in container images. It uses a PostgreSQL database, which is by default populated with CVEs from various sources like the Debian Security Bug Tracker, Ubuntu CVE Tracker, Red Hat Security Data, SUSE OVAL Descriptions and others.

It is used in products like quay.io from Red Hat.

Clair can be integrated into container registries for automatic security checks, as in the aforementioned product.

Through tools like klar, Clair can be used without being integrated into a registry and can check local and remote container images for security vulnerabilities.

This is what such a check might look like:

user@home ~ » CLAIR_ADDR=localhost klar grafana/grafana:latest

clair timeout 1m0s
docker timeout: 1m0s
no whitelist file
Analysing 8 layers
Got results from Clair API v1
Found 0 vulnerabilities
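
To run such a check automatically, the summary line of the output could be parsed and used as a gate. The following is a minimal sketch that assumes klar's plain-text output always ends with a "Found N vulnerabilities" line, as in the sample above; that format is inferred from this one run only, and `count_vulnerabilities` is a hypothetical helper.

```python
import re

# Output copied from the run shown above.
SAMPLE_OUTPUT = """\
clair timeout 1m0s
docker timeout: 1m0s
no whitelist file
Analysing 8 layers
Got results from Clair API v1
Found 0 vulnerabilities
"""


def count_vulnerabilities(klar_output: str) -> int:
    """Extract N from the 'Found N vulnerabilities' summary line."""
    match = re.search(r"^Found (\d+) vulnerabilities", klar_output, re.MULTILINE)
    if match is None:
        # Assumption: a missing summary line means the scan did not complete.
        raise ValueError("no 'Found N vulnerabilities' line in klar output")
    return int(match.group(1))


if __name__ == "__main__":
    found = count_vulnerabilities(SAMPLE_OUTPUT)
    print(f"{found} vulnerabilities reported")
```

A scheduled job could feed the real command output into this function and raise an alert whenever the count is non-zero for one of the pinned images.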
